rust-lang / crates.io

The Rust package registry
https://crates.io
Apache License 2.0
2.96k stars 600 forks source link

Tracking issue for switching to hyper #1502

Closed jtgeibel closed 5 years ago

jtgeibel commented 6 years ago

Background

In #1378 we attempted to switch our web server from civet to hyper. Once this was deployed, it was discovered that cargo API requests were in the form of https://crates.io//api/v1/.... Civet has some interesting logic that consolidates adjacent / characters and removes all .. sequences that follow a /. This code doesn't really do the equivalent of directory traversal, for example I believe /../ is converted to /. The code definitely doesn't pop the last segment off of the path (but does avoid any allocations).

Hyper does not do any such manipulation of the URL path, so all cargo requests were resulting in a 404 response because the router logic doesn't know how to handle the extra / at the beginning of the path. We rolled back the deploy and then reverted the code in #1476.

There is 1 other interesting portion of code that I want to point out. In conduit-static there is logic that results in a 404 if the path contains ... This should protect against any directory traversal attacks, but isn't exactly very precise.

Updating the index configuration

Recently I noticed that the config.json file in the registry index included a trailing / in the API endpoint definition. Since all versions of cargo append a /api/v1 to this value, the trailing slash was removed in rust-lang/crates.io-index#12. I'm not sure exactly under what conditions cargo updates the registry, but as clients sync with the registry they should begin sending API requests without the extra slash. It is possible that there are some commands (like search and yank) that don't update the index first, so some old clients may still issue requests with the extra slash. I think we could search the logs to see if there are any such requests remaining and determine if we need to add logic to the server to handle this.

Where should this be handled?

When used as a reverse proxy, nginx doesn't modify the URL path to merge slashes or handle .. directory traversal. If it did, then the extra slashes would not have hit our server code in production. When serving files directly, nginx does handle extra slashes and potential directory traversal attacks. I have not yet tested to see how application servers (such as passenger with nginx) handle this.

Since hyper may also be used by client code to create a proxy I'm not sure that it would be the proper place to address this. I believe this is probably something that should be handled at the framework level, but for now I would like to migrate to hyper without choosing a particular framework. So if we need to address this on the backend, it looks like we'll need to add some middleware. Here are my current thoughts on that:

Summary

In summary, I'd like to resurrect the PR to switch to hyper. I think we have 3 choices for a new middleware layer:

Right now I'm leaning toward option 1, unless we have some cargo clients that haven't synced with the registry, then option 2. We can go with option 3 if we want to be more cautious and future-proof. Of course, I'm also open to any other suggestions here as well.

ishitatsuyuki commented 6 years ago

Could we go further and switch the whole stack to Actix? Conduit is not actively developed which means that few people are familiar with it. Actix, on the other hand, is the most popular currently.

jtgeibel commented 6 years ago

So I have looked into Actix, a bit. For the moment, I want to migrate away from a C based HTTP server to a Rust based implementation, without committing to a framework.

A few of the things I thought about while implementing the original PR:

I published conduit-hyper (used in #1378) to temporarily fill the gap and transition away from our, currently unmaintained, C-based web server.

jtgeibel commented 6 years ago

Status update: Last week @carols10cents and I looked at the logs to see if there are any cargo instances still hitting routes with //. We didn't see any, but then realized that the currently deployed web server is normalizing these for us before they hit the application (and our logging). So this doesn't really show us anything and has me leaning towards option 2, where I add special handing for the // prefix but don't implement full directory traversal logic.

sgrif commented 6 years ago

I would really prefer that we do full URI normalization somewhere rather than just special casing //. Handling things like .. are laid out by the spec, and aren't something that should ever make it to our router, regardless of what clients are doing

jtgeibel commented 5 years ago

The URL normalization was incorporated into the conduit-hyper crate and hyper can now be enabled in production by setting USE_HYPER=1 in the environment. Closing this issue.