hawkw opened 1 year ago
> Would it make sense in the display to limit the amount of viewable bytes? That said, I think this makes sense to me.
Yeah, we probably want to have some kind of "common sense limit" on how much of the URI we print...
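Such a "common sense limit" could be sketched roughly like this (a hypothetical helper; the function name and the 64-byte cap are illustrative, not anything from the `http` crate):

```rust
/// Hypothetical sketch: cap how much of an invalid input is echoed back
/// in an error message. The 64-byte limit is arbitrary, for illustration.
fn truncate_for_display(input: &[u8]) -> String {
    const MAX_ECHOED_BYTES: usize = 64;
    let shown = &input[..input.len().min(MAX_ECHOED_BYTES)];
    // from_utf8_lossy tolerates a cut that lands mid-codepoint
    let text = String::from_utf8_lossy(shown);
    if input.len() > MAX_ECHOED_BYTES {
        format!("{}... ({} bytes total)", text, input.len())
    } else {
        text.into_owned()
    }
}
```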
On the one hand, better errors are a great user-centric improvement, I love 'em! If there's a way to improve it for users, even just some of the time, that'd be wonderful.

On the other hand, here are some concerns we'd need to overcome:

- Some of the parsing methods take `Bytes`, which is cheap to clone, or even just own, similar to the `TryFrom<String>`. But for those taking a slice, there'd be a cost to copying into a new vec to store in the error.
- For the `Bytes` case, would storing a clone of it in an error message make the original buffer possibly live way longer than one assumed (considering the `Bytes` could be a small slice of a bigger one that took a big read from the socket)? Should there be a `Bytes` method that forced a copy if the diff were big enough?
- The parse data usually comes from an outsider; would printing that data as part of the error message have any security considerations? I imagine the answer is a tentative yes, as in "people could screw up and print it in a way that an attacker can do bad things", so it wouldn't be our fault, but it could make it easier for some to be pwned.
- Likewise from a privacy point of view: would the data possibly have PII that people logging these errors weren't expecting?
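The buffer-lifetime concern can be illustrated with plain `Rc` standing in for `Bytes` (a rough analogy only: both are cheap handle clones that keep the whole backing allocation alive):

```rust
use std::rc::Rc;

// Rough analogy for the concern above: Rc<Vec<u8>> stands in for a `Bytes`
// handle. Cloning the handle is cheap, but it pins the entire backing buffer.
fn capture_slice_in_error() -> Rc<Vec<u8>> {
    // e.g. one big read from a socket
    let big_read: Rc<Vec<u8>> = Rc::new(vec![0u8; 64 * 1024]);
    // the small "URI" portion stored in an error is just another handle...
    let captured = Rc::clone(&big_read);
    drop(big_read);
    // ...so all 64 KiB stay allocated as long as the error is kept around
    captured
}
```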
> Some of the parsing methods take `Bytes`, which is cheap to clone, or even just own, similar to the `TryFrom<String>`. But for those taking a slice, there'd be a cost to copying into a new vec to store in the error.
Hmm, don't the parsing methods that take a slice convert the slice into a `Bytes` before trying to parse it, anyway? E.g. https://github.com/hyperium/http/blob/34a9d6bdab027948d6dea3b36d994f9cbaf96f75/src/uri/mod.rs#L713-L715

If I understand the code correctly, it looks like all paths that construct a `Uri` from a `&[u8]` or an `&str` will already copy the bytes into a `Bytes` before trying to parse, and then the `Bytes` is dropped if the URI is invalid? Or am I misunderstanding something here?
> The parse data usually comes from an outsider; would printing that data as part of the error message have any security considerations? I imagine the answer is a tentative yes, as in "people could screw up and print it in a way that an attacker can do bad things", so it wouldn't be our fault, but it could make it easier for some to be pwned.
>
> Likewise from a privacy point of view: would the data possibly have PII that people logging these errors weren't expecting?
This is a valid concern --- perhaps we would want to add a method to the `InvalidUri` error to access the input bytes, rather than always including it in the `Debug` implementation? That way, the application can choose whether or not to actually log the URI value. The downside is that this would probably mean that, for `hyper` users, `hyper` would need to actually expose the `http::uri::InvalidUri` as a source and/or downcast target for its error types, or else there would be no way for the application to explicitly choose to display the invalid input...
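The accessor idea could look roughly like this (a stand-in struct to sketch the shape, not the real `http::uri::InvalidUri` API, with `Vec<u8>` in place of `Bytes`):

```rust
use std::fmt;

// Stand-in error type sketching the opt-in accessor idea; not the real
// http::uri::InvalidUri.
struct InvalidUri {
    reason: &'static str,
    input: Vec<u8>,
}

impl InvalidUri {
    /// Opt-in access to the invalid input; Display below deliberately
    /// omits it, so the application decides whether to log it.
    fn input(&self) -> &[u8] {
        &self.input
    }
}

impl fmt::Display for InvalidUri {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // only the reason, never the (possibly sensitive) input bytes
        write!(f, "invalid URI: {}", self.reason)
    }
}
```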
To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server. In that case, there isn't currently a good way for the application to choose whether or not it wants to display the invalid input, because the buffer that the bytes were read into is owned by hyper, and it's hyper's HTTP parser that's calling into `http::Uri`'s parsing functions, rather than the application itself. In that case, the application can't easily access the portion of the buffer that contained the invalid URI without doing something really unpleasant, like wrapping the underlying IO resource to buffer all the bytes and essentially reimplementing most of the HTTP parser just to find potentially invalid URIs.

For applications where the primary interaction with URI parsing is passing a user input that might be a URI into hyper, it would be much easier for the application to just hang onto the input and choose to log it if hyper returns an invalid URI error. So, I'm mainly concerned about the server use case here.
> don't the parsing methods that take a slice convert the slice into a `Bytes` before trying to parse it, anyway?

Uh, well, whoops. That could be improved...
> To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server.

Making better error messages is already splendid motivation; I didn't mean to seem against the idea. I think we should always help people understand what went wrong. I just wanted to make sure we didn't miss anything while doing so.
> > don't the parsing methods that take a slice convert the slice into a `Bytes` before trying to parse it, anyway?
>
> Uh, well, whoops. That could be improved...
Hmm, if we're planning to change those functions to only copy the bytes into a `Bytes` if the URI is valid, I suppose we could make the input component of the error optional, and only include it when the URI was constructed from a `Bytes` or an already-owned `String`, etc.? But I think that's probably only a good idea if we're going to include the input in the `Display` output, rather than allowing explicit access to it, since it seems a bit weird for the error to sometimes contain the invalid input and sometimes not contain it, depending on what type the URI was parsed from...
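The optional-input shape would be something like the following (again with stand-in types and `Vec<u8>` for `Bytes`; validation is elided, both paths just show how the error would be built on failure):

```rust
// Stand-in sketch of the "optional input" idea: the error carries the input
// only on parse paths that already owned the bytes. Not the real crate API.
struct InvalidUri {
    // Some(..) when parsed from an owned Bytes/String; None for a
    // hypothetical slice path that avoided the copy
    input: Option<Vec<u8>>,
}

fn parse_owned(input: Vec<u8>) -> Result<(), InvalidUri> {
    // validation elided; on failure the owned buffer is kept in the error
    Err(InvalidUri { input: Some(input) })
}

fn parse_slice(_input: &[u8]) -> Result<(), InvalidUri> {
    // validation elided; a copy-avoiding slice path has nothing to keep
    Err(InvalidUri { input: None })
}
```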
> > To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server.
>
> Making better error messages is already splendid motivation; I didn't mean to seem against the idea. I think we should always help people understand what went wrong. I just wanted to make sure we didn't miss anything while doing so.

We're in agreement on that --- you didn't come off as against it at all! I also want to consider all the possible reasons we shouldn't do this. I mainly thought it was worth explaining the motivation in order to see whether there might be other ways to surface invalid inputs that don't have the same drawbacks as always including them in the error (e.g. some kind of new API in Hyper?)...
Quick ping on this, I'd be happy to open a PR to add it if we can come to consensus on the constraints!
The `InvalidUri` error type is returned when trying to parse an input as a URI, if the bytes are not a valid URI. This error type does not contain the value that we were trying to parse, but does indicate how the input was invalid.

Since all methods of parsing a value as a `Uri` either take a `Bytes` or convert the input (string or byte slice) into a `Bytes`, the error type probably could include the input, and format it as part of their `fmt::Display`/`fmt::Debug` implementations. `Uri::from_shared` takes a `Bytes` by value, so returning it in the error value shouldn't have a significant performance cost --- if we're trying to parse a URI from an `&str` or `&[u8]`, we will have already copied it in order to convert it into a `Bytes`, and if the input was itself a `Bytes`, the reference count has already been incremented in order to call `Uri::from_shared`.

This would make it much easier for an application to display the invalid input to the user, which is desirable, especially in cases where the invalid URI came from user input.
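A minimal sketch of that proposal, using a stand-in struct (`Vec<u8>` in place of `Bytes`, and only `Display` shown; this is the proposed shape, not the crate's current `InvalidUri`):

```rust
use std::fmt;

// Stand-in for the proposed shape of http::uri::InvalidUri: the error owns
// the bytes it failed to parse and echoes them in its Display output.
struct InvalidUri {
    input: Vec<u8>,
}

impl fmt::Display for InvalidUri {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // lossy conversion so non-UTF-8 input still prints something useful
        write!(f, "invalid URI: {:?}", String::from_utf8_lossy(&self.input))
    }
}
```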