Open sbosnick opened 4 years ago
Thanks for what you're proposing here, it's really valuable. I agree that the article is mostly FUD, and just ignored it, but there's never a problem with making code even clearer to understand. :heart:
I think the usage of mem::uninitialized()
in HeaderName
may cause problems in the future as rustc gains more aggressive optimisations. (It is deprecated as it is impossible to use without causing UB).
Does http
have a particularly low minimum supported version of rustc that would preclude the use of MaybeUninit
here? The lowest version tested in CI is 1.39, which does support MaybeUninit, although perhaps this optimisation is not needed anyway.
When http
upgraded the version of bytes
it uses to 0.5 this also necessitated an increase in the minimum supported version of rustc to 1.39.0 because that is the minimum supported version for bytes
version 0.5. See 43dffa1eb79f6801e5e07f3338fa56191dc454bb and the bytes
CHANGELOG.
It appears that mem::uninitialized()
can be converted to MaybeUninit
. I can look at that when I am reviewing header/name.rs.
I am part way through the review of the use of unsafe
(in #408, #410, #413, #414, #416, and #417) so I wanted to provide an update on this issue to give a high-level summary of my progress. So far I have reviewed method.rs, bytes_str.rs, and the four files in the uri modules (my original post said there were three but I had missed one).
In the 6 files reviewed so far the use of unsafe
is all related directly or indirectly to calls to std::str::from_utf8_unchecked()
. As documented in that function, its safe use requires that the caller ensure that the passed in &[u8]
is valid UTF-8. Code that didn't satisfy this precondition would cause undefined behaviour because it would produce a &str
that isn't valid UTF-8 (see the list of undefined behaviour here).
In the files I have reviews far, there are two structs that contain an internal field that is AsRef<[u8]>
: Method (from method.rs) and the internal ByteStr
(from byte_str.rs). These structs can expose the internal field as a &str
though either an as_str()
method or with an implementation of std::ops::Deref
. In both cases these structs maintain an invariant that the internal field is valid UTF-8. #408 and #410 propose changes that make it clear how these structs maintain the invariant.
The internal ByteStr
struct implements an unsafe function called from_utf8_unchecked()
which is modelled on the function by the same name from std::str
. All of the use of unsafe
in the four files in the uri module relate to calls to this function. The safe use of ByteStr::from_utf8_unchecked()
requires that the passed in parameter represent valid UTF-8. In the four files in the uri module this is ensured through the parsing code which checks that each byte is a valid byte for a Uri. While the list of valid bytes is different for different parts of a Uri, what they all have in common (in this crate's implementation of the parsing) is that they are all valid single byte UTF-8. #413, #414, #416, and #417 all propose changes that make explicit the preconditions and postconditions that ensure the parameter passed to ByteStr::from_utf8_unchecked()
represents valid UTF-8.
In my original issue description I had indicated that I was investigating one possible exception to my initial conclusion that the use of unsafe
in this crate was all sound. In my initial review it wasn't immediately clear that all of the calls to ByteStr::from_utf8_unchecked()
in the uri module avoided passing in a representation that was not valid UTF-8. I am now satisfied that they all do. My hope is that the changes proposed in the pull requests listed above will make this more obvious to the next person looking at the use of unsafe
in http
.
I will be continuing this review with the three files in the header module as time allows.
In #419 there is one use of unsafe
that might, or might not, make it appropriate to change the pubic API of http
. None of the earlier pull requests involve proposed changes to the public API and I haven't added such a change to #419 without discussing it here first.
The specific change that I am considering in #419 concerns the unsafe
function from_maybe_shared_unchecked()
and whether that function needs to be unsafe
. The conversation in #419 has a longer explanation of this specific issue.
What I wanted to do in this comment was to invite some feedback on the general issue of unsafe
functions and methods in the public API of this crate.
Marking a function or method unsafe
is a (compiler-enforced) signal to the caller that it is responsible for something, but the issue I am seeking feedback on is what those responsibilities should be for the purposes of the public API of this create. (unsafe
functions and methods also make the body an unsafe block but this may change. See rust-lang/rust#71668). I can think of 2 possibilities for the caller's responsibility:
My own view is that unsafe
functions and methods should be limited to the first possibility. Responsibility for adhering to API restrictions other than avoiding undefined behaviour should be signalled without using an unsafe
function or method (i.e. with documentation and the use of the suffix _unchecked
in the name).
It would assist me in finishing #419, though, to have some feedback on this general issue.
A recent Medium article criticized 3 of the 7 rust http clients it reviewed because of their dependency on this crate and because of this crate's use of
unsafe
. The criticism does not appear to be founded (i.e. it claims thatHeaderMap
is a reimplementation ofHashMap
without explaining how to useHashMap
as a multimap) and the author's tests of these 3 http clients did not show any exploitable problems related to this crate. The article appears to an example of withoutboats' post about allegations of vulnerabilities in Rust libraries.Despite the limitations of the Medium article,
http
is (and is designed to be) a high-level dependency of many network exposed crates so the sound use ofunsafe
is important. The use ofunsafe
has already been audited (see for example rust-secure-code/safety-dance#37). In my preliminary review of the use ofunsafe
I found that they were sound (with one possible exception I am still investigating). The soundness, though, relies on implicit invariants that, in some instance, are not local to the use ofunsafe
.I propose to try to increase the robustness (in the sense described here) of the use of
unsafe
inhttp
by:unsafe
code pathsunsafe
into smaller leaf modules that present a safe wrapper around the unsafe code (without introducing performance regressions)unsafe
sound (making explicit what is implicit)HeaderMap
(such as multimap) so that the performance trade-offs of the current implantation are more easily apparentThe goal isn't to reduce or eliminate the use of
unsafe
(though a reduction in in the number of uses ofunsafe
may be a by-product). It is rather to make the soundness of the use ofunsafe
easier for someone else to audit.I plan to review the use of
unsafe
file by file as follows (the number in parenthesis is the number of occurrences ofunsafe
in the file):I have already completed the work for the first two files in #408 and #410. I am planning to continue with the three uri/ files and then the header/ files as time allows. If the maintainers of the project don't think this is worthwhile please let me know so I can move on to something else.