Document and test the use of unsafe

A recent Medium article criticized 3 of the 7 rust http clients it reviewed because of their dependency on this crate and because of this crate's use of unsafe. The criticism does not appear to be founded (i.e. it claims that HeaderMap is a reimplementation of HashMap without explaining how to use HashMap as a multimap) and the author's tests of these 3 http clients did not show any exploitable problems related to this crate. The article appears to an example of withoutboats' post about allegations of vulnerabilities in Rust libraries.

Despite the limitations of the Medium article, http is (and is designed to be) a high-level dependency of many network exposed crates so the sound use of unsafe is important. The use of unsafe has already been audited (see for example rust-secure-code/safety-dance#37). In my preliminary review of the use of unsafe I found that they were sound (with one possible exception I am still investigating). The soundness, though, relies on implicit invariants that, in some instance, are not local to the use of unsafe.

I propose to try to increase the robustness (in the sense described here) of the use of unsafe in http by:

Adding unit tests for edge condition involving unsafe code paths
Refactoring the code involving unsafe into smaller leaf modules that present a safe wrapper around the unsafe code (without introducing performance regressions)
Documenting the invariants that make the use of unsafe sound (making explicit what is implicit)
Adding additional performance tests for uses-no-unsafe alternatives to HeaderMap (such as multimap) so that the performance trade-offs of the current implantation are more easily apparent

The goal isn't to reduce or eliminate the use of unsafe (though a reduction in in the number of uses of unsafe may be a by-product). It is rather to make the soundness of the use of unsafe easier for someone else to audit.

I plan to review the use of unsafe file by file as follows (the number in parenthesis is the number of occurrences of unsafe in the file):

[x] byte_str.rs (2)
[x] method.rs (2)
[x] uri/path.rs (1)
[x] uri/authority.rs (3)
[x] uri/scheme.rs (1)
[x] uri/mod.rs (3)
[x] header/value.rs (4)
[x] header/name.rs (11)
[ ] header/map.rs (19)

I have already completed the work for the first two files in #408 and #410. I am planning to continue with the three uri/ files and then the header/ files as time allows. If the maintainers of the project don't think this is worthwhile please let me know so I can move on to something else.

Thanks for what you're proposing here, it's really valuable. I agree that the article is mostly FUD, and just ignored it, but there's never a problem with making code even clearer to understand. :heart:

I think the usage of mem::uninitialized() in HeaderName may cause problems in the future as rustc gains more aggressive optimisations. (It is deprecated as it is impossible to use without causing UB).

Does http have a particularly low minimum supported version of rustc that would preclude the use of MaybeUninit here? The lowest version tested in CI is 1.39, which does support MaybeUninit, although perhaps this optimisation is not needed anyway.

When http upgraded the version of bytes it uses to 0.5 this also necessitated an increase in the minimum supported version of rustc to 1.39.0 because that is the minimum supported version for bytes version 0.5. See 43dffa1eb79f6801e5e07f3338fa56191dc454bb and the bytes CHANGELOG.

It appears that mem::uninitialized() can be converted to MaybeUninit. I can look at that when I am reviewing header/name.rs.

I am part way through the review of the use of unsafe (in #408, #410, #413, #414, #416, and #417) so I wanted to provide an update on this issue to give a high-level summary of my progress. So far I have reviewed method.rs, bytes_str.rs, and the four files in the uri modules (my original post said there were three but I had missed one).

In the 6 files reviewed so far the use of unsafe is all related directly or indirectly to calls to std::str::from_utf8_unchecked(). As documented in that function, its safe use requires that the caller ensure that the passed in &[u8] is valid UTF-8. Code that didn't satisfy this precondition would cause undefined behaviour because it would produce a &str that isn't valid UTF-8 (see the list of undefined behaviour here).

In the files I have reviews far, there are two structs that contain an internal field that is AsRef<[u8]>: Method (from method.rs) and the internal ByteStr (from byte_str.rs). These structs can expose the internal field as a &str though either an as_str() method or with an implementation of std::ops::Deref. In both cases these structs maintain an invariant that the internal field is valid UTF-8. #408 and #410 propose changes that make it clear how these structs maintain the invariant.

The internal ByteStr struct implements an unsafe function called from_utf8_unchecked() which is modelled on the function by the same name from std::str. All of the use of unsafe in the four files in the uri module relate to calls to this function. The safe use of ByteStr::from_utf8_unchecked() requires that the passed in parameter represent valid UTF-8. In the four files in the uri module this is ensured through the parsing code which checks that each byte is a valid byte for a Uri. While the list of valid bytes is different for different parts of a Uri, what they all have in common (in this crate's implementation of the parsing) is that they are all valid single byte UTF-8. #413, #414, #416, and #417 all propose changes that make explicit the preconditions and postconditions that ensure the parameter passed to ByteStr::from_utf8_unchecked() represents valid UTF-8.

In my original issue description I had indicated that I was investigating one possible exception to my initial conclusion that the use of unsafe in this crate was all sound. In my initial review it wasn't immediately clear that all of the calls to ByteStr::from_utf8_unchecked() in the uri module avoided passing in a representation that was not valid UTF-8. I am now satisfied that they all do. My hope is that the changes proposed in the pull requests listed above will make this more obvious to the next person looking at the use of unsafe in http.

I will be continuing this review with the three files in the header module as time allows.

In #419 there is one use of unsafe that might, or might not, make it appropriate to change the pubic API of http. None of the earlier pull requests involve proposed changes to the public API and I haven't added such a change to #419 without discussing it here first.

The specific change that I am considering in #419 concerns the unsafe function from_maybe_shared_unchecked() and whether that function needs to be unsafe. The conversation in #419 has a longer explanation of this specific issue.

What I wanted to do in this comment was to invite some feedback on the general issue of unsafe functions and methods in the public API of this crate.

Marking a function or method unsafe is a (compiler-enforced) signal to the caller that it is responsible for something, but the issue I am seeking feedback on is what those responsibilities should be for the purposes of the public API of this create. (unsafe functions and methods also make the body an unsafe block but this may change. See rust-lang/rust#71668). I can think of 2 possibilities for the caller's responsibility:

The caller is responsible for ensuring that it does not cause undefined behaviour when it calls the function or method.
The caller is responsible for ensuring that it adheres to various restrictions of the API including avoiding undefined behaviour, but not limited to just that.

My own view is that unsafe functions and methods should be limited to the first possibility. Responsibility for adhering to API restrictions other than avoiding undefined behaviour should be signalled without using an unsafe function or method (i.e. with documentation and the use of the suffix _unchecked in the name).

It would assist me in finishing #419, though, to have some feedback on this general issue.

hyperium / http

Document and test the use of unsafe #412