haskell / core-libraries-committee

95 stars 16 forks source link

Reject FilePaths containing interior NULs #144

Closed bgamari closed 1 year ago

bgamari commented 1 year ago

In GHC #13660 it was pointed out that GHC's file IO operations will currently accept file paths containing interior NULs (e.g. "hello\0world") with surprising results.

POSIX explicitly prohibits file paths containing interior NULs and for this reason many languages (e.g. Python, Java, Go, recenet PHP) currently reject such paths with an "invalid argument" error. I suggest that we do the same. Specifically, we should check the encoded form of the filepath for interior NULs and throw IOError {ioe_type=InvalidArgument} when such a path is found.

Similarly, we propose that Windows rejects NUL codepoints in FilePaths.

Implementation notes

Concretely, the relevant checks would be added in System.Posix.Internals.{new,with}FilePath, which are used by all IO operations on Windows and POSIX to encode FilePaths to buffers to pass to the operating system. The operations affected are primarily those implemented in terms of GHC.IO.FD.openFileWith (e.g. readFile, writeFile, openFile, appendFile, withFile) although there are others (e.g. System.Environment.ExecutablePath.realpath).

I have posted a preliminary merge request carrying out this change.

Backporting

As this issue may result in exploitable behavior in some server applications, we suggest that the CLC should consider applying this change retroactively in GHC 9.2, 9.4, and 9.6.

treeowl commented 1 year ago

Are there any operating systems/file systems currently in use that accept such paths?

bgamari commented 1 year ago

Are there any operating systems/file systems currently in use that accept such paths?

I do not know of any.

phadej commented 1 year ago

What is specifically proposed here? A check in readFile, writeFile (i.e. openFile) and similar low-level filesystem accessing primitives?

bgamari commented 1 year ago

Are there any operating systems/file systems currently in use that accept such paths?

According to https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits, APFS and HFS+ both allow any UTF-8 encoded Unicode string. However, it is not clear to me that the operating system would actually expose such files (since the primary means of file IO on Darwin is the usual POSIX interfaces, which preclude filepaths containing NULs by virtue of using C strings).

What is specifically proposed here? A check in readFile, writeFile (i.e. openFile) and similar low-level filesystem accessing primitives?

I have now addressed this in the proposal text.

treeowl commented 1 year ago

According to the APFS reference, the name is null-terminated, even though its length is also stored.

bgamari commented 1 year ago

According to the APFS reference, the name is null-terminated, even though its length is also stored.

Yes, this is what I was saying above; however, it is important to distinguish between what the file system can represent and what the kernel can expose to user space. The fact that APFS stores the length of the file name means that NUL can technically appear in the name. However, I don't see how a file with such a name could possibly be accessed with the interfaces exposed by Darwin, even using the Mach's native interfaces.

phadej commented 1 year ago

which are used by all IO operations on Windows and POSIX to encode FilePaths to buffers to pass to the operating system.

Interestingly filepath doesn't do that when encoding to its buffers.

It's unfortunate that the functionality have to be duplicated. Can System.Posix.Internals.withFilePath return CStringLen, and exported from some non-internal / don't use me module? (EDIT: actually filepath comment says that what base does is shady...)

I'm afraid there is plenty of code which just calls withCString to encode filepaths. OTOH, maybe it's a blessing as they won't be affected by the proposed change (they are wrong, and will remain wrong).

hasufell commented 1 year ago

I'm not opposed to this proposal, but I still want to challenge a few things.


In GHC #13660 it was pointed out that GHC's file IO operations will currently accept file paths containing interior NULs (e.g. "hello\0world") with surprising results.

Yes, but it's debatable whether the results are surprising. Anyone who knows the underlying syscalls will probably expect the current behavior, because there really is no other way if you just pass things along.


Although not proposed here, it seemed to be suggested in the GHC ticket implicitly: I do not think that filepath should ban (e.g. via smart constructors) invalid filepaths. It's a feature that you can currently assemble invalid filepaths. Rust in fact allows this too (but rejects to create/open it). In filepath, there's isValid to check for it. It's a low-level library and the documentation is very clear about the guarantees: https://hackage.haskell.org/package/filepath-1.4.100.3/docs/System-OsPath.html

Apart from encoding, filepaths have additional restrictions per platform:

Use isValid to check for these restrictions (OsPath doesn't maintain this invariant).

Also note that these restrictions are not exhaustive and further filesystem specific restrictions may apply on all platforms. This library makes no attempt at satisfying these. Library users may need to account for that, depending on what filesystems they want to support.


Further: to get a +1 from me, I also expect a working patch to file-io, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io

bgamari commented 1 year ago

Yes, but it's debatable whether the results are surprising. Anyone who knows the underlying syscalls will probably expect the current behavior, because there really is no other way if you just pass things along.

Yes, perhaps this behavior could be predicted by many, but I would argue that it is desirable to no one. The reason most people use high-level languages is to provide safe, reasonably portable, and easy-to-reason-about abstractions over the sometimes-messy functionality provided by the system. In my opinion, you shouldn't need to think about C's string representation to safely use base's file I/O operations.

This is in especially true in this case, where, AFAICT, there is no compelling reason why one would want to writeFile "hello\0world" .... If you end up in this situation, something has almost certainly gone wrong (e.g. either someone is trying to exploit the program or there is a bug leading to an incorrectly formed path) . I, for one, would hope for an error in this situation.

Although not proposed here, it seemed to be suggested in the GHC ticket implicitly

I didn't mean to imply this. I honestly have not thought enough about the potential use-cases for this sort of path in filepath to offer a suggestion one way or the other. Your reasoning doesn't sound unreasonable; it's hard to imagine a down-side to allowing such filepaths to be represented, so long as they are rejected as IO targets.

Further: to get a +1 from me, I also expect a working patch to file-io, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io

I would be happy to offer such a patch but I think there is a procedural question to be resolved here. It has been my understanding that CLC proposals are intended to decide on interface design; the question of "who" or "how" a particular change is to implemented seems rather orthogonal to this. It seems a bit odd for the committee to predicate their vote on matters that seem rather outside the stated mission of the proposal process. I think we should be clear about the scope here.

hasufell commented 1 year ago

It seems a bit odd for the committee to predicate their vote on matters that seem rather outside the stated mission of the proposal process. I think we should be clear about the scope here.

I'm not saying you have to write this patch, but I want to avoid unnecessary surprises. This is part of the ecosystem impact. file-io can't be out of sync with base. And it depends on unix and Win32. I haven't checked if the implementation there is trivial.

phadej commented 1 year ago

I would be happy to offer such a patch but I think there is a procedural question to be resolved here. It has been my understanding that CLC proposals are intended to decide on interface design

CLC proposals must have a patch before being accepted. That is being clarified in https://github.com/haskell/core-libraries-committee/pull/137

bgamari commented 1 year ago

I have added a final section suggesting that the CLC consider whether this change should be backported to stable GHC releases. Specifically, it is possible that this flavour of issue fits under CAPEC-52, which has been observed in numerous CVEs in the past and has been stated to have a "high" typical severity.

Consequently, I would appreciate it if we could make progress towards a decision on this proposal.

hasufell commented 1 year ago

I have added a final section suggesting that the CLC consider whether this change should be backported to stable GHC releases.

I'd leave that up to GHC developers.

bgamari commented 1 year ago

@Bodigrim, @chshersh, @mixphix, @angerman, @parsonsmatt, @tomjaguarpaw, do you have any thoughts on this? It would be great to move this along so we can begin backporting for the coming 9.4.5 release.

mixphix commented 1 year ago

The merge request looks fine to me. If it's not supposed to happen anyway, flagging it here would only be more helpful in finding bugs elsewhere.

tomjaguarpaw commented 1 year ago

This seems sensible, but it's rather outside my area of expertise.

this issue may result in exploitable behavior in some server applications

What's exploitable about this?

chshersh commented 1 year ago

I'm okay with this change and happy with the MR. Throwing an exception instead of silently truncated the file path sounds better to me. I also appreciate consistency with other languages.

hasufell commented 1 year ago

What's exploitable about this?

Software may not show the NUL byte in a visible manner and not truncate. Users could be coerced into accepting faulty input for destructive file operations.

Similarly, a backend, where the developers don't anticipate the current behavior, may have a whitelist mechanism on directories that can be easily circumvented by a NUL-containing filepath.

Bodigrim commented 1 year ago

Further: to get a +1 from me, I also expect a working patch to file-io, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io

@hasufell what's your conclusion on this? Does file-io need a patch under this proposal?

hasufell commented 1 year ago

Further: to get a +1 from me, I also expect a working patch to file-io, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io

@hasufell what's your conclusion on this? Does file-io need a patch under this proposal?

In fact, both unix and Win32 do, so IMO, this needs to by synchronized with the base patch for the targeted GHC release. Otherwise we are inconsistent.

I've already set up a PR for unix: https://github.com/haskell/unix/pull/279

I'll create a PR for Win32 later.

file-io will then need to set up tighter bounds. Since file-io already has very high lower bounds on unix and Win32, there won't be any point in backporting this to older unix/Win32 branches (at least from file-io perspective).

hasufell commented 1 year ago

Win32: https://github.com/haskell/win32/pull/218

Bodigrim commented 1 year ago

@hasufell I take it that you are content with the proposal?

hasufell commented 1 year ago

@hasufell I take it that you are content with the proposal?

Given that we coordinate the release of all the boot libraries properly, so that all major APIs are fixed at the same time.

Bodigrim commented 1 year ago

We can certainly coordinate GHC 9.6.2 + unix-2.8.2.0 + Win32-2.13.4.0. Not sure what's the best course of actions if it gets backported to GHC 9.4.4 or earlier: one would have to patch previous major releases of unix and Win32. Not impossible, but tedious.

@frasertweedale is this something of interest for Haskell Security Response Team?

I've left some suggestions at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10110. Once @bgamari replies, we can proceed with a vote.

Bodigrim commented 1 year ago

There are might be some technical tweaks to https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10110, but overall it is ready for a vote.

Dear CLC members, let's vote on the proposal to reject FilePaths containing interior NUL chars or bytes. Not rejecting them is a security vulnerability, as explained in https://github.com/haskell/core-libraries-committee/issues/144#issuecomment-1482624286.

(To be clear: this is not a vote to approve backporting to earlier GHC releases, just the proposal per se)

@tomjaguarpaw @chshersh @mixphix @angerman @parsonsmatt @hasufell


+1 from me, seems like a serious vulnerability.

mixphix commented 1 year ago

+1

hasufell commented 1 year ago

+1

frasertweedale commented 1 year ago

@frasertweedale is this something of interest for Haskell Security Response Team?

Thank you for the heads-up. SRT should accept advisories for any concrete vulnerability in a Haskell package (library or program) due to the current behaviour. That includes user confusion/deception scenarios.

Whether the current behaviour (acceptance of FilePaths with interior NULs) warrants an advisory against base / file-io / unix is a point for the SRT (which will soon be established) to discuss. If anyone knows of prior advisories (e.g. CVE) for similar behaviour in other languages/libraries, please share.

hasufell commented 1 year ago

In my opinion this is not a CVE. It's a CWE and already exists: https://cwe.mitre.org/data/definitions/158.html

This is not exploitable without further context. Programs today can already mitigate it by using isValid on untrusted filepaths.

Moving the mitigation into base/unix/Win32 makes it harder to trigger it. But the primitives still exist, namely newCWString and still don't do any input validation.

tomjaguarpaw commented 1 year ago

+1

chshersh commented 1 year ago

+1

bgamari commented 1 year ago

I have realized that the current semantics of {new,with}CStringLen make it difficult to robustly determine whether the encoded form of a string contains internal NULs. Specifically, these functions do not add a final NUL-terminator to their output.

I have proposed that we introduce variants of these operations which do NUL terminate their output in #153.

Bodigrim commented 1 year ago

Thanks all, 5 votes out of possible 7 are enough to approve the proposal to reject FilePaths containing interior NULs.

hasufell commented 1 year ago

Are we in agreement here about the coordination of the patches and release management of unix+Win32? Who will manage this?

bgamari commented 1 year ago

I would also like to echo @hasufell's question. We are trying to release GHC 9.4 and would like guidance on which backports should be carried out.

Bodigrim commented 1 year ago

I'm afraid I'm not up to an urgent release of unix-2.7 in time for GHC 9.4.5.

Given that https://gitlab.haskell.org/ghc/ghc/-/issues/13660 was open for 5 years and nobody cared much, I would not complicate matters with backports to GHC 9.4. As @hasufell says, clients should have used isValid from filepath anyways. I'm happy to be overruled if anyone feels strongly otherwise.

hasufell commented 1 year ago

I'm fine with either way.