Closed bgamari closed 1 year ago
Are there any operating systems/file systems currently in use that accept such paths?
Are there any operating systems/file systems currently in use that accept such paths?
I do not know of any.
What is specifically proposed here? A check in readFile
, writeFile
(i.e. openFile
) and similar low-level filesystem accessing primitives?
Are there any operating systems/file systems currently in use that accept such paths?
According to https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits, APFS and HFS+ both allow any UTF-8 encoded Unicode string. However, it is not clear to me that the operating system would actually expose such files (since the primary means of file IO on Darwin is the usual POSIX interfaces, which preclude filepaths containing NULs by virtue of using C strings).
What is specifically proposed here? A check in
readFile
,writeFile
(i.e.openFile
) and similar low-level filesystem accessing primitives?
I have now addressed this in the proposal text.
According to the APFS reference, the name is null-terminated, even though its length is also stored.
According to the APFS reference, the name is null-terminated, even though its length is also stored.
Yes, this is what I was saying above; however, it is important to distinguish between what the file system can represent and what the kernel can expose to user space. The fact that APFS stores the length of the file name means that NUL can technically appear in the name. However, I don't see how a file with such a name could possibly be accessed with the interfaces exposed by Darwin, even using the Mach's native interfaces.
which are used by all IO operations on Windows and POSIX to encode FilePaths to buffers to pass to the operating system.
Interestingly filepath
doesn't do that when encoding to its buffers.
It's unfortunate that the functionality have to be duplicated. Can System.Posix.Internals.withFilePath
return CStringLen
, and exported from some non-internal / don't use me module? (EDIT: actually filepath
comment says that what base
does is shady...)
I'm afraid there is plenty of code which just calls withCString
to encode filepaths. OTOH, maybe it's a blessing as they won't be affected by the proposed change (they are wrong, and will remain wrong).
I'm not opposed to this proposal, but I still want to challenge a few things.
In GHC #13660 it was pointed out that GHC's file IO operations will currently accept file paths containing interior NULs (e.g. "hello\0world") with surprising results.
Yes, but it's debatable whether the results are surprising. Anyone who knows the underlying syscalls will probably expect the current behavior, because there really is no other way if you just pass things along.
Although not proposed here, it seemed to be suggested in the GHC ticket implicitly: I do not think that filepath
should ban (e.g. via smart constructors) invalid filepaths. It's a feature that you can currently assemble invalid filepaths. Rust in fact allows this too (but rejects to create/open it). In filepath, there's isValid
to check for it. It's a low-level library and the documentation is very clear about the guarantees: https://hackage.haskell.org/package/filepath-1.4.100.3/docs/System-OsPath.html
Apart from encoding, filepaths have additional restrictions per platform:
- On Windows the naming convention may apply
- On Unix, only NUL bytes are disallowed as per the POSIX specification
Use isValid to check for these restrictions (OsPath doesn't maintain this invariant).
Also note that these restrictions are not exhaustive and further filesystem specific restrictions may apply on all platforms. This library makes no attempt at satisfying these. Library users may need to account for that, depending on what filesystems they want to support.
Further: to get a +1 from me, I also expect a working patch to file-io
, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io
Yes, but it's debatable whether the results are surprising. Anyone who knows the underlying syscalls will probably expect the current behavior, because there really is no other way if you just pass things along.
Yes, perhaps this behavior could be predicted by many, but I would argue that it is desirable to no one. The reason most people use high-level languages is to provide safe, reasonably portable, and easy-to-reason-about abstractions over the sometimes-messy functionality provided by the system. In my opinion, you shouldn't need to think about C's string representation to safely use base
's file I/O operations.
This is in especially true in this case, where, AFAICT, there is no compelling reason why one would want to writeFile "hello\0world" ...
. If you end up in this situation, something has almost certainly gone wrong (e.g. either someone is trying to exploit the program or there is a bug leading to an incorrectly formed path) . I, for one, would hope for an error in this situation.
Although not proposed here, it seemed to be suggested in the GHC ticket implicitly
I didn't mean to imply this. I honestly have not thought enough about the potential use-cases for this sort of path in filepath
to offer a suggestion one way or the other. Your reasoning doesn't sound unreasonable; it's hard to imagine a down-side to allowing such filepaths to be represented, so long as they are rejected as IO targets.
Further: to get a +1 from me, I also expect a working patch to file-io, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io
I would be happy to offer such a patch but I think there is a procedural question to be resolved here. It has been my understanding that CLC proposals are intended to decide on interface design; the question of "who" or "how" a particular change is to implemented seems rather orthogonal to this. It seems a bit odd for the committee to predicate their vote on matters that seem rather outside the stated mission of the proposal process. I think we should be clear about the scope here.
It seems a bit odd for the committee to predicate their vote on matters that seem rather outside the stated mission of the proposal process. I think we should be clear about the scope here.
I'm not saying you have to write this patch, but I want to avoid unnecessary surprises. This is part of the ecosystem impact. file-io can't be out of sync with base. And it depends on unix and Win32. I haven't checked if the implementation there is trivial.
I would be happy to offer such a patch but I think there is a procedural question to be resolved here. It has been my understanding that CLC proposals are intended to decide on interface design
CLC proposals must have a patch before being accepted. That is being clarified in https://github.com/haskell/core-libraries-committee/pull/137
I have added a final section suggesting that the CLC consider whether this change should be backported to stable GHC releases. Specifically, it is possible that this flavour of issue fits under CAPEC-52, which has been observed in numerous CVEs in the past and has been stated to have a "high" typical severity.
Consequently, I would appreciate it if we could make progress towards a decision on this proposal.
I have added a final section suggesting that the CLC consider whether this change should be backported to stable GHC releases.
I'd leave that up to GHC developers.
@Bodigrim, @chshersh, @mixphix, @angerman, @parsonsmatt, @tomjaguarpaw, do you have any thoughts on this? It would be great to move this along so we can begin backporting for the coming 9.4.5 release.
The merge request looks fine to me. If it's not supposed to happen anyway, flagging it here would only be more helpful in finding bugs elsewhere.
This seems sensible, but it's rather outside my area of expertise.
this issue may result in exploitable behavior in some server applications
What's exploitable about this?
I'm okay with this change and happy with the MR. Throwing an exception instead of silently truncated the file path sounds better to me. I also appreciate consistency with other languages.
What's exploitable about this?
Software may not show the NUL byte in a visible manner and not truncate. Users could be coerced into accepting faulty input for destructive file operations.
Similarly, a backend, where the developers don't anticipate the current behavior, may have a whitelist mechanism on directories that can be easily circumvented by a NUL-containing filepath.
Further: to get a +1 from me, I also expect a working patch to
file-io
, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io
@hasufell what's your conclusion on this? Does file-io
need a patch under this proposal?
Further: to get a +1 from me, I also expect a working patch to
file-io
, which is the successor to the base API that is proposed (using the new OsPath): https://hackage.haskell.org/package/file-io@hasufell what's your conclusion on this? Does
file-io
need a patch under this proposal?
In fact, both unix and Win32 do, so IMO, this needs to by synchronized with the base patch for the targeted GHC release. Otherwise we are inconsistent.
I've already set up a PR for unix: https://github.com/haskell/unix/pull/279
I'll create a PR for Win32 later.
file-io will then need to set up tighter bounds. Since file-io already has very high lower bounds on unix and Win32, there won't be any point in backporting this to older unix/Win32 branches (at least from file-io perspective).
@hasufell I take it that you are content with the proposal?
@hasufell I take it that you are content with the proposal?
Given that we coordinate the release of all the boot libraries properly, so that all major APIs are fixed at the same time.
We can certainly coordinate GHC 9.6.2 + unix-2.8.2.0
+ Win32-2.13.4.0
. Not sure what's the best course of actions if it gets backported to GHC 9.4.4 or earlier: one would have to patch previous major releases of unix
and Win32
. Not impossible, but tedious.
@frasertweedale is this something of interest for Haskell Security Response Team?
I've left some suggestions at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10110. Once @bgamari replies, we can proceed with a vote.
There are might be some technical tweaks to https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10110, but overall it is ready for a vote.
Dear CLC members, let's vote on the proposal to reject FilePath
s containing interior NUL chars or bytes. Not rejecting them is a security vulnerability, as explained in https://github.com/haskell/core-libraries-committee/issues/144#issuecomment-1482624286.
(To be clear: this is not a vote to approve backporting to earlier GHC releases, just the proposal per se)
@tomjaguarpaw @chshersh @mixphix @angerman @parsonsmatt @hasufell
+1 from me, seems like a serious vulnerability.
+1
+1
@frasertweedale is this something of interest for Haskell Security Response Team?
Thank you for the heads-up. SRT should accept advisories for any concrete vulnerability in a Haskell package (library or program) due to the current behaviour. That includes user confusion/deception scenarios.
Whether the current behaviour (acceptance of FilePath
s with interior NULs) warrants an advisory against base / file-io / unix is a point for the SRT (which will soon be established) to discuss. If anyone knows of prior advisories (e.g. CVE) for similar behaviour in other languages/libraries, please share.
In my opinion this is not a CVE. It's a CWE and already exists: https://cwe.mitre.org/data/definitions/158.html
This is not exploitable without further context. Programs today can already mitigate it by using isValid
on untrusted filepaths.
Moving the mitigation into base/unix/Win32 makes it harder to trigger it. But the primitives still exist, namely newCWString
and still don't do any input validation.
+1
+1
I have realized that the current semantics of {new,with}CStringLen
make it difficult to robustly determine whether the encoded form of a string contains internal NULs. Specifically, these functions do not add a final NUL-terminator to their output.
I have proposed that we introduce variants of these operations which do NUL terminate their output in #153.
Thanks all, 5 votes out of possible 7 are enough to approve the proposal to reject FilePath
s containing interior NUL
s.
Are we in agreement here about the coordination of the patches and release management of unix+Win32? Who will manage this?
I would also like to echo @hasufell's question. We are trying to release GHC 9.4 and would like guidance on which backports should be carried out.
I'm afraid I'm not up to an urgent release of unix-2.7
in time for GHC 9.4.5.
Given that https://gitlab.haskell.org/ghc/ghc/-/issues/13660 was open for 5 years and nobody cared much, I would not complicate matters with backports to GHC 9.4. As @hasufell says, clients should have used isValid
from filepath
anyways. I'm happy to be overruled if anyone feels strongly otherwise.
I'm fine with either way.
In GHC #13660 it was pointed out that GHC's file IO operations will currently accept file paths containing interior NULs (e.g.
"hello\0world"
) with surprising results.POSIX explicitly prohibits file paths containing interior NULs and for this reason many languages (e.g. Python, Java, Go, recenet PHP) currently reject such paths with an "invalid argument" error. I suggest that we do the same. Specifically, we should check the encoded form of the filepath for interior NULs and throw
IOError {ioe_type=InvalidArgument}
when such a path is found.Similarly, we propose that Windows rejects NUL codepoints in
FilePath
s.Implementation notes
Concretely, the relevant checks would be added in
System.Posix.Internals.{new,with}FilePath
, which are used by all IO operations on Windows and POSIX to encodeFilePath
s to buffers to pass to the operating system. The operations affected are primarily those implemented in terms ofGHC.IO.FD.openFileWith
(e.g.readFile
,writeFile
,openFile
,appendFile
,withFile
) although there are others (e.g.System.Environment.ExecutablePath.realpath
).I have posted a preliminary merge request carrying out this change.
Backporting
As this issue may result in exploitable behavior in some server applications, we suggest that the CLC should consider applying this change retroactively in GHC 9.2, 9.4, and 9.6.