haskell / filepath

Haskell FilePath core library

How to read and write OsPaths without interpreting them? #233

Open jefdaj opened 3 weeks ago

jefdaj commented 3 weeks ago

I'm trying to read and write lists of OsPaths (actually just PosixPaths in case that matters) to files. I want to avoid doing any conversion or interpretation if possible---just treat the paths as opaque bytestrings separated by \NUL.

I see that I could use encodeFS and decodeFS, but 1) that's incompatible with Attoparsec (annoyingly, the Parser monad isn't a transformer), 2) it forces IO into a lot of otherwise pure code, and 3) the extra round-trip seems more likely to introduce encoding bugs than prevent them.

I'm about to try breaking into the hidden modules and using the raw constructors. But is there a more recommended way to read/write PosixPaths?

One idea that comes to mind is adding a Binary/Bytable instance? I haven't looked into that before. But a trivial instance that just wraps/unwraps the constructor seems like it would be equivalent to exposing the constructor itself.

Edit: also, thanks for taking on this OsPath thing! I'm not well versed in low level encodings and am glad someone is working on it. I would offer to help to the extent I can without breaking anything. I'm working on Arbitrary instances to check that my code can round-trip trees of OsPaths to folders on disk. Maybe a version of those could end up in the library and help identify bugs?

jefdaj commented 3 weeks ago

Of course after posting this, I finally noticed you can access the raw constructors in the OsString package! Is that what I should be doing?

hasufell commented 2 weeks ago

From what I understand you want to write filepaths to a file on disk?

Indeed I would avoid decodeFS. How to access the raw bytes in a cross platform manner is described here: https://hasufell.github.io/posts/2022-06-29-fixing-haskell-filepaths.html#accessing-the-raw-bytes-in-a-cross-platform-manner

> I haven't looked into that before. But a trivial instance that just wraps/unwraps the constructor seems like it would be equivalent to exposing the constructor itself.

The problem is that we are dealing with a wide char array on Windows ([Word16]) as opposed to a char array on Unix ([Word8]). So for OsPath you'd still need to encode the platform information somehow (maybe as a magic bit?). Binary instances for PosixPath and WindowsPath are indeed trivial. So if you're just dealing with PosixPath, you can unwrap the underlying ShortByteString and turn it into a ByteString.
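A minimal sketch of that unwrapping for the POSIX-only case, assuming the constructor is imported from System.OsString.Internal.Types (PosixPath is a synonym for PosixString). The helper names (posixPathToBytes, bytesToPosixPath, encodePaths, decodePaths) are made up for illustration and are not part of the library; the NUL-terminated layout is just the format described in the question.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Short as SBS
import System.OsString.Internal.Types (PosixString (..))

-- Hypothetical helper: unwrap the raw bytes without decoding them.
posixPathToBytes :: PosixString -> BS.ByteString
posixPathToBytes = SBS.fromShort . getPosixString

-- Hypothetical helper: wrap raw bytes back up, again without decoding.
bytesToPosixPath :: BS.ByteString -> PosixString
bytesToPosixPath = PosixString . SBS.toShort

-- Serialise a list of paths, terminating each one with a NUL byte.
encodePaths :: [PosixString] -> BS.ByteString
encodePaths = BS.concat . map ((`BS.snoc` 0) . posixPathToBytes)

-- Inverse of encodePaths: split on NUL and drop the empty chunk that
-- follows the final terminator.
decodePaths :: BS.ByteString -> [PosixString]
decodePaths = map bytesToPosixPath . dropLastEmpty . BS.split 0
  where
    dropLastEmpty chunks = case reverse chunks of
      (lastChunk : rest) | BS.null lastChunk -> reverse rest
      _ -> chunks
```

Both directions are pure, so no IO is needed until the resulting ByteString is actually read from or written to disk. This only covers PosixPath; a portable OsPath format would still need the platform tag mentioned above.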

Wrt attoparsec, also see https://github.com/haskell/attoparsec/issues/225
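Until something like that exists, a plain attoparsec ByteString parser can already handle the POSIX-only case by wrapping each chunk in the raw constructor. This is only a sketch under that assumption: the parser names are hypothetical, it relies on the attoparsec package, and it rejects empty paths by using takeWhile1.

```haskell
import Control.Applicative (many)
import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString as BS
import qualified Data.ByteString.Short as SBS
import System.OsString.Internal.Types (PosixString (..))

-- One path: at least one non-NUL byte, followed by the NUL terminator.
-- The bytes are wrapped as-is; nothing is decoded or validated.
posixPathP :: A.Parser PosixString
posixPathP = PosixString . SBS.toShort <$> A.takeWhile1 (/= 0) <* A.word8 0

-- A whole file: zero or more NUL-terminated paths, then end of input.
posixPathsP :: A.Parser [PosixString]
posixPathsP = many posixPathP <* A.endOfInput

-- Pure entry point, so callers do not need IO to parse.
parsePosixPaths :: BS.ByteString -> Either String [PosixString]
parsePosixPaths = A.parseOnly posixPathsP
```

Because the parser never calls decodeFS, it stays inside the ordinary Parser monad and can be combined with the rest of an existing attoparsec grammar.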

My idea was to provide a way to convert to Data.Bytes.Bytes (which is a sliceable type) and then use that for efficient parsing. But we still have the problem that on Windows we are dealing with wide char arrays.

Of course after posting this, I finally noticed you can access the raw constructors in the OsString package! Is that what I should be doing?

Yes