mirage / ocaml-tar

Pure OCaml library to read and write tar files
ISC License
54 stars 34 forks source link

Add eio backend for tar #132

Closed patricoferris closed 1 year ago

patricoferris commented 1 year ago

This is a draft PR to propose an Eio-backend for Tar based completely on the lwt-unix library.

For the ease and simplicity this reuses the functors provided by Tar along with the type 'a t = 'a monad (an approach similar to that in https://github.com/mirage/ocaml-cohttp/pull/984).

We probably need some tests, but wanted to see if people were keen or not first :))

patricoferris commented 1 year ago

Just noticed https://github.com/mirage/ocaml-tar/pull/127 perhaps it would be best to wait to see what happens there first.

reynir commented 1 year ago

Thanks! Indeed, #127 will break this. In https://github.com/mirage/ocaml-tar/issues/107 I describe the motivation for some of the changes, in particular to Tar_unix. An issue is the skip function which we can either implement by reading and discarding bytes, or using Unix.lseek - but as far as I can tell we can't tell if a file descriptor is seekable, so we have to resort to the inefficient implementation. This is OK for skipping padding, which we do internally in the library, but is usually not ok when skipping potentially large files.

I have not used eio so I can't review it at the moment.

avsm commented 1 year ago

An issue is the skip function which we can either implement by reading and discarding bytes, or using Unix.lseek - but as far as I can tell we can't tell if a file descriptor is seekable, so we have to resort to the inefficient implementation.

The supplier of the file descriptor will know if it's seekable or not (no if its a network stream, and yes if its a file). We shouldn't have to resort to always using the inefficient implementation in this library.

I'm happy for this to go in soon, and be fixed up after #127, if both @reynir and @patricoferris are ok with that...

patricoferris commented 1 year ago

if both @reynir and @patricoferris are ok with that...

Works for me :)) I needed this for the port of OBuilder to Eio (if I have a more up to date port locally) which is basically the only dependency with Lwt.

reynir commented 1 year ago

An issue is the skip function which we can either implement by reading and discarding bytes, or using Unix.lseek - but as far as I can tell we can't tell if a file descriptor is seekable, so we have to resort to the inefficient implementation.

The supplier of the file descriptor will know if it's seekable or not (no if its a network stream, and yes if its a file). We shouldn't have to resort to always using the inefficient implementation in this library. That's true. How do we communicate this to the bits of the tar library that needs to know this? Mostly skip is used to skip the zero padding (at most 511 bytes) which is probably not a big problem to skip inefficiently. However, in Tar.Make.Archive.list skip is used to skip the files, too.

My proposal in #107 and #127 is to remove Tar.Make and Tar.Make.Archive and thus the list function. Then we sidestep the issue of using seek or read for skipping and don't need Tar_unix as well as Tar_unix_seekable or similar.

I'm happy for this to go in soon, and be fixed up after #127, if both @reynir and @patricoferris are ok with that...

If the EIO code looks fine I'm happy to merge and release this as tar.2.6.0 for example.

talex5 commented 1 year ago

Looks fine to me (diffing eio/tar_eio.ml against unix/tar_lwt_unix.ml the changes all seem straight-forward).

reynir commented 1 year ago

Thanks!

reynir commented 1 year ago

Hi @patricoferris, thanks again for the PR. I am working on cutting a 2.6.0 release with this PR and #133 included.

I am observing the following compilation error:

# File "eio/tar_eio.mli", line 23, characters 96-111:
# 23 | val get_next_header : ?level:Tar.Header.compatibility -> global:Tar.Header.Extended.t option -> Eio.Flow.source ->
#                                                                                                      ^^^^^^^^^^^^^^^
# Error: The type constructor Eio.Flow.source expects 1 argument(s),
#        but is here applied to 0 argument(s)

Could you please advice me either how to fix this and what updated version constraints to use or what upper bound I should put on eio. I hope this is easy to answer. Thanks in advance.

talex5 commented 1 year ago

Haven't tested it, but this builds for me on Eio 0.12: https://github.com/mirage/ocaml-tar/compare/main...talex5:ocaml-tar:eio-0.12?expand=1

hannesm commented 9 months ago

There's an issue with this PR, it let the OCaml-CI to only test OCaml 5.x, and no more testing OCaml 4. I don't quite understand why this is the case (and e.g. in the mirage-crypto repository we have similarly a "mirage-crypto-rng-eio" opam package (requiring ocaml 5), but the other opam packages are tested with OCaml 4). Any hints? It would be nice to get the OCaml 4 CI back.

talex5 commented 9 months ago

If the eio package had an explicit dependency on ocaml >= 5 then ocaml-ci wouldn't try to solve for it on older compilers, which should let it find a solution for the other packages.