mirage / ocaml-tar

Pure OCaml library to read and write tar files
ISC License
54 stars 34 forks source link

Document probable bugs with tar_lwt_unix #152

Open jonahbeckford opened 1 month ago

jonahbeckford commented 1 month ago

I added in some test cases because I have not been able to get ocaml-tar working after version 3. (It was working in version 2, although the API was signficantly different so difficult to compare code directly).

The first test (A) is the simplest ... it just calls Tar_lwt_unix.extract ~filter:test_filter ~src:filename "extracted/" but even that fails. See targz.t for the captured errors and a description of the other two more complicated tests. (I ran on a Windows machine, but I've seen failures on Linux + macOS).

NOTE: There will be different failures on ocaml-ci because they run in a sandbox ... I didn't want to check in a 4 MB .tar.gz so it is downloaded during the tests. So, please test on a desktop with dune build -p tar,tar-unix '@runtest' using no sandboxes.

reynir commented 1 month ago

Thanks! This is very helpful. I've identified one bug which made a wrong assumption about Tar.Really_read len:

diff --git a/unix/tar_lwt_unix.ml b/unix/tar_lwt_unix.ml
index cb7bcb5e7..5da2a8938 100644
--- a/unix/tar_lwt_unix.ml
+++ b/unix/tar_lwt_unix.ml
@@ -43,7 +43,9 @@ let safe f a =
 let read_complete fd buf len =
   let open Lwt_result.Infix in
   let rec loop offset =
+    assert (Bytes.length buf >= len);
     if offset < len then
+      let () = assert (Bytes.length buf > offset) in
       safe (Lwt_unix.read fd buf offset) (len - offset) >>= fun read ->
       if read = 0 then
         Lwt.return (Error `Unexpected_end_of_file)
@@ -96,7 +98,7 @@ let run t fd =
         Lwt_result.return (Bytes.sub_string b 0 read)
     | Tar.Really_read len ->
       let buf = Bytes.make len '\000' in
-      read_complete fd buf Tar.Header.length >|= fun () ->
+      read_complete fd buf len >|= fun () ->
       Bytes.unsafe_to_string buf
     | Tar.Seek len -> seek fd len
     | Tar.Return value -> Lwt.return value

Next I get different errors:

cram diff ```diff File "test/targz.t", line 1, characters 0-0: diff --git a/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t b/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t.corrected index acf0d4236..adbb17f85 100644 --- a/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t +++ b/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t.corrected @@ -22,14 +22,8 @@ _ in regress_targz.ml:fold works when `decompress = Fun.id` A. "Extract"-ing it in OCaml. That is, TestExtract.do_test $ OCAMLRUNPARAM=b ./regress_targz.exe v1044.tar extract - Fatal error: exception Invalid_argument("Lwt_unix.read") - Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45 - Called from Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 38, characters 15-18 - Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14 - Re-raised at Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 41, characters 13-26 - Called from Tar_lwt_unix.read_complete.loop in file "unix/tar_lwt_unix.ml", line 47, characters 6-55 - Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 99, characters 6-44 - Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 105, characters 6-11 + Fatal error: exception Failure("Could not find entry. Error error No such file or directory in function open extracted/openapi-1044/.github/workflows/publish.yml") + Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33 Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19 Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3123, characters 20-29 Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 27, characters 10-20 @@ -43,14 +37,8 @@ That is, TestUntar.do_test ~gunzip:false where ... _ (fun () -> Tar_lwt_unix.run (decompress (Tar.fold f init)) fd) in regress_targz.ml:fold works when `decompress = Fun.id` $ OCAMLRUNPARAM=b ./regress_targz.exe v1044.tar untar - Fatal error: exception Invalid_argument("Lwt_unix.read") - Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45 - Called from Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 38, characters 15-18 - Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14 - Re-raised at Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 41, characters 13-26 - Called from Tar_lwt_unix.read_complete.loop in file "unix/tar_lwt_unix.ml", line 47, characters 6-55 - Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 99, characters 6-44 - Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 105, characters 6-11 + Fatal error: exception Failure("Could not find entry. unmarshal Int64.of_string: failed to parse int64 \"0o0erated c\"") + Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33 Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19 Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3123, characters 20-29 Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 27, characters 10-20 ```
reynir commented 1 month ago

Ok, with the above fix and the fixes to the tests (you always need to either read or skip files; there's no early exit) there is one bug in extract:

hannesm commented 1 month ago

did the implicit creation of directories change between tar 2 and tar 3? I guess from a usability point of view ot makes sense to support it, and create these directories.

jonahbeckford commented 1 month ago

even with the above, implicit directories (that is, files with e.g. path implicit-dir/hello.txt where implicit-dir doesn't have an entry in the archive) are not created either

did the implicit creation of directories change between tar 2 and tar 3?

By the way: I actually don't care about implicit creation for my use cases. I didn't use extract or its equivalent in tar 2 ... I only put extract into the regression test so I could have a simple-to-understand test case.


Thanks for getting this fixed so quickly!

reynir commented 1 month ago

A quick look at tar 2.6 it seems extract never created directories. I think it is unfortunate that we don't, but as such it's not a regression.