Closed gravicappa closed 9 months ago
Thanks, I was not aware.
How do you think we should handle this? There's the Tar
module, but also the Tar_unix
and Tar_unix_lwt
.
For the latter two it makes sense to consider this when extracting. In #127 I'm reworking Tar_unix
etc.
For Tar
I'm unsure if we should do anything differently. When we unmarshal we could synthesize the link_indicator
to be Tar.Header.Link.Directory
. I'm not sure we should do this. We could do it in the header reader as we already synthesize other things (long names IIRC). What do you think?
I am curious if you know how to create such an archive, or if you have a (small) sample.
You can create such archives with bsdtar c --format v7 …
.
I would implement that link_indicator
fix in Tar.Header.unmarshal
either when hdr_version
(not level
though) is pre-ustar or unconditionally after synthesizing long names (to avoid being misdirected by trailing slash in cut-off name). In my code I just call
let sanitize_header (header: Tar.Header.t) =
match header.link_indicator with
| Normal ->
if path_is_directory header.file_name then
{ header with link_indicator = Directory }
else
header
| _ -> header
in Tar_unix.Archive.with_next_file
called proc and it generally works although I'm dealing with docker image layers archives only at the moment.
I was curious how gnu tar would treat a tar archive with a pax header setting the file name to clearly/a/directory/
followed by a header for an empty file placeholder
with normal link indicator. So I created that with ocaml-tar, and interestingly tar treats the contents as a directory clearly/a/directory/
:
$ tar -tvf pax-shenanigans.tar
dr-------- 0/0 0 1970-01-01 01:00 clearly/a/directory/
Here's the sample archive (gzipped so github accepts it) pax-shenanigans.tar.gz
I will make a PR shortly implementing your second proposal.
tar.2.5.1 has been submitted to opam-repository with a fix. Thank you for the report and the discussion on how to fix it.
It seems that I was wrong: the fix up for link_indicator
should be done in read_header
instead: file_name
is being updated there.
And I encountered archive (which unfortunately I can't provide) there LongLink
contains full name and file_name
is cut off right after /
.
This is how I fixed the issue locally (probably missed a couple of other issues):
diff --git a/lib/tar.ml b/lib/tar.ml
index 0ff1346..16de231 100644
--- a/lib/tar.ml
+++ b/lib/tar.ml
@@ -522,17 +522,7 @@ module Header = struct
let mod_time = match extended.Extended.mod_time with
| None -> get_hdr_mod_time c
| Some x -> x in
- let link_indicator =
- let link_indicator = get_hdr_link_indicator c in
- (* For backward compatibility we treat normal files ending in slash
- as directories. Because [Link.of_char] treats unrecognized link
- indicator values as normal files we check directly *)
- if String.length file_name > 0 && file_name.[String.length file_name - 1] = '/' &&
- (link_indicator = '0' || link_indicator = '\000') then
- Link.Directory
- else
- Link.of_char ~level link_indicator
- in
+ let link_indicator = Link.of_char ~level (get_hdr_link_indicator c) in
let uname = match extended.Extended.uname with
| None -> if ustar then get_hdr_uname c else ""
| Some x -> x in
@@ -718,6 +708,17 @@ module HeaderReader(Async: ASYNC)(Reader: READER with type 'a t = 'a Async.t) =
| None -> return (Error `Eof)
end in
+ let true_link_indicator link_indicator file_name =
+ (* For backward compatibility we treat normal files ending in slash
+ as directories. Because [Link.of_char] treats unrecognized link
+ indicator values as normal files we check directly *)
+ if String.length file_name > 0
+ && file_name.[String.length file_name - 1] = '/'
+ && link_indicator = Header.Link.Normal then
+ Header.Link.Directory
+ else
+ link_indicator in
+
let rec read_header (file_name, link_name, hdr) : (Header.t, [`Eof]) result Async.t =
let raw_link_indicator = Header.get_hdr_link_indicator buffer in
if (raw_link_indicator = 'K' || raw_link_indicator = 'L') && level = Header.GNU then
@@ -738,7 +739,8 @@ module HeaderReader(Async: ASYNC)(Reader: READER with type 'a t = 'a Async.t) =
else begin
let link_name = if link_name = "" then hdr.Header.link_name else link_name in
let file_name = if file_name = "" then hdr.Header.file_name else file_name in
- return (Ok {hdr with Header.link_name; file_name })
+ let link_indicator = true_link_indicator hdr.Header.link_indicator file_name in
+ return (Ok {hdr with Header.link_name; file_name; link_indicator })
end in
get_hdr ()
>>= function
reopening, thanks for your testing and reporting!
Dear @gravicappa, thank you for your report and for sharing your own fix, and sorry for not reacting earlier.
Unfortunately Link.of_char
returns Link.Normal
if the link indicator is unknown - so maybe a bit too much is treated as directories. Ideally, only '0'
and '\000'
would receive this treatment. I don't know how often these unknown-to-us link indicators occur in the wild, and I imagine the chances are slim that they end in a slash so your solution is probably acceptable. In #127 I will reconsider returning Link.Normal
on "unknown" link indicators.
A fix's released in 2.6.0. I will keep this issue open until I have a fix for the next
branch as well. Thanks again for your reports and suggestions!
Done in #127
As written here.
And I have encountered archives like that.