Closed mslw closed 2 months ago
Looking at it again, the positional argument is called input-dir
and the docstring says "The input base directory itself is not put into the archive (i.e., its own name is irrelevant)." It is a little ambiguous (or is it?) what the base directory is (e.g., for foo/bar/baz
, is it foo
or baz
/ the entire thing)?
But I still think the current behavior is at least suboptimal. This is how I first noticed:
❱ tree ~/Documents/dicom-samples/dicomdir
/home/mszczepanik/Documents/dicom-samples/dicomdir
└── 01
└── MR000000.dcm
❱ python ~/Documents/inm-icf-utilities/bin/make_studyvisit_archive ~/Documents/dicom-samples/dicomdir -o /tmp/exports --id foo baz
❱ tar -tvf /tmp/exports/foo/baz_dicom.tar
-rw-r--r-- root/root 10823982 2024-04-26 12:40 foo_baz/mszczepanik/Documents/dicom-samples/dicomdir/01/MR000000.dcm
Initially I wrote:
This is fine if the DICOMs are in /tmp or any other top level directory
Not even that; I think it is only "fine" when relative paths are used. Notice that dicomdir
is included when using an absolute path /tmp/dicomdir/
:
❱ tree /tmp/dicomdir
/tmp/dicomdir
└── 01
└── MR000000.dcm
❱ python ~/Documents/inm-icf-utilities/bin/make_studyvisit_archive /tmp/dicomdir -o /tmp/exports --id foo bar
❱ tar -tvf /tmp/exports/foo/bar_dicom.tar
-rw-r--r-- root/root 10823982 2024-04-26 12:40 foo_bar/dicomdir/01/MR000000.dcm
compare to a relative path:
❱ cd /tmp
❱ python ~/Documents/inm-icf-utilities/bin/make_studyvisit_archive dicomdir -o /tmp/exports --id foo bar2
❱ tar -tvf exports/foo/bar2_dicom.tar
-rw-r--r-- root/root 10823982 2024-04-26 12:40 foo_bar2/01/MR000000.dcm
I think the top-level directory in the tarball can be set as it is now, but all paths inside should be relative to the <input-dir>
(a bit like tar -C
, just with one extra directory). This will, however, be a behavior change for all but one-level paths provided as <input-dir>
, as shown above.
One potential downside I can think of is removing some flexibility (e.g. what if I have /tmp/datastore/scans/dicomdir[1-10]/...
and want only dicomdir3
but I also want paths to include datastore/scans/
) - but I think the increased consistency is more important.
I @mslw, thanks for catching this. I worked on a PR, because I did not realize you have already opened one. If it is OK with you, I would add the regression test of my PR to your PR.
The docstring for
make_studyvisit_archive
says:Yet, the path within tarfile is generated as follows:
https://github.com/psychoinformatics-de/inm-icf-utilities/blob/38b5783cdb7b0e1c50077eaa469819a61108090e/bin/make_studyvisit_archive#L97-L99
i.e. only the top level directory is stripped from
tinfo.name
. This is fine if the DICOMs are in/tmp
or any other top level directory, but if the input dir is elsewhere (e.g./home/me/test-data/something
) then all but the first component will make it into the path within tarfile. What we need is to take a path relative to the input dir and prepend one level.