natverse / nat

NeuroAnatomy Toolbox: An R package for the (3D) visualisation and analysis of biological image data, especially tracings of single neurons.
https://natverse.org/nat/
64 stars 28 forks source link

Zip command line character limit error fix #452

Closed jonmarty closed 3 years ago

jonmarty commented 3 years ago

The following change resolves an issue in which a sufficiently long neuronlist would lead to a large number of files being passed into the zip command, thus exceeding the command line character limit when the 'zip' command is called, causing the zip function to error out.

coveralls commented 3 years ago

Coverage Status

Coverage increased (+0.003%) to 81.315% when pulling e220a91ddecc0a9c316651abedeed6ed8c273992 on jonmarty:patch-2 into 42ca5448fe3864e397070fc2de9a374f13a85af8 on natverse:master.

jonmarty commented 3 years ago

Hello Greg,

I did run into that issue. I was attempting to store the all the neurons in the FCWB brain (about 19,000) in a zip file -- which is probably a pretty extreme case. While the code does successfully write all the .swc files to a 'dir', it does not then create 'dir.zip' before deleting 'dir'. I wrote all the .swc files to a directory and then tried running the zip function on its own, which gave no print output and created no files for the full FCWB brain (stored in the 'dps' neuronlist)

dir = 'dps' zip_file = 'dps.zip' zip(zip_file, files=dir(dir, recursive=TRUE, full.names=TRUE))

And for a smaller neuronlist

dir='skel.fc' zip_file='skel.fc.zip' zip(zip_file, files=dir(dir, recursive=TRUE, full.names=TRUE)) adding: skel.fc/1664980698.swc (deflated 53%) adding: skel.fc/1884625521.swc (deflated 55%) adding: skel.fc/1975347348.swc (deflated 53%) adding: skel.fc/2007068523.swc (deflated 53%) adding: skel.fc/2065745704.swc (deflated 54%) adding: skel.fc/2068801704.swc (deflated 53%) adding: skel.fc/264083994.swc (deflated 55%) adding: skel.fc/296544364.swc (deflated 55%) adding: skel.fc/324846570.swc (deflated 54%) adding: skel.fc/325529237.swc (deflated 55%) adding: skel.fc/356818551.swc (deflated 55%) adding: skel.fc/386834269.swc (deflated 55%) adding: skel.fc/387166379.swc (deflated 55%) adding: skel.fc/387944118.swc (deflated 54%) adding: skel.fc/448260940.swc (deflated 55%) adding: skel.fc/450034902.swc (deflated 55%) adding: skel.fc/480029788.swc (deflated 55%) adding: skel.fc/511051477.swc (deflated 55%) adding: skel.fc/5813001741.swc (deflated 55%) adding: skel.fc/5813010153.swc (deflated 55%) adding: skel.fc/5813021192.swc (deflated 55%) adding: skel.fc/5813022274.swc (deflated 55%) adding: skel.fc/5813026773.swc (deflated 55%) adding: skel.fc/5813056917.swc (deflated 55%) adding: skel.fc/5813064789.swc (deflated 55%) adding: skel.fc/5813069648.swc (deflated 55%) adding: skel.fc/5813071319.swc (deflated 55%)

Also, since the zip function uses the '-r' argument on the command line, the changed line should have the same effect as the previous one.

Thanks, Jonathan

jefferis commented 3 years ago

Thanks for the additional info @jonmarty!

To clarify my point about sort order, when you pass a directory name to zip, the files are added in the lexographical sort order recursing through the directory tree. Whereas when you pass in a file list, then they are added in that specific order.

I still need to think about exactly why it is failing in your case. It could be that it's actually related to the zip utility's handling of many files when explicitly specified rather than a problem with argument length since that should already be fixed by the R zip function. Could I ask which OS and R version you are on? And what does

> Sys.getenv("R_ZIPCMD")
[1] "/usr/bin/zip"
> system2(Sys.getenv("R_ZIPCMD"), args = '-h')
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete OS files)
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation/print version info
  -c   add one-line comments        -z   add zipfile comment
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity       -X   eXclude eXtra file attributes
  -y   store symbolic links as the link instead of the referenced file
  -e   encrypt                      -n   don't compress these suffixes
  -h2  show more help

give for you.

jefferis commented 3 years ago

Dear @jonmarty, could I just ping you re my questions above? I haven't been able to reproduce your issue on my machine, so I wonder if it might be an OS interaction or possibly about path lengths. In addition, I've decided to switch behaviour to your fix when the neuronlist is greater than some threshold number of neurons. Do you have any idea when the problem became evident to you? 1000 neurons, 5000, 10000?

jonmarty commented 3 years ago

Sorry for the late response. I found that I did have an old version of zip, although updating it did not resolve the issue. I was having the issue mentioned above when I was saving all the neurons in the FCWB brain, which contains 16,000 neurons. I tested out a couple different numbers of neurons and found that the problem arises around 8,000 neurons (however, 5000 worked), with write.neurons erring with "sh: /usr/bin/zip: Argument list too long". I'm on mac OS Catalina 10.15.17 and use R version 3.4.1. The commands you specified give the following output:

> Sys.getenv("R_ZIPCMD")
[1] "/usr/bin/zip"
> system2(Sys.getenv("R_ZIPCMD"), args = '-h')
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete OS files)
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation/print version info
  -c   add one-line comments        -z   add zipfile comment
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity       -X   eXclude eXtra file attributes
  -y   store symbolic links as the link instead of the referenced file
  -e   encrypt                      -n   don't compress these suffixes
  -h2  show more help
jefferis commented 3 years ago

Dear @jonmarty, thanks so much for that additional information. The key point is that you are using R 3.4.1. I almost certain that your issue is resolved by this change

https://github.com/wch/r-source/blame/e767ea66c8684b897e07d174e90ef5ae0cdd2c87/src/library/utils/R/zip.R#L110

which was included in the R 3.6.0 release (see NEWS) in April 2019. I will add a small conditional and then merge.