rlabduke / MolProbity

Protein and nucleic acid validation service
Other
56 stars 30 forks source link

MolProbity supplies zip archives that are "tarbombs" #5

Closed smlewis closed 8 years ago

smlewis commented 8 years ago

Currently our zip archives unzip into multiple subdirectories within $cwd instead of only one. This is widely considered rude and we've had two requests to stop doing that.

The obvious solution - have zip silently insert an extra directory into the path while zipping - is apparently something zip won't do. The other obvious solution, pass the right flags while unzipping, isn't feasible for Windows / non-command-line-savvy users.

The probable fix: 1) put a symlink for the job directory (the ones like lwke265lkn13oky621209) to itself, named with a timestamp like 20160209_1241_molprobity or similar, then 2) zip through that symlink (cannot use * or recursive!) to ensure that the resulting zip archive will unzip into its own directory.

(The other purpose for this issue is to start playing with issues and pull requests in the MP context, copying from Rosetta behavior).

prisant commented 8 years ago

Steven,

Are these the relevant code pointers?:

function makeZipForFolder in lib/core.php

called by either:

pages/file_browser.php 292: $zipfile = makeZipForFiles($basedir, $files);

pages/welcome.php 892: $zipfile = makeZipForFiles($_SESSION['dataDir'], $files);

Is this sort of your strategy (and attached pdf print)?

http://superuser.com/questions/392569/linux-specifying-top-level-directory-when-creating-zip-archive

Michael

smlewis commented 8 years ago

@prisant (that's github markup but may look odd in email):

1) makeZipFor whatever; yes those functions appear to be the correct targets

2) yes, except I couldn't find that thread and hit upon the solution myself.

3) Including the github address (starts rlabduke/MolProbity, ends with a giant hash) duplicates your message to the web issue on github (good) and includes your signature (bad, since it's full of personal data). I scrubbed your signature from the message as it appears on github.

prisant commented 8 years ago

Thanks for scrubbing the personal information!

smlewis commented 8 years ago

So, the zip commands used in MP use -r (recursive), and they probably need to because it needs to be able to specify directory names and grab all the contents. This sinks the symlink-inside-the-data-directory I'd proposed, because it recursively zips and you get an infinitely large file.

I've got code working to prepend a new directory in branch sml_tarbomb as of 4c04c9c438e5050ab078d33dc91a9472ad73d380 (currently hardcoded as literal 'gobbledegook'), but the price is that the renaming symlink is OUTSIDE of the job directory. Simple timestamps would allow collisions between users trying to make zipfiles simultaneously. I guess I have to have the entire hash in the new name to prevent that, so the name would be like:

YYYYMMDD_HHMM_molprobity_il6p0vl083h01ps8k43b31g0q5

smlewis commented 8 years ago

Name changed to:

MolProbity_YYYYMMDD_HHMM_sessionID

per discussion in lab meeting