qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License

File name too long #319

Closed sjanssen2 closed 1 year ago

sjanssen2 commented 3 years ago

I am abusing qiime2 to analyse some non-coding RNA genes. One of the tasks is to test whether beta diversity distances vary significantly, which I try to answer via qiime diversity beta-group-significance. One of the metadata fields holds the secondary structures of the RNAs in dot-bracket notation. The plugin tries to name some output files after the metadata values, here the secondary structure representations, which are ~565 characters long.

Thus, I get an error like

Plugin error from diversity:

  [Errno 36] File name too long: '/tmp/qiime2-temp-08td9oxj/_.................%5B%5B%5B%5B%5B%5B%5B%5B%5B%5B.%28%28%28%28%28........%28%28%28.%28%28%28%28%28%28%28%28%28.......%29%29%29%29%29.%29%29%29%29%29%29%29..%28%28%28%28%28........%29%29%29%29%29............%29%29%29%29%29.%28%28%28%28%28%28%28.%28%28%28%28%28%28%28.%28%28.%28%28......%28%28%28%28.%28%28........%29%29.%29%29%29%29.....%29%29.%29%29.%29%29%29%29%29%29%29%29%29%29%29%29%29%29.%7B%7B%7B%7B%7B%7B%7B%7B%7B............%28%28%28%28%28.%28%28%28%28%28%28%28%28%28%28..................%29%29%29%29%29%29%29%29%29%29.%29%29%29%29%29........%5D%5D%5D%5D%5D%5D%5D%5D%5D%5D...%3C%3C%3C%3C....%28%28%28%28%28%28%28%28%28%28%28...%28%28%28%28%28%28%28%28%28..%28%28%28%28%28........%29%29%29%29%29...%29%29%29%29........................%29%29%29%29%29%29%29%29%29%29%29%29%29%29%29%29.........%7D%7D%7D%7D%7D%7D%7D%7D%7D..%28%28%28%28%28..%28%28%28%28%28%28......%29%29%29%29%29%29...%29%29%29%29%29.....%3E%3E%3E%3E..%5B%5B.%28%28%28%28%28%28%28..........%29%29%29%29%29%29%29......%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%7B%5D%5D....%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D%7D..-boxplots.png'

Debug info has been saved to /tmp/qiime2-q2cli-err-t1q20mni.log

It would be great if we could find a mechanism to avoid this, e.g. check the filename length and if it exceeds a certain threshold use hash values instead of group names?

https://github.com/qiime2/q2-diversity/blob/e80b507ecb1c5a2a2ee1b44af3e3ae539ab675df/q2_diversity/_beta/_visualizer.py#L195
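To illustrate why the name overflows, here is a minimal sketch of how percent-encoding inflates a dot-bracket string. The helper names and the 255-byte limit are assumptions for illustration (255 bytes is a common per-component limit on Linux filesystems, but the real value depends on the filesystem); only `urllib.parse.quote` and the `-boxplots.png` suffix come from the traceback above.

```python
import urllib.parse


def quoted_filename(group_id, suffix="-boxplots.png"):
    """Hypothetical helper: percent-encode the group value and append the suffix."""
    return urllib.parse.quote(str(group_id)) + suffix


def exceeds_name_limit(filename, limit=255):
    """Assumed limit of 255 bytes per file name; check your filesystem."""
    return len(filename.encode("utf-8")) > limit


# Every bracket becomes three characters (e.g. "(" -> "%28"); dots stay as-is.
structure = "((((....))))" * 10  # 120 characters of dot-bracket notation
name = quoted_filename(structure)
print(len(structure), len(name), exceeds_name_limit(name))  # 120 293 True
```

So even a structure far shorter than 565 characters can produce a file name that is too long once it is encoded.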

thermokarst commented 3 years ago

Wow, that is one nasty filename - congrats!

I like the idea of using a unique id for writing the filenames - I'll propose that we skip the threshold check and just do the unique id for everything - I don't think the actual filenames matter here, so keeping it simple might be helpful.
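A rough sketch of what "unique id for everything" could look like, assuming a random UUID per group is acceptable. The function name `opaque_filenames` is hypothetical and not part of q2-diversity.

```python
import uuid


def opaque_filenames(group_ids, suffix="-boxplots.png"):
    """Hypothetical helper: give every group a short name unrelated to its value."""
    # uuid4().hex is always 32 hexadecimal characters, so the name length is fixed.
    return {str(group_id): uuid.uuid4().hex + suffix for group_id in group_ids}


names = opaque_filenames(["control", "((((....))))" * 50])
for group_id, filename in names.items():
    print(len(group_id), filename)
```

The names differ on every run, so anything that needs to find a group's figure afterwards has to go through the returned mapping rather than guess the file name.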

sjanssen2 commented 3 years ago

this is my bloody hack for now:

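import hashlib
import os
import urllib.parse
# Check the length after quoting: percent-encoding can triple it (e.g. "(" -> "%28").
short_group_id = urllib.parse.quote(str(group_id))
if len(short_group_id) >= 200:
    short_group_id = hashlib.md5(str(group_id).encode()).hexdigest()
fig.savefig(os.path.join(output_dir, '%s-boxplots.png' % short_group_id))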

But I agree, we don't really need to preserve state names in the figure filenames, unless someone is using the file names programmatically in downstream analysis. The term "visualization" should warn them against that (but I am actually doing the latter).

ebolyen commented 3 years ago

This thread is wonderful. Sometimes I want to just pat a computer on the head and say "you did your best, buddy".

I'm good with opaque filenames. To be honest, our attempts to encode the variable have, for a while now, mostly been an exercise in watching exciting character sequences wreak havoc. Giving up might be the best approach. If we wanted to be nice, we could encode the variable-to-filename map in something easy to parse (like a JSON script tag in index.html). That would of course be up to the visualization, as there are no rules there.
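As a sketch of the JSON script tag idea, assuming the visualization builds its own index.html: the function name and the element id `group-filenames` are made up for illustration, and nothing here is provided by q2templates.

```python
import json


def filename_map_script(mapping, element_id="group-filenames"):
    """Hypothetical helper: serialize a variable-to-filename map as a JSON script tag."""
    # Escape "<" so a value containing "</script>" cannot end the tag early.
    # "\u003c" is a valid JSON escape, so parsers still recover the original text.
    payload = json.dumps(mapping).replace("<", "\\u003c")
    return '<script type="application/json" id="%s">%s</script>' % (element_id, payload)


tag = filename_map_script({"((..))": "0f3a-boxplots.png", "<<..>>": "9c1e-boxplots.png"})
print(tag)
```

Downstream code can then read the element's text and parse it as JSON instead of reconstructing file names from metadata values.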

ebolyen commented 3 years ago

Maybe q2templates could provide some convenient functionality?

cherman2 commented 1 year ago

@ebolyen Does q2templates provide some convenient functionality?

ebolyen commented 1 year ago

Nope! Maybe it should, although I haven't seen the likes of this in a while.

cherman2 commented 1 year ago

I am going to close this issue. Let's re-open if users are still running into this!