merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0
426 stars 145 forks source link

anvio-import-taxonomy centrifuge parser requires hardcoded filenames #376

Closed tanaes closed 7 years ago

tanaes commented 8 years ago

This currently requires the filenames centrifuge_hits.tsv and centrifuge_reports.tsv. Would be preferable to do something that allowed arbitrary file paths, as in most of the other parsers, for example interproscan parser: https://github.com/meren/anvio/blob/master/anvio/parsers/interproscan.py#L19

meren commented 8 years ago

Hi Jon,

Thank you very much for the report. Yes, I really dislike the current implementation, but this was the only way out I could find to make the parser module generic enough. InterProScan is easy, because the number of expected files is one. But when there are multiple input files for a parser that differ in their nature functionally, it becomes harder to accept them with arbitrary names. Either there must be more parameters specific for each parser (i.e., in the case of centrifuge parser two extra params to separate hits file from reports file), in which case the client that utilizes the parser module should allow inclusion of parameters from a module it imports (a new set of headache because those parameters can conflict with system-wide parameters), or I have to set a fixed set of names for expected files :( I went with the second option, because it was much less painful.

I will keep this issue in mind, and try to find a more elegant way to handle this. Please let me know if you have an alternative suggestion.

Thanks again,

tanaes commented 8 years ago

So I'm no expert :P but it seems to me that this is a case of over-generalization of the import argument. Because the parses expect filetypes and file numbers that are really not in any way comparable, perhaps using the single '-i' import argument is misleading?

What if you instead passed a **kwargs, and required long-format argument names that were specific to the parser? You could still check those using the files_expected dict, but not have to worry about generic parameter name conflicts as you're only importing from your own parser definitions.

I might be missing something, but that was my first thought...

Cheers, -j

meren commented 8 years ago

Yes, a single -i makes things go south. I will think about improving it as soon as possible. I apologize for the inconvenience until then :) Who knows, maybe @ozcanesen would like the challenge ;)

Thank you!

epruesse commented 7 years ago

+1

meren commented 7 years ago

oh crap. you guys will make me fix my crappy code :(

tanaes commented 7 years ago

Oh no code zombies!!!

meren commented 7 years ago

Literally :(

epruesse commented 7 years ago

I think it'd suffice if you didn't check whether the filenames match before going for the import. Can you roll back the import if it fails?