Closed tanaes closed 7 years ago
Hi Jon,
Thank you very much for the report. Yes, I really dislike the current implementation, but this was the only way out I could find to make the parser module generic enough. InterProScan is easy, because the number of expected files is one. But when there are multiple input files for a parser that differ in their nature functionally, it becomes harder to accept them with arbitrary names. Either there must be more parameters specific for each parser (i.e., in the case of centrifuge parser two extra params to separate hits file from reports file), in which case the client that utilizes the parser module should allow inclusion of parameters from a module it imports (a new set of headache because those parameters can conflict with system-wide parameters), or I have to set a fixed set of names for expected files :( I went with the second option, because it was much less painful.
I will keep this issue in mind, and try to find a more elegant way to handle this. Please let me know if you have an alternative suggestion.
Thanks again,
So I'm no expert :P but it seems to me that this is a case of over-generalization of the import argument. Because the parses expect filetypes and file numbers that are really not in any way comparable, perhaps using the single '-i' import argument is misleading?
What if you instead passed a **kwargs, and required long-format argument names that were specific to the parser? You could still check those using the files_expected dict, but not have to worry about generic parameter name conflicts as you're only importing from your own parser definitions.
I might be missing something, but that was my first thought...
Cheers, -j
Yes, a single -i
makes things go south. I will think about improving it as soon as possible. I apologize for the inconvenience until then :) Who knows, maybe @ozcanesen would like the challenge ;)
Thank you!
+1
oh crap. you guys will make me fix my crappy code :(
Oh no code zombies!!!
Literally :(
I think it'd suffice if you didn't check whether the filenames match before going for the import. Can you roll back the import if it fails?
This currently requires the filenames centrifuge_hits.tsv and centrifuge_reports.tsv. Would be preferable to do something that allowed arbitrary file paths, as in most of the other parsers, for example interproscan parser: https://github.com/meren/anvio/blob/master/anvio/parsers/interproscan.py#L19