Closed Ge0rges closed 11 months ago
Perhaps a TSV would be a useful output, where the first column is the gene ID, which would match back to the FASTA sequence, and every other column is one of the variables present in the dict? The final product would then have one row per sequence in the given FASTA file.
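To make the proposed layout concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `write_hits_tsv`, the `gene_id` column header, and the example variables (`cog_id`, `category`) are all hypothetical, and it assumes the results are held as a dict of dicts keyed by sequence ID.

```python
import csv
import io


def write_hits_tsv(hits, handle, id_column="gene_id"):
    """Write {sequence_id: {variable: value}} as a TSV, one row per sequence.

    Hypothetical helper for illustration only; not part of anvi'o.
    """
    # Collect every variable name seen in the dict, in first-seen order,
    # so each variable becomes one column after the ID column.
    columns = []
    for fields in hits.values():
        for name in fields:
            if name not in columns:
                columns.append(name)

    writer = csv.DictWriter(
        handle,
        fieldnames=[id_column] + columns,
        delimiter="\t",
        restval="",  # leave the cell empty when a sequence lacks a variable
        lineterminator="\n",
    )
    writer.writeheader()
    for sequence_id, fields in hits.items():
        writer.writerow({id_column: sequence_id, **fields})


# Made-up example data: IDs would match the deflines of the input FASTA file.
hits = {
    "seq_1": {"cog_id": "COG0001", "category": "H"},
    "seq_2": {"cog_id": "COG0002"},
}

buffer = io.StringIO()
write_hits_tsv(hits, buffer)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

Keeping the sequence ID in the first column means the table can be joined back to the FASTA file by defline, and a sequence that is missing a variable simply gets an empty cell rather than a shifted row.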
Ok after discussion with @ivagljiva the output is now a TSV. I think this is good.
Hey @Ge0rges! Thank you very much for this PR! I added some 2 cents here and there :)
Please add yourself to the authors column of the program, too!
@meren I incorporated the very logical feedback. Let me know if I can do anything more.
Thank you very much for these changes, @Ge0rges! And apologies for the delayed responses here. The last few days have been hectic :) I am merging the PR now. Thank you again!
Looks nice :)
It would have been nice to update anvio/docs/programs/anvi-run-ncbi-cogs.md to mention that it can also be run on a FASTA file directly.
Yes! Sorry I'm still learning about all the different places Anvi'o stores information. Should I make another pull request?
No worries at all. Feel free to add to this one instead of making a new one. Completely up to you. If you don't want to do it, I can add the information there at some point, too :)
I didn't know I could contribute to a closed pull request. However, I have pushed the change to my fork.
Hmm. Then feel free to send another PR, @Ge0rges. Thank you!
Hello,
I wanted to allow anvi-run-ncbi-cogs to be given just an amino acid sequence from a FASTA file. I believe I've done this successfully. However, this pull request still requires one major change, which is the output. I was unsure what the best output format/options to offer would be and wanted to seek the advice of the anvi'o team.