Closed by ChillarAnand 3 months ago
Thanks @ChillarAnand !
The pipeline is not yet released and is barely in the alpha stage, so there is no documentation yet - I'm not 100% set on the input structure etc. What you refer to is boilerplate code.
I'm still on parental leave until September, but after that development will accelerate.
Understood, @jfy133
Thanks for your efforts to streamline db building.
Most of the time, FASTA files are downloaded from RefSeq or some other source that provides a tax ID.
It would be great if the pipeline could auto-detect the tax ID from the file instead of asking users to construct a separate file.
You're welcome and thank you for your patience!
I wondered about that, but it's a huge can of worms because FASTA headers are formatted in highly inconsistent ways. Not every genome will come from NCBI, and sometimes even NCBI headers are not consistent.
My current plan is to provide documentation on how to use other tools, such as taxonkit or entrez, to retrieve such information :)
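To illustrate why auto-detection is fragile, here is a minimal Python sketch (not part of the pipeline) that tries to pull an NCBI-style accession out of a FASTA header. The function names and the regular expression are assumptions made for this example; the pattern is a rough heuristic for versioned accessions, not an official grammar. Any accession it finds would still have to be mapped to a tax ID with a separate tool.

```python
import re

# Heuristic for versioned NCBI-style accessions such as NC_000913.3,
# NZ_CP000001.1 or CP000001.1. This is an assumption, not an official spec.
ACCESSION_RE = re.compile(r"^(?:[A-Z]{2}_[A-Z]{0,6}|[A-Z]{1,6})[0-9]+\.[0-9]+$")


def accession_from_header(header):
    """Return an accession-like token from a FASTA header line, or None."""
    parts = header.lstrip(">").split(maxsplit=1)
    if not parts:
        return None
    first = parts[0]
    # Legacy NCBI headers look like: gi|556503834|ref|NC_000913.3|
    candidates = first.split("|") if "|" in first else [first]
    for candidate in candidates:
        if ACCESSION_RE.match(candidate):
            return candidate
    return None


def accessions_in_fasta(lines):
    """Yield (header, accession or None) for every header line in a FASTA."""
    for line in lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            yield header, accession_from_header(header)


example = [
    ">NC_000913.3 Escherichia coli str. K-12 substr. MG1655\n",
    "AGCTTTTCATTCTGACTGCA\n",
    ">gi|556503834|ref|NC_000913.3| Escherichia coli\n",
    "AGCTTTTCATTCTGACTGCA\n",
    ">contig_1 length=5000\n",
    "ACGT\n",
]
for header, accession in accessions_in_fasta(example):
    print(header, "->", accession)
```

The third header is a typical assembler-style name with no accession at all, so the sketch returns `None` for it; that is the case where a user-supplied table is still needed.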
Also note @ChillarAnand that I replied on the other issue describing how to (currently) format the tsv :)
Is it possible to update the databases incrementally?
Essentially no. Sadly, very few of the tools support this, so I've not currently considered it even for those that do.
Description of the bug
The docs show that the input file should have 3 columns. https://nf-co.re/createtaxdb/dev/docs/usage/#full-samplesheet
However, when the workflow is run, it fails with a validation error.
https://github.com/nf-core/createtaxdb/issues/40#issuecomment-2218000081
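One quick way to narrow down a mismatch like this is to compare the samplesheet's header row against the column names the docs list. The Python sketch below is only an illustration: `header_mismatch` is a hypothetical helper, and the column names are placeholders, not the pipeline's actual schema.

```python
import csv
import io


def header_mismatch(samplesheet_text, expected_columns):
    """Compare a CSV header row against the expected column names.

    Returns (missing, unexpected) as two lists, preserving order.
    """
    reader = csv.reader(io.StringIO(samplesheet_text))
    header = [name.strip() for name in next(reader, [])]
    missing = [c for c in expected_columns if c not in header]
    unexpected = [c for c in header if c not in expected_columns]
    return missing, unexpected


# Placeholder column names and content, for illustration only.
expected = ["id", "taxid", "fasta"]
sheet = "id,taxid,fasta_path\nsampleA,562,genome.fasta\n"

missing, unexpected = header_mismatch(sheet, expected)
print("missing:", missing)
print("unexpected:", unexpected)
```

Posting the two lists alongside the validation error makes it easier to tell whether the docs or the schema are out of date.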
Command used and terminal output
No response
Relevant files
No response
System information
No response