nf-core / createtaxdb

Parallelised and automated construction of metagenomic classifier databases of different tools
https://nf-co.re/createtaxdb
MIT License

Documentation outdated #41

Closed ChillarAnand closed 3 months ago

ChillarAnand commented 3 months ago

Description of the bug

The docs show that the input file should have 3 columns. https://nf-co.re/createtaxdb/dev/docs/usage/#full-samplesheet

However, when the workflow is run, it fails with a validation error.

https://github.com/nf-core/createtaxdb/issues/40#issuecomment-2218000081

Command used and terminal output

No response

Relevant files

No response

System information

No response

jfy133 commented 3 months ago

Thanks @ChillarAnand!

The pipeline is not yet released and is only barely in the alpha stage, so there is no documentation yet, and I'm not 100% set on the input structure etc. What you refer to is boilerplate code.

I'm still on parental leave until September, but after that development will accelerate.

ChillarAnand commented 3 months ago

Understood, @jfy133

Thanks for your efforts to streamline db building.

Most of the time, fasta files are downloaded from RefSeq or another source that provides the tax id.

It would be great if the pipeline could auto-detect the tax id from the file instead of asking users to construct a separate file.

jfy133 commented 3 months ago

You're welcome and thank you for your patience!

I wondered about that, but it's a huge can of worms because fasta headers are formatted in highly inconsistent ways. Not every genome will come from NCBI, and sometimes even NCBI headers are not consistent.

My current plan is to provide documentation on how to use other tools, such as taxonkit or entrez, to retrieve such information :)
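
To make the header problem concrete, here is a minimal Python sketch. It is not part of the pipeline: the `guess_taxid` helper, the two regex patterns, and the example headers are all assumptions chosen for illustration. It only finds a tax id when the header happens to embed one, which a plain RefSeq-style header (accession plus description) does not.

```python
import re

# Hypothetical helper, not part of nf-core/createtaxdb.
# Each pattern covers one of many possible header conventions (assumed examples).
TAXID_PATTERNS = [
    re.compile(r"kraken:taxid\|(\d+)"),               # Kraken2-style tag
    re.compile(r"\btaxid[=:](\d+)", re.IGNORECASE),   # ad hoc "taxid=562" annotation
]


def guess_taxid(header):
    """Return the tax id embedded in a FASTA header as an int, or None if absent."""
    for pattern in TAXID_PATTERNS:
        match = pattern.search(header)
        if match:
            return int(match.group(1))
    return None


# Illustrative headers only; real files vary far more than this.
headers = [
    ">NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome",
    ">kraken:taxid|562|NC_000913.3 Escherichia coli",
    ">contig_1 length=48213 taxid=562",
    ">scaffold42",
]

results = [guess_taxid(h) for h in headers]
for header, taxid in zip(headers, results):
    print(f"{taxid!s:>6}  {header}")
```

Only two of the four headers yield a tax id here. For the others, the accession (or nothing at all) is the only handle, so the id would have to come from a separate lookup, which is why an explicit samplesheet column is the safer input.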

jfy133 commented 3 months ago

Also note @ChillarAnand that I replied on the other issue describing how to (currently) format the tsv :)

ChillarAnand commented 3 months ago

Is it possible to update the databases incrementally?

jfy133 commented 3 months ago

Essentially no. Sadly, very few of the tools support this, so I've not currently considered it even for those that do.