Extract average read length from fastp output

nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data

https://nf-co.re/taxprofiler

MIT License


Extract average read length from fastp output #17

Closed: Midnighter closed this issue 2 years ago

Midnighter commented 2 years ago

Description of feature

Bracken needs the read length as an input. This is not always reported consistently or correctly in the SRA metadata, so it is better to estimate the average read length. fastp already does this, so we can extract the information from its output.
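As a rough illustration of the idea, here is a minimal sketch that pulls the mean read length out of a fastp JSON report. It assumes the report layout summary -> after_filtering -> read1_mean_length / read2_mean_length (read2 only present for paired-end data), and the function name is hypothetical, not part of taxprofiler.

```python
import json


def mean_read_length_from_fastp(fastp_json_text: str, stage: str = "after_filtering") -> float:
    """Return the mean read length from a fastp JSON report (hypothetical helper).

    Assumes the layout summary -> <stage> -> read1_mean_length and, for
    paired-end data, read2_mean_length.
    """
    report = json.loads(fastp_json_text)
    stats = report["summary"][stage]
    lengths = [stats["read1_mean_length"]]
    if "read2_mean_length" in stats:
        # Assumes mates are kept or dropped together, so both means cover
        # the same number of reads and a plain average is appropriate.
        lengths.append(stats["read2_mean_length"])
    return sum(lengths) / len(lengths)


example = json.dumps(
    {"summary": {"after_filtering": {"read1_mean_length": 148, "read2_mean_length": 146}}}
)
print(mean_read_length_from_fastp(example))  # 147.0
```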

jfy133 commented 2 years ago

When looking at this issue today, I realised that this didn't work very well with run merging, so I came up with another solution using seqkit (and it gave me an excuse to try out the Nextflow stdout output type).
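When runs are merged, the average over the merged reads is total bases divided by total reads, which is generally not the same as averaging each run's mean. The sketch below computes that pooled mean directly from 4-line FASTQ records, similar in spirit to the average length that seqkit stats reports. It is not the pipeline's actual seqkit-based solution: the function names are hypothetical, and it assumes unwrapped FASTQ (exactly four lines per record).

```python
import gzip
from typing import Iterable, Iterator, List


def fastq_sequence_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.rstrip("\r\n"))


def pooled_mean_read_length(runs: Iterable[Iterable[str]]) -> float:
    """Mean read length over all reads of all runs: total bases / total reads."""
    total_bases = 0
    total_reads = 0
    for run in runs:
        for length in fastq_sequence_lengths(run):
            total_bases += length
            total_reads += 1
    if total_reads == 0:
        raise ValueError("no reads found")
    return total_bases / total_reads


def pooled_mean_read_length_from_files(paths: List[str]) -> float:
    """Open plain or gzip-compressed FASTQ files and pool all of their reads."""
    handles = [gzip.open(p, "rt") if p.endswith(".gz") else open(p) for p in paths]
    try:
        return pooled_mean_read_length(handles)
    finally:
        for handle in handles:
            handle.close()


# Run 1: reads of length 4 and 6; run 2: one read of length 10.
run1 = ["@a\n", "ACGT\n", "+\n", "IIII\n", "@b\n", "ACGTAC\n", "+\n", "IIIIII\n"]
run2 = ["@c\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"]
print(pooled_mean_read_length([run1, run2]))  # 20 / 3, not (5 + 10) / 2
```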

jfy133 commented 2 years ago

Actually, the read length is only needed for building the database, which is not something we currently support, so this is not necessary. Users will already need to estimate it, build their corresponding database with that read length in mind, and pass that database to the pipeline.

Midnighter commented 2 years ago

My bad. Yes, one needs to define read lengths when building the Bracken database and can then select one of those lengths in the analysis step.

Sabrin2020 commented 1 year ago

Is the Bracken database the same as the Kraken database? I'm still confused. I have built my custom Kraken database; is this what I should pass to Bracken?

Midnighter commented 1 year ago

Not exactly the same: there is an additional build step that Bracken needs to perform on an existing kraken2 database. Once that is done, you can use the database for both kraken2 and Bracken. Take a look at this pipeline to build one:

https://github.com/Midnighter/kraken2-bracken-test-db

It would need some adjustments to be generally useful. This just builds the test database.

Sabrin2020 commented 1 year ago

I have built my custom kraken2 database successfully and have the kraken.report from the sample I am testing. Isn't that the same database that we link to Bracken?

Sabrin2020 commented 1 year ago

Okay, I ran bracken-build too, but I still get the same error:

bracken-build -d kraken.NCBI.nt -t 32 -l 150 -x kraken.NCBI.nt -y kraken2

bracken -d kraken.NCBI.nt -i kraken.report -o test.bracken -w OUTREPORT -r 150 -l S

**Note that this script will try to use kraken2 as default. If kraken2 is not installed, kraken will be used instead

Checking for Valid Options... ERROR: kraken.NCBI.nt/database150mers.kmer_distrib does not exist Run bracken-build to generate the kmer distribution file.
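The error message shows the file name Bracken looks for: database{read length}mers.kmer_distrib inside the database directory. A minimal sketch to list which read lengths have actually been built in a directory, assuming that naming pattern (taken from the error above) and that bracken-build writes the file into the directory passed with -d. The function names are hypothetical.

```python
import os
import re
from typing import Iterable, List

# Pattern inferred from the Bracken error message, e.g. "database150mers.kmer_distrib".
KMER_DISTRIB = re.compile(r"^database(\d+)mers\.kmer_distrib$")


def read_lengths_from_names(names: Iterable[str]) -> List[int]:
    """Return the sorted read lengths encoded in matching kmer_distrib file names."""
    lengths = []
    for name in names:
        match = KMER_DISTRIB.match(name)
        if match:
            lengths.append(int(match.group(1)))
    return sorted(lengths)


def built_read_lengths(db_dir: str) -> List[int]:
    """List the read lengths for which a kmer_distrib file exists in db_dir."""
    return read_lengths_from_names(os.listdir(db_dir))
```

For example, checking whether `150 in built_read_lengths("kraken.NCBI.nt")` before running bracken with -r 150 would show whether the build step produced the file for that length.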

Midnighter commented 1 year ago

Let's take this on Slack or in another issue. Your problem is not related to the original issue here.