Closed Midnighter closed 2 years ago
I realised today, when looking at this issue, that this didn't work very nicely when doing run merging, so I came up with another solution using seqkit (and it gave me an excuse to try out the NXF stdout output type).
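To illustrate why run merging matters for the read-length estimate, here is a minimal sketch (not the actual pipeline code) that pools per-run rows from the tabular output of `seqkit stats -T` into one mean read length. It assumes the tabular output has `num_seqs` and `sum_len` columns with plain integers; the function name and the example file names are hypothetical.

```python
import csv
import io


def pooled_mean_read_length(stats_tsv: str) -> float:
    """Pool per-file rows of `seqkit stats -T` output into one mean read length.

    Assumption: the TSV has `num_seqs` and `sum_len` columns holding integers.
    """
    reader = csv.DictReader(io.StringIO(stats_tsv), delimiter="\t")
    total_reads = 0
    total_bases = 0
    for row in reader:
        total_reads += int(row["num_seqs"])
        total_bases += int(row["sum_len"])
    if total_reads == 0:
        raise ValueError("no reads found in seqkit stats output")
    # Weight by read count rather than averaging the per-run averages.
    return total_bases / total_reads


# Hypothetical two-run sample with different read lengths.
example = (
    "file\tformat\ttype\tnum_seqs\tsum_len\tmin_len\tavg_len\tmax_len\n"
    "run1.fastq.gz\tFASTQ\tDNA\t1000\t150000\t150\t150.0\t150\n"
    "run2.fastq.gz\tFASTQ\tDNA\t3000\t300000\t100\t100.0\t100\n"
)
print(pooled_mean_read_length(example))
```

Averaging the two `avg_len` values directly would give 125, whereas pooling bases and reads weights each run by its size.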
Actually, the read length is for the database building, which is not something we support currently, so this is not necessary. A user will need to estimate the read length beforehand, build their corresponding database with that read length in mind, and pass that database to the pipeline.
My bad about this, yes, one needs defined read lengths for building the Bracken database and can then select one length in the analysis step.
Is the Bracken database the same as the Kraken database? I'm still confused. I have built my custom Kraken database; is this what I will input to Bracken?
Not exactly the same, there is an additional build step that Bracken needs to perform on an existing kraken2 database. When it's done you can use that database for both kraken2 and Bracken. Take a look at this pipeline to build one:
https://github.com/Midnighter/kraken2-bracken-test-db
It would need some adjustments to be generally useful. This just builds the test database.
I have built my custom kraken2 database successfully and have the kraken.report from the sample I am testing. Isn't that the same database that we pass to Bracken?
Okay, I ran bracken-build too but still get the same error:
bracken-build -d kraken.NCBI.nt -t 32 -l 150 -x kraken.NCBI.nt -y kraken2
bracken -d kraken.NCBI.nt -i kraken.report -o test.bracken -w OUTREPORT -r 150 -l S
**Note that this script will try to use kraken2 as default. If kraken2 is not installed, kraken will be used instead
Checking for Valid Options...
ERROR: kraken.NCBI.nt/database150mers.kmer_distrib does not exist
Run bracken-build to generate the kmer distribution file.
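The error above names the file Bracken looks for, `database150mers.kmer_distrib`, inside the database directory. As a rough sketch, assuming that every built read length follows the same `database<N>mers.kmer_distrib` naming, one could list which read lengths a database directory has been built for and pick the one nearest to an estimated read length. The function names are hypothetical, and choosing the closest length is only one possible policy.

```python
import re
from pathlib import Path

# Assumption: one kmer distribution file per built read length,
# named like the file in the error message above.
KMER_DISTRIB = re.compile(r"^database(\d+)mers\.kmer_distrib$")


def built_read_lengths(db_dir):
    """Return the sorted read lengths that have a kmer_distrib file in db_dir."""
    lengths = []
    for path in Path(db_dir).iterdir():
        match = KMER_DISTRIB.match(path.name)
        if match:
            lengths.append(int(match.group(1)))
    return sorted(lengths)


def closest_built_length(db_dir, estimated):
    """Pick the built read length nearest to the estimate (ties go to the shorter)."""
    lengths = built_read_lengths(db_dir)
    if not lengths:
        raise FileNotFoundError(
            f"no database*mers.kmer_distrib files in {db_dir}; run bracken-build first"
        )
    return min(lengths, key=lambda n: (abs(n - estimated), n))
```

An empty result from `built_read_lengths` would correspond to the situation in the error message: the directory is a kraken2 database without the additional Bracken files.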
Let's take this on Slack or in another issue. Your problem is not related to the original issue here.
Description of feature
Bracken needs the read length as an input. This is not consistently or always correctly reported in SRA metadata, so it's better to estimate the average read length. fastp already does this, so we can extract the information from there.
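A minimal sketch of that extraction, assuming the fastp JSON report has a `summary` section with `before_filtering` and `after_filtering` entries that each carry `total_reads` and `total_bases`. The function name and the example numbers are made up for illustration, and whether to use the pre- or post-filtering values is a judgment call.

```python
import json


def mean_read_length_from_fastp(report, stage="after_filtering"):
    """Estimate the mean read length from a parsed fastp JSON report.

    Assumption: report["summary"][stage] holds `total_reads` and `total_bases`.
    """
    summary = report["summary"][stage]
    total_reads = summary["total_reads"]
    if total_reads == 0:
        raise ValueError(f"fastp reports zero reads for stage {stage!r}")
    return summary["total_bases"] / total_reads


# Hypothetical, heavily trimmed report; a real one contains many more fields.
example_report = json.loads(
    """
    {
      "summary": {
        "before_filtering": {"total_reads": 2000, "total_bases": 300000},
        "after_filtering": {"total_reads": 1800, "total_bases": 261000}
      }
    }
    """
)
print(mean_read_length_from_fastp(example_report))
```

With a real report, the dictionary would come from `json.load` on the file written by fastp's JSON output option.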