wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0
275 stars 30 forks source link

bwa index not found with -r #112

Closed blindner6 closed 2 years ago

blindner6 commented 2 years ago

Hi Ben,

I've been competitively mapping reads to a collection of ~7,000 draft genomes using bwa-mem2. Recently, I've tried switching to coverm to complete both mapping (still with bwa-mem) and depth calculation (seqdepth calculation is much faster than the ad hoc python scripts I was using previously) for these genomes.

My longterm aim is to run multiple metagenomes through coverm but since my genome set is so large, I want to avoid building a new index each time I start the process over with a new metagenome. So I wanted to offer coverm genome my bwa index via -r for each run. Based on reading the help documentation, I'm attempting to call coverm genome in this way:

coverm genome -1 ${fwd} -2 ${rev} -p bwa-mem -r ${index} -d ${g_dir} \
--min-read-percent-identity 95 --min-read-aligned-percent 75 --output-format sparse \
--trim-max 90 --trim-min 10 -m trimmed_mean -o ${output} -t 24

Where the variables follow: fwd/rev: forward and reverse gzipped FASTQ, index: path to a bwa-mem index, including the stem g_dir: path to directory containing genomes (suffix matches default .fna) output: what I want the output .tsv saved as

When I run this, coverm is not able to recognize my BWA index (passed with my index variable above) despite a lot of fiddling on my end. I see the following error before coverm exits:

thread 'main' panicked at 'The reference specified '/storage/coda1/p-ktk3/0/shared/rich_project_bio-konstantinidis/shared3/mst/01_pipeline/xx_references/xx_fulldbs/MST_db' does not appear to exist', src/mapping_parameters.rs:159:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Any thoughts or feedback would be greatly appreciated.

Cheers,

Blake

wwood commented 2 years ago

Hi Blake,

bwa-mem2 usage was put in the code (#72 ) but not as part of a release yet. Are you using 0.6.1 or newer? Thanks.

blindner6 commented 2 years ago

Hey Ben,

Thanks for your reply. Sorry for any confusion -- I'm not attempting to run bwa-mem2 within CoverM.

I am running CoverM v0.6.1. I've used bwa v0.7.17 to index my genome database. But CoverM won't accept that via -r (path and stem for the index files being provided are correct).

I have hundreds of metagenomes I want to map to thousands of MAGs, so I want to avoid building the index with each pass. I could map those myself and then handle the BAM files into CoverM but wanted to check here first if -r is indeed intended to allow me to provide my own bwa index (thus avoiding this). And if so, what I might be doing wrong above that is preventing me from doing so.

Thanks for your time!

wwood commented 2 years ago

Ah, OK.

My testing just below suggests coverm already picks up the stem of the DB as it is supposed to (this is for bwa-mem v0.7.17 / coverm 0.6.1 but the same held true for bwa-mem2 / coverm newest). Using 7seqs.fna from the tests/data directory.

(coverm-v0.6.1)cl5n008:20220506:~/git/coverm$ bwa index /tmp/7seqs.fna
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.01 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index /tmp/7seqs.fna
[main] Real time: 0.013 sec; CPU: 0.013 sec
(coverm-v0.6.1)cl5n008:20220506:~/git/coverm$ coverm contig -p bwa-mem -r /tmp/7seqs.fna --single tests/data/2seqs.fasta
[2022-05-05T19:37:07Z INFO  coverm] CoverM version 0.6.1
[2022-05-05T19:37:07Z INFO  coverm] Using min-covered-fraction 0%
[2022-05-05T19:37:07Z INFO  bird_tool_utils::external_command_checker] Found samtools version 1.15.1
[2022-05-05T19:37:07Z INFO  coverm::mapping_index_maintenance] BWA index appears to be complete, so going ahead and using it.
[2022-05-05T19:37:07Z INFO  coverm::contig] In sample '7seqs.fna/2seqs.fasta', found 2 reads mapped out of 2 total (100.00%)
Contig  7seqs.fna/2seqs.fasta Mean
genome1~random_sequence_length_11000    0
genome1~random_sequence_length_11010    0
genome2~seq1    1
genome3~random_sequence_length_11001    0
genome4~random_sequence_length_11002    0
genome5~seq2    1
genome6~random_sequence_length_11003    0

Are you using it in the above way? ben

fbeghini commented 2 years ago

Hi Ben, I've been encountering the same issue when using coverm with bwa (coverm version 0.6.1, bwa version 0.7.17). When using a pre-built bwa index and TMPDIR is set, coverm tries to build the index in the TMP directory without checking if the index is complete in the provided location.

[2022-06-13T15:45:11Z INFO coverm::mapping_index_maintenance] Generating BWA_MEM index for /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/representatives_deduplicated.mmi.fa .. [2022-06-13T15:45:11Z DEBUG coverm::mapping_index_maintenance] Running DB indexing command: "bwa" "index" "-p" "/home/fb343/scratch60/coverm-mapping-index.xJlBNZi30ad2/representatives_deduplicated.mmi.fa" "/home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/representatives_deduplicated.mmi.fa"

TMPDIR has been set to /home/fb343/scratch60/

wwood commented 2 years ago

Hi @fbeghini I would be surprised if TMPDIR being set would change the behaviour. Can you let me know the command line you are using and an ls of the directory containing the bwa index? Since two people have reported this now I imagine there is something untoward going on, but I cannot reproduce as shown in the comment just above.

Thanks, ben

fbeghini commented 2 years ago

Hi @wwood , The command line I'm using is

TMPDIR=/home/fb343/scratch60 coverm genome -t 20 -1 /home/fb343/project/HMBP/reads/021-0001-05/021-0001-05.bigtrim_1.fq.gz -2 /home/fb343/project/HMBP/reads/021-0001-05/021-0001-05.bigtrim_2.fq.gz -p bwa-mem -r /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.fa --genome-definition /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/contig_to_bin_map --no-zeros --min-read-aligned-percent 70 --min-read-percent-identity 90 --min-covered-fraction 10 --bam-file-cache-directory /home/fb343/christakis_loan/coverm/021-0001-05/ -o /home/fb343/christakis_loan/coverm/021-0001-05/021-0001-05.relab --verbose

The folder has just the files produced by bwa $ ls /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/ /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.amb /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.bwt /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.pac /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.ann /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.fa /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.sa

wwood commented 2 years ago

Hi,

In the call to coverm you specified the reference as repded.fa but the root of the index is just repded. In this case the reference should be specified to coverm as the root i.e. /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded.

I guess you used bwa index with -p as something other than default?

Did I get that right? Thanks, ben

fbeghini commented 2 years ago

I tried first using just the root of the index but coverm failed to run giving thread 'main' panicked at 'The reference specified '/home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded' does not appear to exist', src/mapping_parameters.rs:159:9

I found this post while looking if someone else was having the same issue and tried to run it by adding the file extension wondering if coverm would try to build the index after checking if all the index files are present.

bwa index was run with -b 10000000000 -p repded

wwood commented 2 years ago

Ah, right. That is a slight bug then. To work around that, you can just create an empty file i.e. touch /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded and then specify -r /home/fb343/christakis_loan/vamb/representatives_deduplicated_fnas/repded to coverm.

I didn't quite understand that last part. Was there some reason that you needed to use -p with bwa index?

fbeghini commented 2 years ago

The empty file trick works, thanks. Also, no reason to use -p, wanted to make sure that was not a problem of the file name when I was creating the index.

wwood commented 2 years ago

Good good. Closing this now since I guess the underlying issue has been solved for the next release.

jianshu93 commented 2 years ago

Hello Ben,

Can you please update the crates.io page for CoverM first?

Thanks,

Jianshu

wwood commented 2 years ago

Hi Jianshu,

I consider issues as closed when they are fixed in the main branch, which will make it into the next release (here, the one after 0.6.1).

Having said that, a new release is quite overdue - hope to get to it soon.

jianshu93 commented 2 years ago

Thanks ben, I can compile it now but for those who have less knowledge about compiling, they may need a new binary or something.

Jianshu