wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

Usage of --sharded #172

Open Rridley7 opened 1 year ago

Rridley7 commented 1 year ago

Hi, thanks again for developing this tool. I have a quick question about the use of the --sharded option. I believe my reference fits the use case, where a single index would not fit into memory. My expectation was that if I pass multiple references to the -r option, the reads would be mapped to both references, and then the best match between the two would be selected. My current command is:

REF_FILE1=02_map_binned/side_test/bwa_idx/part1
REF_FILE2=02_map_binned/side_test/bwa_idx/part2
BAM_DIR=02_bam_files

TMPDIR=tmp_dir coverm contig -m mean -r $REF_FILE1 $REF_FILE2  \
--output-file test_shard.tsv  -p bwa-mem2 --sharded \
--min-read-percent-identity 0.95 --min-read-aligned-percent 0.95 -t 24 \
--bam-file-cache-directory $BAM_DIR  --no-zeros --single unbinned_nr_genes_00[345].ffn.gz

This produces the following output:

[2023-06-18T04:20:00Z INFO  bird_tool_utils::clap_utils] CoverM version 0.6.1
[2023-06-18T04:20:00Z INFO  coverm] Using min-read-percent-identity 95%
[2023-06-18T04:20:00Z INFO  coverm] Using min-read-aligned-percent 95%
[2023-06-18T04:20:00Z INFO  coverm] Writing output to file: test_shard.tsv
[2023-06-18T04:20:00Z INFO  coverm] Using min-covered-fraction 0%
[2023-06-18T04:20:01Z INFO  bird_tool_utils::external_command_checker] Found bwa-mem2 version 2.2.1
[2023-06-18T04:20:01Z INFO  bird_tool_utils::external_command_checker] Found samtools version 1.16.1
[2023-06-18T04:20:01Z INFO  coverm] Writing BAM files to already existing directory 02_bam_files
[2023-06-18T04:20:01Z INFO  coverm::mapping_index_maintenance] BWA index appears to be complete, so going ahead and using it.
[2023-06-18T04:20:01Z INFO  coverm] Caching BAM file to 02_bam_files/part1.unbinned_nr_genes_003.ffn.gz.bam
[2023-06-18T04:20:01Z INFO  coverm] Caching BAM file to 02_bam_files/part1.unbinned_nr_genes_004.ffn.gz.bam
[2023-06-18T04:20:01Z INFO  coverm] Caching BAM file to 02_bam_files/part1.unbinned_nr_genes_005.ffn.gz.bam
[2023-06-18T04:20:01Z INFO  coverm::mapping_index_maintenance] BWA index appears to be complete, so going ahead and using it.
[2023-06-18T04:20:01Z INFO  coverm] Caching BAM file to 02_bam_files/part2.unbinned_nr_genes_003.ffn.gz.bam
[2023-06-18T04:20:01Z INFO  coverm] Caching BAM file to 02_bam_files/part2.unbinned_nr_genes_004.ffn.gz.bam
[2023-06-18T04:20:01Z INFO  coverm] Caching BAM file to 02_bam_files/part2.unbinned_nr_genes_005.ffn.gz.bam
[2023-06-18T04:20:30Z INFO  coverm::contig] In sample 'part1/unbinned_nr_genes_003.ffn.gz', found 4666 reads mapped out of 567290 total (0.82%)
[2023-06-18T04:20:57Z INFO  coverm::contig] In sample 'part1/unbinned_nr_genes_004.ffn.gz', found 4967 reads mapped out of 567387 total (0.88%)
[2023-06-18T04:21:26Z INFO  coverm::contig] In sample 'part1/unbinned_nr_genes_005.ffn.gz', found 4801 reads mapped out of 567706 total (0.85%)
[2023-06-18T04:21:54Z ERROR coverm::coverage_takers] Found a difference amongst the reference sets used for mapping. For this (non-streaming) usage of CoverM, all BAM files must have the same set of reference sequences. Previous entry was contig-196890-spa-t-S25_9i6072_2, new is contig-27673-spa-t-S57_2b3234_5
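
The error above reports that the BAM files do not share one set of reference sequences. A minimal sketch for checking this outside CoverM is shown here, assuming the header text of each BAM file has already been dumped (for example with `samtools view -H`). The function names `reference_names` and `same_reference_set` are hypothetical, and comparing names in header order is an assumption about what "same set" means in this check, not a description of CoverM's internals.

```python
def reference_names(sam_header_text):
    """Return the reference names (SN: field of each @SQ line) in header order."""
    names = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @PG, @RG and other header records
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def same_reference_set(header_a, header_b):
    """True if both headers list the same reference names in the same order."""
    return reference_names(header_a) == reference_names(header_b)


# Toy headers using the two contig names from the error message;
# the LN: lengths are made up for illustration.
part1_header = "@HD\tVN:1.6\n@SQ\tSN:contig-196890-spa-t-S25_9i6072_2\tLN:900\n"
part2_header = "@HD\tVN:1.6\n@SQ\tSN:contig-27673-spa-t-S57_2b3234_5\tLN:1200\n"

print(same_reference_set(part1_header, part1_header))
print(same_reference_set(part1_header, part2_header))
```

With two separately built index parts, the headers would be expected to differ in this way, since each part contains a different subset of contigs.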

What is the proper usage of CoverM in this case, with a two-part index?
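
To make the expectation described above concrete (map each read to every index part, then keep the best match), here is a rough sketch on toy data. It is not CoverM's implementation: the function `best_hit_per_read`, the input layout, and the use of a single numeric alignment score as the selection criterion are all assumptions made for illustration.

```python
def best_hit_per_read(shard_hits):
    """Pick the highest-scoring hit for each read across all shards.

    shard_hits maps shard name -> {read name -> (contig name, score)}.
    Returns {read name -> (shard name, contig name, score)}.
    On a tie, the hit from the shard seen first is kept.
    """
    best = {}
    for shard, hits in shard_hits.items():
        for read, (contig, score) in hits.items():
            if read not in best or score > best[read][2]:
                best[read] = (shard, contig, score)
    return best


# Hypothetical hits for three reads against a two-part index.
hits = {
    "part1": {"read1": ("contig_a", 140), "read2": ("contig_b", 90)},
    "part2": {
        "read1": ("contig_c", 120),
        "read2": ("contig_d", 150),
        "read3": ("contig_e", 100),
    },
}

selected = best_hit_per_read(hits)
for read in sorted(selected):
    print(read, selected[read])
```

In this toy input, read1 keeps its part1 hit, while read2 and read3 are assigned to contigs in part2.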