pirovc / ganon

ganon2 efficiently classifies genomic sequences against large sets of references, with integrated download and update of databases (RefSeq/GenBank), taxonomic profiling (NCBI/GTDB), binning and hierarchical classification, customized reporting and more.
https://pirovc.github.io/ganon/
MIT License

Stalled generating report? #264

Closed: schorlton closed this issue 9 months ago

schorlton commented 10 months ago

Ganon looks really great (thanks for the detailed docs), so I wanted to test it on a toy example. I have a mix of ~131k paired-end and single-end reads. I'm running v1.9.0 in micromamba.

I built a database with 3 genomes (human, one bacterium and one virus) to test it. This was my exact command:
ganon build-custom --input-target file --input-file ganon.txt -d w_size -x ncbi -t 30 --hibf

ganon.txt looks like:

9606.fna    9606    9606
10298.fna   10298   10298
5476.fna    5476    5476

Each of these is a single NCBI reference genome, except 9606, which is CHM13v2.

I then ran classify with binning:
ganon classify -t 32 -b -o test_bin_w_size -d ganon/w_size -s singles.fastq -p reads_1.fastq reads_2.fastq --verbose

The classification ran really quickly, but then it looked like it stalled while generating the report. Specifically, it had been using 100% of one thread for more than 25 minutes, and more than 40 GB of RAM, when I killed it. I also tried building the database with --skip-genome-size.

Is this expected? Given that this is a toy example, I was surprised that classification was so quick while reporting took this long. Thanks again!!

Full log:

- - - - - - - - - -
   _  _  _  _  _   
  (_|(_|| |(_)| |  
   _|   v. 1.9.0
- - - - - - - - - -
Classifying reads
----------------------------------------------------------------------
--single-reads        
                      singles.fastq
--paired-reads        
                      reads_1.fastq
                      reads_2.fastq
--output-prefix       test_bin_w_size
--output-lca          0
--output-all          1
--output-unclassified 0
--output-single       0
--hibf                1
--threads             32
--n-batches           1000
--n-reads             400
--skip-lca            1
--verbose             1
--quiet               0
----------------------------------------------------------------------
H1
--rel-filter 0
--fpr-query 1e-05
    ganon/w_size.hibf, ganon/w_size.tax --rel-cutoff 0.25
    Output files: test_bin_w_size.rep, test_bin_w_size.all
----------------------------------------------------------------------

ganon-classify    start time: Sat Oct 28 21:22:20 2023
loading filters      elapsed: 17.5995 seconds
classifying+printing elapsed: 0.131787 seconds
ganon-classify       elapsed: 17.7983 seconds
ganon-classify      end time: Sat Oct 28 21:22:38 2023

ganon-classify processed 131453 sequences (25.5247 Mbp) in 0.131787 seconds (11620.9 Mbp/m)
 - 69935 reads classified (53.2015%)
   - 69915 with unique matches (53.1863%)
   - 20 with multiple matches (0.0152146%)
 - 69955 matches (avg. 1.00029 match/read classified)
 - 61518 reads unclassified (46.7985%)
 - 374 reads skipped (too long or too short (< window size))
- - - - - - - - - -
Reassigning reads

.rep file found: test_bin_w_size.rep
test_bin_w_size.all
 - Iteration 1 (7e-06)
 - Iteration 2 (0.0)
 - 20 reassigned reads: test_bin_w_size.all
New .rep file: test_bin_w_size.rep
- - - - - - - - - -
Generating report(s)

Total valid files: 1

pirovc commented 10 months ago

Hi @schorlton, this is not expected: the report should take only a few seconds. I managed to reproduce your scenario and can confirm there is a bug in the taxonomy generation for custom databases, so I will provide a fix soon. In the current version, if you change your ganon.txt to something like:

9606.fna    9606.fna    9606
10298.fna   10298.fna   10298
5476.fna    5476.fna    5476

then it will work fine. The bug is caused by a redundancy in the taxonomy: your target name (second column) was the same as the taxonomic identifier (third column), which created a bad .tax file.
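For anyone stuck on v1.9.0 with a larger input file, the workaround above can be applied mechanically. The following is a minimal sketch, not part of ganon: it assumes the input file is tab-separated with the file in column 1, the target in column 2 and the taxonomic identifier in column 3, as in the ganon.txt shown in this thread. The function names fix_input_rows and rewrite_input_file are hypothetical. Whenever the target equals the taxid, the sketch replaces the target with the file name and leaves every other row untouched.

```python
import csv


def fix_input_rows(rows):
    """Hypothetical helper (not part of ganon).

    Each row is assumed to be [file, target, taxid, ...]. If the target
    equals the taxid, the target is replaced by the file name, mirroring
    the workaround in this thread. Any extra columns are preserved, and
    rows with fewer than 3 columns are left unchanged.
    """
    fixed = []
    for row in rows:
        if len(row) >= 3 and row[1] == row[2]:
            # Target collides with the taxid: use the file name instead.
            row = [row[0], row[0], *row[2:]]
        fixed.append(row)
    return fixed


def rewrite_input_file(src, dst):
    """Hypothetical helper: read a tab-separated input file, write a fixed copy."""
    with open(src, newline="") as fin:
        # Skip blank lines; assumes columns are separated by tabs.
        rows = [row for row in csv.reader(fin, delimiter="\t") if row]
    with open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        writer.writerows(fix_input_rows(rows))


# Example with a row from this thread:
# fix_input_rows([["9606.fna", "9606", "9606"]])
# returns [["9606.fna", "9606.fna", "9606"]]
```

The rewritten file would then be passed to ganon build-custom --input-file in place of the original. Per the thread, the underlying bug is fixed in v2.0.0, so this is only relevant to older versions.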

Thanks for the detailed report!

schorlton commented 10 months ago

Thank you! Feel free to close this when ready.

pirovc commented 9 months ago

Fixed in v2.0.0 #271