Open leoisl opened 2 years ago
We might be able to keep -I 12G and still use less RAM. The trick would be to use this minimap2 parameter:
--idx-no-seq
Don't store target sequences in the index. It saves disk
space and memory but the index generated with this option
will not work with -a or -c. When base-level alignment is
not requested, this option is automatically applied.
... although when we map reads to the decontamination minimap2 index, we do require base-level mapping (i.e. we run with the flags -aL). But looking downstream I think we don't need these flags and can parse a PAF file. It all depends on whether we actually need to decrease RAM usage or not. @FlorianePoint, could you please tell us whether you have observed any RAM issues when running tbpore, either at your site or in Madagascar? Thanks!
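For reference, here is a minimal sketch of what the two minimap2 invocations could look like if we went this way. The helper names, file paths, thread count and the map-ont preset are assumptions for illustration, not what tbpore currently runs. The only flags taken from this thread are -I and --idx-no-seq; leaving out -a and -c means minimap2 writes PAF, which is the mode the sequence-free index supports.

```python
from typing import List


def index_cmd(ref_fasta: str, index_path: str, batch_size: str = "12G",
              preset: str = "map-ont") -> List[str]:
    """Build the argv for creating a minimap2 index without target sequences.

    The preset is an assumption; use whatever the existing index was built with.
    """
    return [
        "minimap2",
        "-x", preset,
        "-I", batch_size,   # same batch size as the current index
        "--idx-no-seq",     # do not store target sequences in the index
        "-d", index_path,   # write the index to this file
        ref_fasta,
    ]


def map_cmd(index_path: str, reads: str, out_paf: str, threads: int = 4,
            preset: str = "map-ont") -> List[str]:
    """Build the argv for mapping reads to PAF (no -a, no -c)."""
    return [
        "minimap2",
        "-x", preset,
        "-t", str(threads),
        "-o", out_paf,      # PAF is the default output format
        index_path,
        reads,
    ]
```

Each list can be passed directly to subprocess.run(cmd, check=True). Whether this actually lowers peak RAM enough would still need to be measured on a real run.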
Hi Leandro, Yes, we (Nanah in Mada and I) have already had RAM issues when using tbpore, with minimap2 exiting with return code -9. It happened when we had less than 13G free. Floriane
> But looking downstream I think we don't need these flags and can parse a PAF file.
Correct. I used to extract the reads from the SAM, but have since switched to using seqkit grep to get the read ids from fastqs. So PAF will be fine I think.
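As a rough illustration of the PAF route, the sketch below collects the read ids that have at least one alignment in a PAF file. It relies only on the standard PAF layout (column 1 is the query name, column 12 is the mapping quality). The function name and the MAPQ threshold are hypothetical, and the real decision about which reads to keep or discard belongs to the tbpore decontamination logic, which is not reproduced here.

```python
from typing import Iterable, Set


def mapped_read_ids(paf_lines: Iterable[str], min_mapq: int = 0) -> Set[str]:
    """Return the ids of reads with at least one alignment of MAPQ >= min_mapq."""
    ids: Set[str] = set()
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"PAF record has fewer than 12 columns: {line!r}")
        read_id = fields[0]      # column 1: query sequence name
        mapq = int(fields[11])   # column 12: mapping quality
        if mapq >= min_mapq:
            ids.add(read_id)
    return ids
```

The resulting ids could be written one per line to a file and passed to seqkit grep with its pattern-file option (-f), which matches the way read ids are already used to pull reads from the fastqs.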
The current minimap2 index was built with -I 12G to match the H2H index. This pushes the tbpore RAM usage when running tbpore process to 13.1 GB. We could instead build the index with -I 500M, which would take the tbpore process RAM down to ~5 GB, which is much more runnable on a personal laptop, but then the results are not identical to the H2H results. We should evaluate the impact of this different index on the clustering and on the tbpore results in general, and determine whether it is indeed OK to switch to this lighter index. This might be related to https://github.com/mbhall88/tbpore/issues/22
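To compare the two indexes on the RAM side, something like the following could be used to record the peak memory of a command and to flag a kill by signal 9. This is only a sketch under a few assumptions: it is Unix-only (it uses the resource module), the helper names are made up for illustration, and it is not how tbpore itself reports memory. Python reports a child killed by a signal as a negative return code, so -9 corresponds to SIGKILL, which is consistent with the process being killed when the machine runs out of memory.

```python
import resource
import signal
import subprocess
import sys
from typing import List, Tuple


def run_and_measure(cmd: List[str]) -> Tuple[int, float]:
    """Run cmd and return (return code, peak resident memory in GB).

    RUSAGE_CHILDREN reports the largest peak among all child processes
    waited for so far, so call this once per Python session per command
    when comparing runs.
    """
    proc = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS
    to_bytes = 1 if sys.platform == "darwin" else 1024
    return proc.returncode, peak * to_bytes / 1024 ** 3


def killed_by_sigkill(returncode: int) -> bool:
    """True if a subprocess return code means the child received SIGKILL."""
    return returncode == -signal.SIGKILL
```

For example, run_and_measure could be called on the mapping command once with the -I 12G index and once with the -I 500M index, in separate sessions, to check the 13.1 GB and ~5 GB figures on a given machine.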