yfukasawa / LongQC

LongQC is a tool for the data quality control of the PacBio and ONT long reads.
MIT License

Questions about Index Size and Short Mode #50

Open adlpecer opened 2 years ago

adlpecer commented 2 years ago

Hi @yfukasawa, first of all, thank you for developing LongQC. I am currently testing the tool to better understand its parameters and choose an optimal configuration. However, I have several questions about the Index Size and the Short Mode, because my test results are unclear. I ran my tests on two public datasets: flnc.bam (PacBio, transcriptomic, ~4 Gb) and pb.bam (PacBio, genomic, ~12 Gb). These are the results:

Test 1 - flnc.bam

Command (only the index size changes between runs):

longQC.py sampleqc -o /tmp/results -x pb-hifi -n 10000 -p 8 -m 2 -i 1G -t /data/input/flnc.bam
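For anyone wanting to repeat this comparison, the sweep over index sizes could be scripted roughly as follows. This is only a sketch: the flags are copied from the command above, while the per-size output directory (`/tmp/results_<size>`) and the `RUN` switch are assumptions added for illustration so that runs do not overwrite each other and the script defaults to a dry run.

```shell
#!/bin/sh
# Sketch: vary the index size (-i) while keeping every other option fixed.
# Flags are copied from the command above. The per-size output directory
# and the RUN switch are assumptions, not part of the original test.

build_cmd() {
    # $1 = index size, e.g. 1G or 8G
    printf 'longQC.py sampleqc -o /tmp/results_%s -x pb-hifi -n 10000 -p 8 -m 2 -i %s -t /data/input/flnc.bam\n' "$1" "$1"
}

for size in 1G 8G; do
    cmd=$(build_cmd "$size")
    if [ "${RUN:-0}" = "1" ]; then
        eval "$cmd"      # actually run LongQC
    else
        echo "$cmd"      # dry run: only print the command
    fi
done
```

By default the script only prints the two commands; setting `RUN=1` in the environment executes them one after the other.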

Results

Metrics table

[Image: metrics table for flnc.bam]

CPU and memory use over time

Index Size = 1G

[Image: CPU and memory use over time, index size 1G]

Index Size = 8G

[Image: CPU and memory use over time, index size 8G]

Test 2 - pb.bam

Command (this time I varied both the index size and the short mode):

longQC.py sampleqc -o /tmp/results -x pb-sequel -n 10000 -p 8 -m 2 -i 1G -b /data/input/pb.bam

Results

Metrics table

[Image: metrics table for pb.bam]

CPU and memory use over time

Index Size = 1G

[Image: CPU and memory use over time, index size 1G]

Index Size = 8G

[Image: CPU and memory use over time, index size 8G]

Conclusions

Based on these results, my questions are the following:

Thanks!
Adolfo