kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License

Parameters for R9 Guppy2, 4, 6 #85

Closed. StellariaYuki closed this issue 3 months ago

StellariaYuki commented 4 months ago

Hi dysgu developers, I am using dysgu to call SVs in the human genome from nanopore data. I currently have three FASTQ files from three different volunteers. All three data sets were sequenced with R9 nanopore chemistry. Because the samples were collected at different times, the three files were basecalled with Guppy2, Guppy4 (HAC), and Guppy6 (HAC), respectively. I noticed that your documentation says: "If you are using reads with higher error rates, or are unsure of the accuracy, it is recommended to set '--divergence auto'". I am unsure which parameters I should use for these three data sets, since Guppy6 (HAC) already gives fairly high accuracy.

kcleal commented 4 months ago

Hi @StellariaYuki, currently the default parameters are set to work well with low-error-rate ONT reads, such as those generated using the newer Kit14 workflow with the HAC or SUP models. For older kits, the divergence should be set higher on account of the higher error rate. Using auto will generally set a conservative value. Dropping the divergence can improve sensitivity around multiallelic sites, although the number of duplicate true positives will increase. Did you try the auto setting?

StellariaYuki commented 4 months ago

Thank you very much for your prompt response, @kcleal. Based on my understanding, the appropriate setting for the R9 Guppy2, Guppy4, and Guppy6 data, including the Guppy6 data in HAC mode, should be "--max-cov auto --divergence auto". Could you kindly confirm whether my understanding is correct?

kcleal commented 4 months ago

Yes, they should work well. But pay attention to the messages dysgu prints to the terminal: they will tell you what the inferred values were, so you can compare them with the default values.
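
For readers who want to apply the same settings to all three samples, here is a minimal sketch that only builds and prints one command line per sample. The two flags, `--max-cov auto` and `--divergence auto`, are the ones quoted in this thread. Everything else is an assumption for illustration: the `dysgu call` subcommand with a reference, temp directory, and BAM as positional arguments, the file names, and the premise that the FASTQ files have already been aligned to BAM. Check `dysgu call --help` for the layout your installed version expects.

```python
import shlex

# Flags quoted in the thread; the rest of the command layout is an assumption.
AUTO_FLAGS = ["--max-cov", "auto", "--divergence", "auto"]


def build_dysgu_command(reference, temp_dir, bam):
    """Return an argv list for one sample (assumed 'dysgu call' layout)."""
    return ["dysgu", "call", *AUTO_FLAGS, reference, temp_dir, bam]


# Hypothetical file names, one aligned BAM per basecaller version.
samples = {
    "guppy2": "sample_guppy2.bam",
    "guppy4_hac": "sample_guppy4.bam",
    "guppy6_hac": "sample_guppy6.bam",
}

# One command per sample, each with its own temp directory.
commands = {
    name: build_dysgu_command("reference.fa", f"tmp_{name}", bam)
    for name, bam in samples.items()
}

for name, argv in commands.items():
    # Print only; nothing is executed here.
    print(name, shlex.join(argv))
```

Running each printed command separately keeps the terminal messages per sample apart, which makes it easier to see the inferred divergence and coverage values for each basecaller version and compare them with the defaults.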

StellariaYuki commented 3 months ago

@kcleal Thanks a lot, have a nice day!