ythuang0522 / homopolish

High-quality Nanopore-only genome polisher
GNU General Public License v3.0

R10.4.pkl ? #43

Open Calvinblue opened 2 years ago

Calvinblue commented 2 years ago

Dear @ythuang0522,

Thank you for this great tool, it works great on my R10.3 nanopore data! I will soon switch to nanopore R10.4 chemistry and would like to know whether you plan to release an R10.4.pkl model file anytime soon, or whether the R10.3 model would still be usable on R10.4 data.

Thank you !

ythuang0522 commented 2 years ago

Hi @Calvinblue, we don't have R10.4 data at hand yet, though it's on the roadmap. We have seen R10.3.pkl improve the Q20+ data on R9.4, but we are not sure about R10.4. It would be great if you could report back on whether retraining a model is necessary.

Calvinblue commented 2 years ago

Hi @ythuang0522, thanks for your answer! I will keep you informed of my tests on R10.4 data.

afvrbanac commented 2 years ago

@Calvinblue how did your tests go? I also have R10.4 data.

@ythuang0522 are there plans to release an R10.4 model sometime in the near future? I can just use the R10.3 model for now, but wanted to know if I should check back soon.

Calvinblue commented 2 years ago

Hi @afvrbanac, sadly the project is on standby on our side, I could not test 10.4... But I am still interested if you have some results in the future, and will provide mine as soon as I have them.

ythuang0522 commented 2 years ago

@afvrbanac We didn't have R10.4 data at hand, but we just found a dataset in NCBI SRA. We will check whether it's worth training a new model specific to R10.4.

laurentijn commented 1 year ago

Dear @ythuang0522

Is there any update on this topic?

ythuang0522 commented 1 year ago

Hi @laurentijn

On an R10.4.1 dataset (simplex, dorado v0.1.1, 4 kHz), the old R10.3 model still gives an improvement. However, we didn't see any improvement in duplex mode, as the duplex quality is already high enough. If your sequencing data is simplex, you can still run homopolish and/or modpolish to remove the remaining indel/mismatch errors.


PS: We will be training a new model for this R10.4.1 simplex dataset, though we expect the R10.3 model to work well too.