larryns / MitoScape

A big-data, machine-learning workflow for aligning mtDNA from NGS data.
Apache License 2.0

HG19-NUMTs file missing #6

Open joshuaraviebe opened 5 months ago

joshuaraviebe commented 5 months ago

How do I run this on an hg19 WGS file?

There is only the NUMTs_hg38.txt file available. Could I lift this file over to hg19 and try that?

larryns commented 5 months ago

The hg19 file isn't missing; there isn't (and shouldn't be) one. Supporting the hg19 reference would require training a completely separate model on hg19 data, and that isn't worth it. More importantly, hg19 shouldn't be used for mitochondria: its mitochondrial sequence has errors and was built from contaminated sequences. These problems were fixed in hg38 and didn't occur in GRCh37. If you really need hg19 variants, your best bet is to run the entire pipeline in hg38 and then lift the final output variants over to hg19. As you can see, though, that is problematic too, because the hg19 mitochondrial reference is wrong. I would strongly recommend not using any reference older than hg38, which is itself already more than a decade old.
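The "run in hg38, then lift only the final variants" route can be sketched as follows. This is a minimal, hypothetical Python illustration of what a liftover step does to variant records: it maps each 1-based VCF position through ungapped aligned blocks (the kind of information a UCSC chain file encodes) and sets aside any variant whose REF allele does not fall entirely inside one block. The `Block` coordinates and the `lift_variants` helper are made up for illustration; they are not the real hg38-to-hg19 chrM alignment. In practice you would use an established tool such as Picard LiftoverVcf or CrossMap with an appropriate chain file. Note that this sketch only moves positions and does not check that the REF base matches the target reference, which is exactly where an erroneous hg19 chrM causes trouble.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """Hypothetical ungapped alignment block (0-based, half-open):
    source [src_start, src_end) maps onto target [dst_start, dst_start + length)."""
    src_start: int
    src_end: int
    dst_start: int


class Variant(NamedTuple):
    pos: int  # 1-based position, as in a VCF POS column
    ref: str
    alt: str


def lift_variants(
    blocks: List[Block], variants: List[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Map variants through the blocks; return (lifted, failed)."""
    blocks = sorted(blocks)                  # sort by src_start
    starts = [b.src_start for b in blocks]
    lifted, failed = [], []
    for v in variants:
        start0 = v.pos - 1                   # convert 1-based POS to 0-based
        end0 = start0 + len(v.ref)           # exclusive end of the REF allele
        i = bisect_right(starts, start0) - 1 # last block starting at or before start0
        if i >= 0 and end0 <= blocks[i].src_end:
            b = blocks[i]
            new_pos = b.dst_start + (start0 - b.src_start) + 1  # back to 1-based
            lifted.append(v._replace(pos=new_pos))
        else:
            # REF allele starts before any block, in a gap, or spans a block edge
            failed.append(v)
    return lifted, failed


if __name__ == "__main__":
    # Toy alignment (NOT real chrM coordinates): source bases 300-301 (0-based)
    # have no counterpart in the target, so everything after them shifts by -2.
    toy_blocks = [Block(0, 300, 0), Block(302, 16569, 300)]
    calls = [
        Variant(73, "A", "G"),
        Variant(301, "C", "T"),
        Variant(16519, "T", "C"),
    ]
    ok, bad = lift_variants(toy_blocks, calls)
    print("lifted:", ok)
    print("failed:", bad)
```

Variants that fail to lift should be reported rather than silently dropped, since on chrM they are likely to sit exactly where the two references disagree.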

joshuaraviebe commented 5 months ago

Thanks @larryns for the detailed explanation. I'll try with hg38.