Closed rlancaster96 closed 5 months ago
Did you end up using a GATK tool to convert the coordinates instead of the UCSC's liftOver tool? Seems the latter has some known issues when there are multiple patches between major genome updates.
Did you end up using a GATK tool to convert the coordinates instead of the UCSC's liftOver tool? Seems the latter has some known issues when there are multiple patches between major genome updates.
yes! that makes sense. I used gatk's version since I couldn't figure out how to get UCSC liftOver to work with VCFs, and I was having trouble converting VCF to BED. I can try again though. I also saw this note from UCSC Note: It is not recommeneded to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry.
Did you end up using a GATK tool to convert the coordinates instead of the UCSC's liftOver tool? Seems the latter has some known issues when there are multiple patches between major genome updates.
yes! that makes sense. I used gatk's version since I couldn't figure out how to get UCSC liftOver to work with VCFs, and I was having trouble converting VCF to BED. I can try again though. I also saw this note from UCSC Note: It is not recommeneded to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry.
GATK's version sounds like the right choice. Will check with KE on the necessity of specifically liftingOver and converting VCF to bed this way and report back.
Did you end up using a GATK tool to convert the coordinates instead of the UCSC's liftOver tool? Seems the latter has some known issues when there are multiple patches between major genome updates.
yes! that makes sense. I used gatk's version since I couldn't figure out how to get UCSC liftOver to work with VCFs, and I was having trouble converting VCF to BED. I can try again though. I also saw this note from UCSC Note: It is not recommeneded to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry.
GATK's version sounds like the right choice. Will check with KE on the necessity of specifically liftingOver and converting VCF to bed this way and report back.
It seems we'll have the same SNP-related issue that UCSC explicitly references, regardless of which program is used. @rlancaster96 can you try the UCSC liftOver tool again and see if you can get it to work with VCFs? There are also a few other tools mentioned in this recent paper that could be alternate options: https://www.mdpi.com/2073-4425/14/10/1875 (@kellrott if you have thoughts for/against other tools like those in the paper, please share)
@elisabethgoldman I tried again with UCSC liftOver, and unfortunately it does not take VCFs
> ./liftOver chr19.truth.vcf hg19ToHg38.over.chain.gz chr19_liftoverhg38.truth.vcf unMapped
Reading liftover chains
Mapping coordinates
Expecting integer field 3 line 106 of /home/groups/CEDAR/lancasru/WGS_COH_NF/make_test_truth/synapse_files/chr19.truth.vcf, got .
I can look into tools to convert VCF to BED while preserving all of the information in the VCF, and I'll also look into the tools used in the paper!
I successfully created the lifted over VCFs using Ensembl's CrossMap (at least, it produced files with entries as expected). Not sure how to test how accurate it is, or if it works for our needs. chr19 - 2 unmapped after liftover (1738/1740 variants mapped) chr20 - 0 unmapped after liftover (1528/1528 variants mapped)
If needed I could still look into converting the VCF files to BED format to use UCSC's liftover, and for that I would have to either: 1) explore bedops bed to vcf as an option or 2) manually convert vcf to bed using a python script, since we have indels that require some end position adjustment. Those are both totally options and if still needed I can look into!
Liftover fails for truth VCF file for only chromosome 19 and 20 from test files available on Synapse (https://www.synapse.org/#!Synapse:syn2335184)
Command:
Expected behavior: Liftover variant information from hg19 reference to hg38 reference.
Actual behavior: No variants lifted over.
Issue posted online? A similar issue was posted on a gatk forum. The solution to use the GRch37 to 38 chain produced the following:
Environment The Genome Analysis Toolkit (GATK) v4.3.0.0 HTSJDK Version: 3.0.1 Picard Version: 2.27.5
* Edit: I went back and realized that liftover failed for chr20 as well, so the failure isn't because of a difference in files.