Open ManavalanG opened 3 years ago
You are right. Current Crossmap does not support SV format. We will consider adding this function in the future.
Hi! @liguowang Thanks for creating a useful tool. I found this tool in the book Bioinformatics Data Skills.
But, I also noticed that it crashes when converting VCF files.
A simplified example to make it easier to reproduce.
test.vcf
##fileformat=VCFv4.2
##FILTER=<ID=LOW_QUAL,Description="Low quality call">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT foo.bam
chr16 25925125 id2 N N[chr22:20000000[ 100.00 PASS SVTYPE=BND GT ./.
chr16 21925125 id1 N N[chr22:20000000[ 100.00 PASS SVTYPE=BND GT ./.
Convert T2T-CHM13 -> GRCh38
chain: chm13v2-grch38.chain target: GRCh38.p13.genome.fa
Command to execute
CrossMap.py vcf chm13v2-grch38.chain test.vcf GRCh38.p13.genome.fa out.vcf
Error messages.
2022-12-02 01:15:19 [INFO] Read the chain file "chm13v2-grch38.chain"
2022-12-02 01:15:21 [INFO] Filter out variants [reference_allele == alternative_allele] ...
2022-12-02 01:15:21 [INFO] Updating contig field ...
2022-12-02 01:15:21 [INFO] Lifting over ...
Traceback (most recent call last):
File "/home/kojix2/miniconda3/bin/CrossMap.py", line 319, in <module>
crossmap_vcf_file(mapping = mapTree, infile= in_file, outfile = out_file, liftoverfile = chain_file, refgenome = genome_file, noCompAllele = args.no_comp_alleles, compress = args.compression, cstyle = args.cstyle)
File "/home/kojix2/miniconda3/lib/python3.10/site-packages/cmmodule/mapvcf.py", line 183, in crossmap_vcf_file
fields[4] = revcomp_DNA(fields[4], True)
File "/home/kojix2/miniconda3/lib/python3.10/site-packages/cmmodule/utils.py", line 104, in revcomp_DNA
return ''.join([complement[base] for base in reversed(seq)])
File "/home/kojix2/miniconda3/lib/python3.10/site-packages/cmmodule/utils.py", line 104, in <listcomp>
return ''.join([complement[base] for base in reversed(seq)])
KeyError: '['
The output file shows that the former record does not require revcomp_DNA()
and is therefore output as is. The latter is shown to have failed during revcomp_DNA()
.
tail out.vcf
##contig=<ID=chrM,length=16569,assembly=GRCh38.p13.genome.fa>
##contig=<ID=chrX,length=156040895,assembly=GRCh38.p13.genome.fa>
##contig=<ID=chrY,length=57227415,assembly=GRCh38.p13.genome.fa>
##liftOverProgram=CrossMap,version=0.6.4,website=https://crossmap.readthedocs.io/en/latest/
##liftOverChainFile=chm13v2-grch38.chain
##originalFile=test.vcf
##targetRefGenome=GRCh38.p13.genome.fa
##liftOverDate=December02,2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT foo.bam
chr16 25648607 id2 A N[chr22:20000000[ 100.00 PASS SVTYPE=BND GT ./.
I know it is a real hard task to maintain such a tool continuously for years. So I am not asking you to address this issue. I have tried the new release 0.6.5 and confirmed that this issue reproduces itself. Just reporting. Thanks.
Please check the ALT field. It is not valid seeing from the example you provided: chr16 25648607 id2 T N[chr22:100000[ 100.00 PASS SVTYPE=BND GT ./.
Indeed, chm13v2 was the target and the entire command was wrong, so I have corrected it. Also, in GRCh38, chr22:1000000 is undeciphered, so I changed it to chr22:20000000 .
CrossMap can only handle these characters in the ALT field (i.e., the 5th column):
dict_keys(['A', 'C', 'G', 'T', 'Y', 'R', 'S', 'W', 'K', 'M', 'B', 'V', 'D', 'H', 'N', '.', '*'])
Yes. That's right. However, it should be possible to convert the ALT field for BNB. I was thinking about this the day before yesterday, but it seems difficult for my brain. It is especially complicated because it may also include an inserted sequence.
https://samtools.github.io/hts-specs/VCFv4.2.pdf
Because Crossmap supports INFO/END values in SV VCFs since
0.4.3
, it gives the impression that it supports lifting over of structural variants as well. However it currently doesn't convert values in ALT field forBND
calls.Here I use manta's test dataset for demo purposes:
Input SV vcf
Crossmap output
Notice that coordinates in ALT column remain the same after lifting-over.