Closed holtjma closed 4 years ago
Could you please show me how you modified the headers of MapCaller's VCF for evaluation? Thank you!
Yep, here the python function I threw together to make a VCF file that worked (there are details that would probably need correcting in production). I should note I didn't bother making a GQ
field, but that's also something that would generally be good to have for the calls.
def createRTG_VCF(vcfFN, sampleName, outFN):
'''
This will read a MapCaller VCF and reformat for RTG. Also strips out BND calls for now.
@param vcfFN - the filename of the "vcf" made by MapCaller
@param sampleName - the sample name to use
@param outFN - the output filename for a reformatted VCF
'''
i = 0
fpi = open(vcfFN, 'r')
fpo = open(outFN, 'w+')
for l in fpi:
if i % 100000 == 0:
print('Processing line %d...' % (i, ))
i += 1
if (l.startswith('##fileformat') or
l.startswith('##reference') or
l.startswith('##source') or
l.startswith('##CommandLine') or
l.startswith('##Contig') or
l.startswith('##FILTER')):
fpo.write(l)
elif l.startswith('##'):
#print('IGNORE: '+l.rstrip())
pass
elif l.startswith('#'):
#add in all the extra headers here
fpo.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
fpo.write('##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth">\n')
fpo.write('##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allele Depth">\n')
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SL362490
fpo.write('#'+'\t'.join(['CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO', 'FORMAT', sampleName])+'\n')
else:
pieces = l.rstrip().split('\t')
chrom = pieces[0]
pos = pieces[1]
varID = pieces[2]
ref = pieces[3]
alt = pieces[4]
qual = pieces[5]
filterVal = pieces[6]
origInfo = pieces[7]
infoDict = {}
for subStr in origInfo.split(';'):
label, value = subStr.split('=')
infoDict[label] = value
if infoDict['TYPE'] == 'BND':
continue
#now the things we fill in
infoField = '.'
formatField = 'GT:DP:AD'
newGT = infoDict.get('GT', './.').replace('|', '/')
newDP = infoDict['DP']
newAD = str(int(infoDict['DP']) - int(infoDict['AD']))+','+infoDict['AD']
sampleField = ':'.join([newGT, newDP, newAD])
fpo.write('\t'.join([chrom, pos, varID, ref, alt, qual, filterVal, infoField, formatField, sampleField])+'\n')
fpo.close()
fpi.close()
Hello, I'm currently testing MapCaller and using RTG vcfeval to evaluate it. Unfortunately, it doesn't work out-of-the-box due to using an unexpected VCF format. Here is the exact error message I'm receiving:
I suspect (by viewing the VCF) that there will be other issues as well such as the location of the GT field (it's under INFO instead of a sample) that will also cause evaluation tools like RTG to fail.