Closed skambha6 closed 3 years ago
Hi @skambha6, has your VCF file already been phased? We require the file's columns to look like this:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA19240
chr1 11002 . A C 815.155 . . GT:GQ:DP:AF 0/1:1025:31:0.5484
chr1 11035 . G A 725.874 . . GT:GQ:DP:AF 0/1:930:31:0.5161
chr1 11113 . T C 806.659 . . GT:GQ:DP:AF 0|1:1021:31:0.4194
The file must have 10 columns. The important columns are:
1st: chromosome name
2nd: position of the variant
4th: reference base
5th: alternative (mutated) base
10th: the sample column. It is enough if it contains only the zygosity and phase information. Phase information must be indicated by "|" (e.g. 0|1).
So, if your VCF is not in this format, you can convert it manually as long as the required columns are present. It is also fine if your file does not have a header. For example, this is also acceptable:
chr1 10097 . T C . . . . 0/1
chr1 10197 . T C . . . . 0|1
chr1 10291 . C T . . . . 0/1
chr1 10391 . G T . . . . 0|1
chr1 10591 . A T . . . . 1|0
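The column requirements above can be checked before running the tool. The following is a minimal sketch, not part of NanoMethPhase: `classify_vcf_line` is a hypothetical helper name, and it assumes fields are separated by whitespace (tabs in a real VCF) and that the genotype is the first colon-separated item in the 10th column.

```python
def classify_vcf_line(line):
    """Classify one VCF data line by what the 10th column contains.

    Hypothetical helper for illustration only; not NanoMethPhase code.
    """
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) < 10:
        # No sample column at all, so there is no phase information.
        return "too_few_columns"
    # The genotype is the first colon-separated item, e.g. "0|1:1021:31:0.4194".
    genotype = fields[9].split(":")[0]
    return "phased" if "|" in genotype else "unphased"


if __name__ == "__main__":
    for example in (
        "chr1 10097 . T C . . . . 0/1",
        "chr1 10197 . T C . . . . 0|1",
        "chr1 11113 . T C 806.659 . . GT:GQ:DP:AF 0|1:1021:31:0.4194",
    ):
        print(classify_vcf_line(example), example)
```

Lines reported as "unphased" are still valid in the file, as the examples above show; they simply carry no "|" phase marker.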
Could you send me some lines from your vcf file?
Thanks, Vahid
Hi Vahid,
Here are the first few lines of my VCF file:
chr1 3000444 . T A . . . chr1:3000444
chr1 3000608 . T G . . . chr1:3000608
chr1 3000748 . T TT . . INDEL chr1:3000750
chr1 3000748 . T TT . . INDEL chr1:3000751
chr1 3000748 . T TT . . INDEL chr1:3000752
This VCF file was generated by running MUMMER on two completely homozygous genomes (collaborative cross mice genomes) and converting the resulting .delta file to a VCF file. My sample is a cross between these two homozygous parent mice. Since I know my haplotypes a priori (because the parents are entirely homozygous), would it be sufficient to manually add in a 0|1 for each variant in the sample column?
Thank you for your help!
Best, Sandeep
Yes, you can manually add 1|0 or 0|1 as the 10th column of your file. For example, assign paternal SNVs as 0|1 and maternal SNVs as 1|0. Using this file, NanoMethPhase will give you HP1 (maternal) and HP2 (paternal).
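Adding the genotype column by hand can be scripted. This is a minimal sketch under stated assumptions, not a NanoMethPhase utility: `add_genotype_column` is a hypothetical name, fields are assumed to be whitespace-separated, rows shorter than 9 fields are padded with "." placeholders, and every variant in the file is assumed to belong to the same parent so that one genotype applies to all rows.

```python
def add_genotype_column(line, genotype="0|1"):
    """Append a phased genotype as the 10th column of a VCF data line.

    Hypothetical helper for illustration only; not NanoMethPhase code.
    """
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) >= 10:
        # A sample column is already present; leave the line untouched.
        return line
    # Pad missing columns with "." so the genotype lands in column 10.
    fields += ["."] * (9 - len(fields))
    fields.append(genotype)
    return "\t".join(fields)


if __name__ == "__main__":
    # Row taken from the VCF excerpt posted in this thread (9 fields).
    print(add_genotype_column("chr1 3000444 . T A . . . chr1:3000444"))
```

Passing `genotype="1|0"` instead marks the variants as belonging to the other haplotype, following the paternal/maternal convention described above.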
Ok, great. Thank you!
Hi,
I have a VCF file that doesn't have the optional 'Format' and 'Sample' columns, so when I run nanomethphase phase I get the following error:
NanoMethPhase selected output format(s): bam
Traceback (most recent call last):
File "/home-4/skambha6@jhu.edu/.local/bin/nanomethphase", line 10, in
sys.exit(main())
File "/home-4/skambha6@jhu.edu/.local/lib/python3.8/site-packages/nanomethphase/main.py", line 1996, in main
args.func(args)
File "/home-4/skambha6@jhu.edu/.local/lib/python3.8/site-packages/nanomethphase/main.py", line 709, in main_phase
vcf_dict = vcf2dict_phase(vcf_file,args.window)
File "/home-4/skambha6@jhu.edu/.local/lib/python3.8/site-packages/nanomethphase/main.py", line 549, in vcf2dict_phase
if line_list[9].startswith('1|0') or line_list[9].startswith('0|1'):
IndexError: list index out of range
Is it possible to adjust the way vcf2dict_phase reads in the vcf file to not rely on the 'Format' or 'Sample' columns without hindering any of the downstream phasing?
Thank you! Sandeep
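The IndexError in the traceback comes from reading index 9 (the 10th column) of a row that has only 9 fields. A guarded version of that check might look like the sketch below. It is an assumption for illustration, not the actual NanoMethPhase source: `is_phased_het` is a hypothetical name, and only the `startswith` test is taken from the traceback.

```python
def is_phased_het(line_list):
    """Return True if the 10th column starts with a phased het genotype.

    Hypothetical guard for illustration only; not NanoMethPhase code.
    """
    if len(line_list) < 10:
        # No sample column: report "not phased" instead of raising IndexError.
        return False
    # Same test as in the traceback, written with a tuple of prefixes.
    return line_list[9].startswith(("1|0", "0|1"))


if __name__ == "__main__":
    print(is_phased_het("chr1 3000444 . T A . . . chr1:3000444".split()))
    print(is_phased_het("chr1 10197 . T C . . . . 0|1".split()))
```

A guard like this only avoids the crash. Rows without a genotype still carry no phase information, so the sample column has to be added for phasing to work.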