When converting my own CGI-Var file (from my Harvard PGP page), some lines that I would have expected to be simple "nocall" lines instead have a long reference string in the "REF" column while still carrying 'NOCALL' as the filter.
Here is a small snippet (the test0.txt file) that reproduces one of these lines (be careful of whitespace vs. tab separation if you copy and paste):
#APPROVAL Records of report approval are on file with Complete Genomics, Inc.
#TITLE Whole Human Genome Sequencing
#ADDRESS This report was prepared by Complete Genomics Inc. at 2071 Stierlin Ct., Mountain View, CA 94043
#CUSTOMER_SAMPLE_ID hu826751
#SAMPLE_SOURCE Other
#REPORTED_GENDER MALE
#CALLED_GENDER MALE
#TUMOR_STATUS no
#LIBRARY_TYPE Pure LFR
#LIBRARY_SOURCE Version 2
#ASSEMBLY_ID GS000037338-ASM
#COSMIC COSMIC v65
#DBSNP_BUILD dbSNP build 137
#GENOME_REFERENCE NCBI build 37
#SAMPLE GS03052-DNA_B01
#GENERATED_BY cgatools
#GENERATED_AT 2014-Jul-01 04:55:14.521195
#SOFTWARE_VERSION 2.5.0.33
#FORMAT_VERSION 2.5
#GENERATED_BY dbsnptool
#TYPE VAR-ANNOTATION
>locus ploidy allele chromosome begin end varType reference alleleSeq varScoreVAF varScoreEAF varFilter hapLink xRef alleleFreq alternativeCalls
21576 2 all chr1 997408 997432 ref = =
21577 2 all chr1 997432 997433 no-call = ?
21578 2 all chr1 997433 997442 ref = =
21579 2 1 chr1 997442 997517 no-call CCTTGTCCCCGTTCCCTCCGTCCCTCTCCCCCTTCCTTCCCTCCCTCCCTCACCACCATTCCCTCCCTCCCACAT ? 6427
21579 2 2 chr1 997442 997453 sub CCTTGTCCCCG TCCCCCTTCC 21 21 AMBIGUOUS;VQLOW 6428 TCCCCCTTCT:-10;TCCCCCTTCG:-10;TCCCCCTTTC:-10;TCCCCCTTGC:-10;TCCCCTTTCC:-11;TCCCCGTTCC:-12;TCCCTCTTCC:-13;TCCCGCTTCC:-13;TCCTCCTTCC:-13;TCCGCCTTCC:-13;TTCCCCTTCC:-16;TGCCCCTTCC:-16;TCTCCCTTCC:-16;TCGCCCTTCC:-16
21579 2 2 chr1 997453 997455 ref TT TT 21 21 VQLOW 6428
21579 2 2 chr1 997455 997517 no-call CCCTCCGTCCCTCTCCCCCTTCCTTCCCTCCCTCCCTCACCACCATTCCCTCCCTCCCACAT ? 6428
21580 2 all chr1 997517 997527 ref = =
21581 2 all chr1 997527 997598 no-call = ?
21582 2 all chr1 997598 997633 ref = =
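The snippet mixes two kinds of no-call. Loci 21577 and 21581 are no-call on both alleles ("all"), while locus 21579 is no-call over its whole span on allele 1 but partly called on allele 2 (a `sub` with AMBIGUOUS;VQLOW, then a `ref` block, then a no-call). A minimal Python sketch that makes this distinction explicit, assuming tab-separated columns in the order of the `>locus` header; the function names are hypothetical and not part of cgivar2gvcf:

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not part of cgivar2gvcf.

def parse_var_lines(lines):
    """Yield (locus, allele, begin, end, varType) for each data line.

    Assumes tab-separated columns in the order of the '>locus' header:
    locus, ploidy, allele, chromosome, begin, end, varType, ...
    Metadata ('#') and column-name ('>') lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", ">")):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[2], int(fields[4]), int(fields[5]), fields[6]


def classify_loci(lines):
    """Map each locus id to 'ref', 'no-call', 'partial' or 'called'."""
    var_types = defaultdict(set)
    for locus, _allele, _begin, _end, var_type in parse_var_lines(lines):
        var_types[locus].add(var_type)
    result = {}
    for locus, seen in var_types.items():
        if seen == {"ref"}:
            result[locus] = "ref"          # reference on every allele
        elif seen == {"no-call"}:
            result[locus] = "no-call"      # no call anywhere in the locus
        elif "no-call" in seen:
            result[locus] = "partial"      # some calls, some no-calls
        else:
            result[locus] = "called"       # calls only, no no-calls
    return result


# First seven columns of some records from test0.txt, rejoined with tabs.
EXAMPLE_LINES = [
    "\t".join(["21576", "2", "all", "chr1", "997408", "997432", "ref"]),
    "\t".join(["21577", "2", "all", "chr1", "997432", "997433", "no-call"]),
    "\t".join(["21579", "2", "1", "chr1", "997442", "997517", "no-call"]),
    "\t".join(["21579", "2", "2", "chr1", "997442", "997453", "sub"]),
    "\t".join(["21579", "2", "2", "chr1", "997453", "997455", "ref"]),
    "\t".join(["21579", "2", "2", "chr1", "997455", "997517", "no-call"]),
    "\t".join(["21581", "2", "all", "chr1", "997527", "997598", "no-call"]),
]

print(classify_loci(EXAMPLE_LINES))
# {'21576': 'ref', '21577': 'no-call', '21579': 'partial', '21581': 'no-call'}
```

Only locus 21579 comes out as "partial", which matches the one output line that keeps the full reference sequence in REF.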
This produces:
##fileformat=VCFv4.1
##fileDate=201617
##source=cgivar2gvcf-version-0.1.5
##description="Produced from a Complete Genomics var file using cgivar2gvcf. Not intended for clinical use."
##reference=hg19.2bit
##FILTER=<ID=NOCALL,Description="Some or all of this record had no sequence call by Complete Genomics">
##FILTER=<ID=VQLOW,Description="Some or all of this sequence call marked as low variant quality by Complete Genomics">
##FILTER=<ID=AMBIGUOUS,Description="Some or all of this sequence call marked as ambiguous by Complete Genomics">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr1 997409 . C . . PASS END=997432 GT 0/0
chr1 997433 . T . . NOCALL END=997433 GT ./.
chr1 997434 . C . . PASS END=997442 GT 0/0
chr1 997443 . CCTTGTCCCCGTTCCCTCCGTCCCTCTCCCCCTTCCTTCCCTCCCTCCCTCACCACCATTCCCTCCCTCCCACAT . NOCALL . GT ./.
chr1 997518 . C . . PASS END=997527 GT 0/0
chr1 997528 . C . . NOCALL END=997598 GT ./.
chr1 997599 . G . . PASS END=997633 GT 0/0
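For comparison, the simple no-call lines above (for example locus 21577, which becomes `chr1 997433 . T . . NOCALL END=997433 GT ./.`) follow from the var file's 0-based, half-open begin/end: POS is begin + 1, and END is the var-file end. A minimal Python sketch of what that same rule would give for locus 21579, using a hypothetical helper name; it is not how cgivar2gvcf is implemented:

```python
# Hypothetical helper for illustration only; not cgivar2gvcf's code.

def nocall_block(chrom, begin, end, reference):
    """Format a var-file interval as a single-base, END-style no-call VCF line.

    begin/end are var-file coordinates (0-based, half-open), so the VCF
    POS is begin + 1 and the 1-based inclusive END equals the var-file end.
    `reference` is the reference sequence over [begin, end).
    """
    if len(reference) != end - begin:
        raise ValueError("reference length does not match the interval")
    pos = begin + 1
    return "\t".join([
        chrom, str(pos), ".", reference[0], ".", ".",
        "NOCALL", f"END={end}", "GT", "./.",
    ])


# Reference for locus 21579 from test0.txt (75 bases, 997442-997517).
REF_21579 = ("CCTTGTCCCCGTTCCCTCCGTCCCTCTCCCCCTTCCTTCCCTCCCTCCCTCACCACCATT"
             "CCCTCCCTCCCACAT")
print(nocall_block("chr1", 997442, 997517, REF_21579))
```

This yields a one-base `C` record with END=997517, in the same shape as the other NOCALL lines. Note that collapsing the locus this way discards allele 2's partial call (the AMBIGUOUS;VQLOW `sub`), so it is a choice about representation rather than a pure formatting fix.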