Closed ipstone closed 3 years ago
Hi @ipstone, thanks for reaching out. If you look more closely at the excerpt of your VCF file above, it contains 9 columns, not 8. A quick suggestion is to simply remove the 9th (FORMAT) column, since you do not have any sample (genotype) data present. Sample data would occupy column 10 onward whenever a FORMAT column is present, and I suspect the missing sample column is the reason it fails.
Try to see if that might give you success.
best, Sigve
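For anyone landing here with the same problem, the column removal suggested above could be sketched as follows. This is a minimal sketch, not part of gvanno: the function name `drop_format_column` is hypothetical, and it assumes a tab-delimited, sites-only VCF where nothing after the 8th column (INFO) needs to be kept.

```python
def drop_format_column(lines):
    """Yield VCF lines truncated to the 8 fixed columns (CHROM..INFO).

    Assumption: the file carries no sample data worth keeping, so FORMAT
    (column 9) and anything after it can be dropped.
    """
    for line in lines:
        # Meta-information lines ("##...") are passed through untouched.
        if line.startswith("##"):
            yield line
            continue
        # The "#CHROM" header line and the records are both truncated.
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[:8]) + "\n"


# Usage sketch (file names are placeholders):
# with open("input.vcf") as src, open("input.sites_only.vcf", "w") as dst:
#     dst.writelines(drop_format_column(src))
```

If the file did contain genotypes, truncating like this would discard them, so it only fits the sites-only case described in this thread.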
Thanks @sigven for picking out the format issue, it helps.
After removing the extra column, I noticed some other issues. My input VCF file was originally a phased germline variants file, so some lines list multiple alternate alleles. With the vcf_validate option kept on, the run stops with errors such as:
ERROR: Line 158011: INFO SVTYPE must be one of: BND, CNV, DEL, DUP, INS, INV. Found SVTYPE was 'MEI'.
ERROR: Line 163444: INFO SVTYPE must be one of: BND, CNV, DEL, DUP, INS, INV. Found SVTYPE was 'MEI'.
ERROR: Line 164819: INFO SVTYPE must be one of: BND, CNV, DEL, DUP, INS, INV. Found SVTYPE was 'MEI'.
ERROR: Line 166354: INFO SVTYPE must be one of: BND, CNV, DEL, DUP, INS, INV. Found SVTYPE was 'MEI'.
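To see which records trigger this check before running the full pipeline, something like the sketch below might help. It is not the validator gvanno uses; the function name is hypothetical, and the allowed set is simply copied from the error message above.

```python
# Allowed values as listed in the validation error message above.
ALLOWED_SVTYPES = {"BND", "CNV", "DEL", "DUP", "INS", "INV"}


def find_unsupported_svtypes(lines):
    """Return (line_number, svtype) pairs for SVTYPE values outside the allowed set."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO is column 8: semicolon-separated KEY=VALUE entries or bare flags.
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "SVTYPE" and value not in ALLOWED_SVTYPES:
                hits.append((line_no, value))
    return hits
```

Line numbers are counted from the top of the file, header lines included.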
With the --no_vcf_validate option, gvanno ran all the way through, but gave an error when exporting the tsv file (I think):
2020-09-15 20:51:07 - gvanno-gene-annotate - INFO - Completed summary of functional annotations for 9345 variants on chromosome 16
2020-09-15 20:51:17 - gvanno-gene-annotate - INFO - Completed summary of functional annotations for 11197 variants on chromosome 17
2020-09-15 20:51:18 - gvanno-gene-annotate - INFO - Completed summary of functional annotations for 4889 variants on chromosome 18
Traceback (most recent call last):
File "/gvanno/gvanno_summarise.py", line 125, in <module>
if __name__=="__main__": __main__()
File "/gvanno/gvanno_summarise.py", line 23, in __main__
extend_vcf_annotations(args.vcf_file, args.gvanno_db_dir, args.lof_prediction)
File "/gvanno/gvanno_summarise.py", line 91, in extend_vcf_annotations
csq_record_results = annoutils.parse_vep_csq(rec, gvanno_xref, vep_csq_fields_map, logger, pick_only = True, csq_identifier = 'CSQ')
File "/gvanno/lib/annoutils.py", line 705, in parse_vep_csq
assign_cds_exon_intron_annotations(csq_record)
File "/gvanno/lib/annoutils.py", line 454, in assign_cds_exon_intron_annotations
exon_pos_info = csq_record['NearestExonJB'].split("+")
AttributeError: 'NoneType' object has no attribute 'split'
Might the issue come from this warning:
gvanno-validate-input - WARNING - Multiallelic site detected:8 48317851 CGTGTGTGT CGTGTGT,CGTGT,CGT,CGTGTGTGTGT,C,CGTGTGTGTGTGT
I have converted the multiallelic records in the VCF to one variant per line and will test again to see if that helps.
Thanks again, I think it's making progress. There is already the '...gvanno_ready.vep.vcfanno.annotated.vcf' output file, so I guess I could manually parse out the annotation if I still have trouble with the last step in Python.
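The one-variant-per-line conversion mentioned above can be sketched roughly as below. This is a naive illustration with a hypothetical function name: it only splits the ALT column, copies every other column unchanged, and does not split per-allele INFO values, rewrite genotypes, or trim and left-align alleles. A dedicated tool such as bcftools norm with its multiallelics option is likely the safer choice for real data.

```python
def split_multiallelic(lines):
    """Yield VCF lines with each comma-separated ALT allele on its own record.

    Assumption: copying all non-ALT columns verbatim is acceptable, i.e. no
    per-allele INFO fields or genotype columns need to be adjusted.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # header and meta-information lines are unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # ALT is column 5; records with a single ALT pass through as-is.
        if len(fields) < 5 or "," not in fields[4]:
            yield line
            continue
        for alt in fields[4].split(","):
            yield "\t".join(fields[:4] + [alt] + fields[5:]) + "\n"
```

Applied to the site from the warning above (chromosome 8, position 48317851), this would emit one record per listed alternate allele.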
Using the one-variant-per-line input VCF, I still run into a similar error at line 125:
2020-09-15 21:51:31 - gvanno-gene-annotate - INFO - Completed summary of functional annotations for 11197 variants on chromosome 17
2020-09-15 21:51:33 - gvanno-gene-annotate - INFO - Completed summary of functional annotations for 4889 variants on chromosome 18
Traceback (most recent call last):
File "/gvanno/gvanno_summarise.py", line 125, in <module>
if __name__=="__main__": __main__()
File "/gvanno/gvanno_summarise.py", line 23, in __main__
extend_vcf_annotations(args.vcf_file, args.gvanno_db_dir, args.lof_prediction)
File "/gvanno/gvanno_summarise.py", line 91, in extend_vcf_annotations
csq_record_results = annoutils.parse_vep_csq(rec, gvanno_xref, vep_csq_fields_map, logger, pick_only = True, csq_identifier = 'CSQ')
File "/gvanno/lib/annoutils.py", line 705, in parse_vep_csq
assign_cds_exon_intron_annotations(csq_record)
File "/gvanno/lib/annoutils.py", line 454, in assign_cds_exon_intron_annotations
exon_pos_info = csq_record['NearestExonJB'].split("+")
AttributeError: 'NoneType' object has no attribute 'split'
The traceback points to annoutils.py line 454: https://github.com/sigven/gvanno/blob/7c6affd7ddc1badeb1e20a415af5729faacbfd0a/src/gvanno/lib/annoutils.py#L454 Might csq_record['NearestExonJB'] be None for some records? Perhaps a None check could be added here?
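A guard of the kind suggested above might look like the sketch below. It is not the fix that went into gvanno: the helper name is hypothetical, and it assumes csq_record behaves like a dict whose 'NearestExonJB' entry is either a "+"-delimited string or None.

```python
def split_nearest_exon_field(csq_record, key="NearestExonJB", sep="+"):
    """Split the field on `sep`, returning None when the value is missing.

    Assumption: csq_record is dict-like; the key name and separator are
    taken from the traceback above.
    """
    value = csq_record.get(key)
    if value is None:
        # Nothing to split, so let the caller skip the exon/intron annotation.
        return None
    return value.split(sep)
```

The calling code would then need to test the result for None before indexing into it.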
Looks like you have hit the target there. If you are able to share your VCF with me (or some parts of it) along with your assembly and configuration file, I can make a more robust check when looking for Exon Junction information (i.e. check for None), and test that it works for your case.
thanks, Sigve
Fixed
Hello,
Thank you for your work on gvanno! I have gotten gvanno working on my CentOS 7 box, and the example annotation ran well. However, when I try to annotate a simple VCF file like the following, I run into this error (probably on every variant line):
The VCF file looks like the following:
What might be the cause of the error? Is there a way to format the vcf to get it properly annotated?
Thanks in advance!
-- ipstone