gonzalpk closed this issue 4 years ago
Greetings, are you able to share the VCF file, or a subset of it?
The error is occurring in a part of the VCF parser that removes redundant bases from the REF and ALT alleles. E.g. code that turns (ACAA --> AGAA) into (AC --> AG). My first guess would be these fields of the input VCF might be non-standard or in some format that I didn't anticipate.
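As a rough illustration of the simplification described above, here is a minimal sketch of trimming shared trailing bases from REF and ALT. The function name `trim_shared_suffix` is hypothetical and this is not the actual code in `vcfFunc.py`; it assumes alleles are plain strings and that every allele must keep at least one base.

```python
def trim_shared_suffix(ref, alts):
    """Drop trailing bases shared by REF and every ALT allele.

    Hypothetical helper for illustration only; not NEAT's implementation.
    Every allele is kept at least one base long.
    """
    # Stop as soon as any allele would become empty or the last bases differ.
    while len(ref) > 1 and all(len(a) > 1 and a[-1] == ref[-1] for a in alts):
        ref = ref[:-1]
        alts = [a[:-1] for a in alts]
    return ref, alts


# The example from the comment above: ACAA --> AGAA becomes AC --> AG.
simplified = trim_shared_suffix("ACAA", ["AGAA"])

# A deletion whose ALT is already a single base is left untouched.
untouched = trim_shared_suffix("ACA", ["A"])
```

The `len(a) > 1` check on each ALT is the design choice worth noting: without it, an ALT that is shorter than REF can be trimmed down to an empty string.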
Absolutely, thanks for getting back to me so quickly. The VCF file is attached.
Patrick Gonzales, Link lab, Department of Integrative Physiology, University of Colorado, Boulder
I'm unsure if the attachment sent correctly (I can't see it in either GitHub or the email response). Feel free to send it directly to me at zstephe2@illinois.edu.
Thanks!
Greetings! I pushed an update to the repository that should fix this. It was indeed a bug in the input variant simplification code.
Excellent! The program works beautifully now. Thank you.
Hello, I am trying to create normal-tumor paired DNA-seq samples. My approach is to set the RNG seed (--rng) to the same value for both the normal and tumor samples to establish germline mutations, and then use randomly sampled mutations from a COSMIC VCF file for the tumor sample. I am targeting genomic regions and have used bedtools to extract COSMIC mutations from those regions. When running the script, the normal sample runs beautifully, but the tumor sample fails with the error message below. I'm assuming the issue is with my VCF file, but I am not sure what needs to be fixed in it. Any thoughts on the error would be greatly appreciated. The script is also pasted below. Thank you, Patrick
reading input VCF...
Warning: Found variants without a GT field, assuming heterozygous...
Traceback (most recent call last):
File "/projects/gonzalpk/neat-genreads/genReads.py", line 743, in <module>
main()
File "/projects/gonzalpk/neat-genreads/genReads.py", line 277, in main
(sampNames, inputVariants) = parseVCF(INPUTVCF,ploidy=PLOIDS)
File "/projects/gonzalpk/neat-genreads/py/vcfFunc.py", line 176, in parseVCF
while len(varsOut[r][i][1]) > 1 and all([n[-1] == varsOut[r][i][1][-1] for n in varsOut[r][i][2]]):
IndexError: string index out of range
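One way the loop condition in this traceback could raise `IndexError` is sketched below. This is a guess at the trigger, not a confirmed diagnosis: the condition mirrors the shape of the line shown in the traceback, but the loop body, the function name `trim_unguarded`, and the example record (REF `ACA`, ALT `A`) are assumptions made for illustration.

```python
def trim_unguarded(ref, alts):
    """Trim shared trailing bases, checking only the length of REF.

    The condition mirrors the traceback line; the body is an assumption.
    """
    while len(ref) > 1 and all([n[-1] == ref[-1] for n in alts]):
        ref = ref[:-1]
        alts = [n[:-1] for n in alts]
    return ref, alts


# Hypothetical record: a deletion whose ALT is a single base that happens
# to match the last base of REF. The first pass trims ALT to an empty
# string, and the second pass indexes into that empty string.
try:
    trim_unguarded("ACA", ["A"])
    outcome = "no error"
except IndexError as exc:
    outcome = str(exc)

print(outcome)
```

If this is the cause, records where an ALT allele is shorter than REF and ends in the same base as REF would be the ones to look for in the input VCF.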
python /projects/gonzalpk/neat-genreads/genReads.py \
  -r /scratch/summit/gonzalpk/ensembl/UCSC/hg38.fa \
  -R 150 \
  -E 0.01 \
  --bam \
  -c 500 \
  -v 0.vcf \
  --vcf \
  --rng 0 \
  --gz \
  --pe 300 30 \
  -t targeted_panel_locations.bed \
  -to 0 \
  -o 0_target_simulated_data_tumor &

python /projects/gonzalpk/neat-genreads/genReads.py \
  -r /scratch/summit/gonzalpk/ensembl/UCSC/hg38.fa \
  -R 150 \
  -E 0.01 \
  --bam \
  -c 500 \
  --vcf \
  --rng 0 \
  --gz \
  --pe 300 30 \
  -t targeted_panel_locations.bed \
  -to 0 \
  -o 0_target_simulated_data_normal &