yjx1217 / simuG

simuG: a general-purpose genome simulator
MIT License

Use of uninitialized value in substr at [...] line 1158 #3

Closed (Viloleal closed this issue 4 years ago)

Viloleal commented 4 years ago

Hi! I am trying out this package, and so far it has worked smoothly when generating random SNPs. However, when I try to generate SNPs from a VCF file, I get an error:

simug -refseq ../NC_002945v4.fasta -snp_vcf ABCD_modified.vcf -prefix ABCD

[simug is the alias I use to run the script from my path]

[Tue Jun 16 15:34:24 2020] Starting simuG ..

[Tue Jun 16 15:34:24 2020] Check specified options ..
Running simuG for SNP/INDEL simulation >> Ignore all options for CNV/inversion/translocation simulation.

This simulation use the random seed: 1128527723

The option snp_vcf has been specified: snp_vcf = ABCD_modified.vcf
Ignore incompatible option: snp_count
Ignore incompatible option: snp_model
Ignore incompatible option: titv_ratio

[Tue Jun 16 15:34:25 2020] Parsing the input vcf file: ABCD_modified.vcf

!!! Warning! Multiple alternative variants found at the same site:
!!! NC_002945.4:3884276 GGGCCGGGGGCGCCGGCGA=>G,GGGCCGGGGGCGCCGGCGG QUAL=0!
!!! Ignore all variants at this site.

[Tue Jun 16 15:34:25 2020] Introducing defined SNP/INDELs based on the input vcf file(s):

snp_vcf = ABCD_modified.vcf
Use of uninitialized value in substr at /home/victor/simuG/simuG.pl line 1158.

I have also tried pasting the SNPs into the SNP.vcf file in the testing directory, but I still get the same error. Even after deleting the duplicate variant, the error persists:

snp_vcf = ABCD_modified.vcf
Use of uninitialized value in substr at /home/victor/simuG/simuG.pl line 1158.

What could it be? I am using Ubuntu 18.04.

Thanks!

yjx1217 commented 4 years ago

Hi Viloleal,

Thanks for trying out simuG!

My guess is that there is a mismatch between the VCF file and your input reference genome FASTA file. You could double-check two things:

1) the version of the reference genome that was used when your input VCF file was originally generated;
2) the naming convention of the chromosome/scaffold/contig IDs in your FASTA and VCF files. In particular, extra strings after the chromosome/scaffold/contig ID in the ">" line of the FASTA file might cause problems.
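Both checks can be scripted before running simuG. The following is a minimal Python sketch, not part of simuG: the function names `read_fasta` and `check_vcf_against_fasta` are hypothetical, and it assumes that the sequence ID is the entire text after ">" (which is what the advice above implies, but is not confirmed here) and that the VCF is a plain tab-separated text file. It reports VCF records whose CHROM value has no matching FASTA header, and records whose REF allele differs from the reference sequence at POS, which would hint at a different reference version.

```python
import io


def read_fasta(handle):
    """Return {header_text: sequence}, using the whole text after '>' as the key."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def check_vcf_against_fasta(vcf_handle, seqs):
    """Return (chrom, pos, problem) tuples for VCF records that do not fit the FASTA."""
    problems = []
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip VCF header lines and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        pos = int(pos)
        if chrom not in seqs:
            problems.append((chrom, pos, "contig id not found in FASTA headers"))
            continue
        # VCF positions are 1-based; Python slices are 0-based.
        found = seqs[chrom][pos - 1 : pos - 1 + len(ref)]
        if found.upper() != ref.upper():
            problems.append((chrom, pos, f"REF is {ref} but FASTA has {found}"))
    return problems


# Toy data (made up for illustration): the header carries extra text after the ID.
toy_fasta = io.StringIO(">chr1 extra description\nACGTACGT\n")
toy_vcf = io.StringIO("#CHROM\tPOS\tID\tREF\tALT\nchr1\t3\t.\tG\tA\n")
for problem in check_vcf_against_fasta(toy_vcf, read_fasta(toy_fasta)):
    print(problem)
```

For real files, pass `open("your.fasta")` and `open("your.vcf")` in place of the `io.StringIO` objects. An empty result means every record's contig ID and REF allele agree with the FASTA file under the assumptions above.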

You can also send me your input files for me to test if needed.

Best, Jia-Xing

Viloleal commented 4 years ago

Hi Jia-Xing,

Thank you very much for answering so rapidly. The version of the reference genome is the same for both packages, so I tried the second option.

The reference genome did have quite a few extra strings in the FASTA header, so I trimmed it as you recommended. I reran the variant_calling pipeline and simuG, and it worked smoothly! I will let you know if I find another issue.
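For anyone hitting the same error, the header trimming can be done with a few lines of Python. This is only a sketch of one way to do it, not the exact procedure used here: the function name `trim_fasta_headers` is hypothetical, and it assumes the contig ID is the first whitespace-delimited token of each ">" line.

```python
import io


def trim_fasta_headers(src, dst):
    """Copy FASTA text from src to dst, keeping only the first token of each header."""
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split()
            line = ">" + (fields[0] if fields else "") + "\n"
        dst.write(line)


# Toy data (made up for illustration).
source = io.StringIO(">chr1 extra description\nACGTACGT\n>chr2\tmore text\nTTGA\n")
trimmed = io.StringIO()
trim_fasta_headers(source, trimmed)
print(trimmed.getvalue(), end="")
```

With real files, open the input for reading and a new output file for writing, then rerun variant calling against the trimmed FASTA so that the VCF and the reference use the same IDs.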

Thanks!

Kind regards,

yjx1217 commented 4 years ago

Hi Viloleal,

Glad that the issue has been solved! I've also made a minor change to simuG (commit 57aaa82) to make it a little bit more robust in dealing with this situation and you are welcome to test it. :-) Thanks again for using simuG!

Best, Jia-Xing