yjx1217 / simuG

simuG: a general-purpose genome simulator
MIT License

Uninitialize value when using gene gff option #9

Closed Glenn032787 closed 2 years ago

Glenn032787 commented 2 years ago

Hi! This is a great program! I am running into an error when using the -gene_gff option. I am hoping to create SNPs in the hg38 genome. However, with the Ensembl 100 annotation "Homo_sapiens.GRCh38.100.chr.gff3.gz", it keeps returning the following error.

Use of uninitialized value $primary_mRNA_id in hash element at scripts/simuG/simuG.pl line 991.

I made sure that the chromosome naming format matches the one in the Ensembl GFF, but it still results in this error. Do you have any suggestions for resolving this error?

Thank you!

yjx1217 commented 2 years ago

Hi Glenn032787,

Thanks for trying out simuG and reporting the issue!

Regarding the issue that you have encountered, it sounds very much like this one: https://github.com/yjx1217/simuG/issues/6

So maybe you can test with the latest committed version of simuG from GitHub to see if it helps.

In case the problem persists, please send me the command that you used when encountering the error as well as the download url of your input genome and gff files so that I can debug the issue on my side.

Thanks, Jia-Xing

Glenn032787 commented 2 years ago

Hi Jia-Xing,

Thanks for the reply!

I have tried it using the latest git commit and it still results in the same error. The following command was used:

perl scripts/simuG/simuG.pl -refseq ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa -snp_count 400 -prefix test -gene_gff Homo_sapiens.GRCh38.100.chr.gff3.gz -coding_partition_for_snp_simulation coding

The Homo_sapiens.GRCh38.dna.primary_assembly.fa is obtained here and the Homo_sapiens.GRCh38.100.chr.gff3.gz file is obtained here

Thanks again!

yjx1217 commented 2 years ago

Hi Glenn032787,

Thanks for providing the testing example.
The problem was triggered by gene records such as ENSG00000188403, which lack an associated mRNA annotation record.

15      havana  gene    19964666        19965101        .       -       .       ID=gene:ENSG00000188403;Name=IGHV1OR15-9;biotype=IG_V_gene;description=immunoglobulin heavy variable 1/OR15-9 (non-functional) [Source:HGNC Symbol%3BAcc:HGNC:5569];gene_id=ENSG00000188403;logic_name=havana_ig_gene_homo_sapiens;version=7
15      havana  V_gene_segment  19964666        19965101        .       -       .       ID=transcript:ENST00000338912;Parent=gene:ENSG00000188403;Name=IGHV1OR15-9-201;biotype=IG_V_gene;tag=basic;transcript_id=ENST00000338912;transcript_support_level=NA;version=5
15      havana  exon    19964666        19964972        .       -       .       Parent=transcript:ENST00000338912;Name=ENSE00003690955;constitutive=1;ensembl_end_phase=2;ensembl_phase=1;exon_id=ENSE00003690955;rank=2;version=1
15      havana  CDS     19964666        19964972        .       -       2       ID=CDS:ENSP00000474639;Parent=transcript:ENST00000338912;protein_id=ENSP00000474639
15      havana  exon    19965056        19965101        .       -       .       Parent=transcript:ENST00000338912;Name=ENSE00002984675;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=ENSE00002984675;rank=1;version=2
15      havana  CDS     19965056        19965101        .       -       0       ID=CDS:ENSP00000474639;Parent=transcript:ENST00000338912;protein_id=ENSP00000474639
###

So I've now applied a small fix to simuG that lets it skip such unexpected gene records during simulation.

See here for the details. https://github.com/yjx1217/simuG/commit/16e4eeab285e0574c1ad7ce50c5bb34274346250
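For anyone who wants to check an annotation file for such records before running simuG, here is a minimal sketch in Python. It is not the Perl code from the commit above, only an illustration of the idea. It assumes tab-separated GFF3 with nine columns, a feature type of `gene` for gene records, and `ID`/`Parent` attributes as in the Ensembl records shown earlier. The function name `genes_without_mrna` and the `DEMO1` records are made up for the example.

```python
def genes_without_mrna(gff_lines):
    """Return the IDs of 'gene' records that have no child record of type 'mRNA'."""
    gene_ids = []            # gene IDs in file order
    genes_with_mrna = set()  # parents referenced by at least one mRNA record
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and directives such as '###'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        feature_type = fields[2]
        # Column 9 holds 'key=value' pairs separated by ';'.
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        if feature_type == "gene" and "ID" in attrs:
            gene_ids.append(attrs["ID"])
        elif feature_type == "mRNA" and "Parent" in attrs:
            # GFF3 allows several comma-separated parents.
            genes_with_mrna.update(attrs["Parent"].split(","))
    return [g for g in gene_ids if g not in genes_with_mrna]


# Shortened copies of the records above, plus a hypothetical gene with an mRNA.
example = [
    "15\thavana\tgene\t19964666\t19965101\t.\t-\t.\tID=gene:ENSG00000188403;Name=IGHV1OR15-9",
    "15\thavana\tV_gene_segment\t19964666\t19965101\t.\t-\t.\tID=transcript:ENST00000338912;Parent=gene:ENSG00000188403",
    "###",
    "1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene:DEMO1",
    "1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=transcript:DEMO1.1;Parent=gene:DEMO1",
]
print(genes_without_mrna(example))  # ['gene:ENSG00000188403']
```

For a compressed file, the lines can be read with `gzip.open(path, "rt")` from the Python standard library and passed to the function directly.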

So please try this newly committed version. It now works with your test example on my side.

Thanks again for spotting this issue, which helps simuG cover corner cases such as this one.

Best, Jia-Xing

Glenn032787 commented 2 years ago

Yup I got it to work on my end. Thanks so much!