yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

"ERROR: length parameters are not appropriate." On WGS quality score simulation #16

Closed: jwalewski closed this issue 4 months ago

jwalewski commented 4 months ago

Hello,

I am attempting to simulate Nanopore reads with a mean length of 92 kbp for whole-genome sequencing. I am running this with the quality score model, but it appears the length cannot be changed. I see that this applies to TS sequencing; is this also the case for WGS?

:::: Simulation parameters :::

strategy : wgs
method : qshmm
qshmm : /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model
genome : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa
prefix : /mnt/g/Ultimate/silico_data/Nanopore_Input/Athaliana.fa6Xv1
id-prefix : S
depth : 6.000000
length-mean : 92000.000000
length-sd : 7000.000000
length-min : 100
length-max : 1000000
difference-ratio : 6:55:39
seed : 1707828641
accuracy-mean : 0.850000
pass_num : 1
hp-del-bias : 1.000000

:::: Reference stats ::::

file name : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa

ref.1 (len:30427671) : NC_003070.9 Arabidopsis thaliana chromosome 1 sequence
ref.2 (len:19698289) : NC_003071.7 Arabidopsis thaliana chromosome 2 sequence
ref.3 (len:23459830) : NC_003074.8 Arabidopsis thaliana chromosome 3 sequence
ref.4 (len:18585056) : NC_003075.7 Arabidopsis thaliana chromosome 4 sequence
ref.5 (len:26975502) : NC_003076.8 Arabidopsis thaliana chromosome 5 sequence
ref.6 (len:367808) : NC_037304.1 Arabidopsis thaliana ecotype Col-0 mitochondrion, complete genome
ref.7 (len:154478) : NC_000932.1 Arabidopsis thaliana chloroplast, complete genome

ERROR: length parameters are not appropriate.
yukiteruono commented 4 months ago

Thank you for using PBSIM3. For each read, the length is randomly drawn from a gamma distribution with the given mean and standard deviation. If the standard deviation is too small relative to the mean length, the length distribution will deviate from what PBSIM expects ("ERROR: length parameters are not appropriate."). In this case, we recommend increasing the standard deviation a little and then limiting the length range, as shown below:

--length-mean 92000 --length-sd 12000 --length-min 75000 --length-max 105000
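To illustrate why a larger standard deviation combined with length limits still gives a narrow length distribution, here is a minimal Python sketch. It is not PBSIM3's code: it assumes the common moment-matching parameterization of the gamma distribution (shape = mean²/sd², scale = sd²/mean) and assumes that draws outside [length-min, length-max] are simply rejected. PBSIM3's internal sampling and its exact "appropriate" check may differ, and the function names are hypothetical.

```python
import random
import statistics


def gamma_params(mean: float, sd: float) -> tuple[float, float]:
    """Moment-matched gamma: shape k = mean^2 / sd^2, scale theta = sd^2 / mean."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def sample_lengths(n: int, mean: float, sd: float,
                   length_min: int, length_max: int, seed: int = 0) -> list[int]:
    """Draw n integer read lengths from a gamma distribution,
    rejecting any draw outside [length_min, length_max] (assumed behavior)."""
    rng = random.Random(seed)
    shape, scale = gamma_params(mean, sd)
    lengths: list[int] = []
    while len(lengths) < n:
        # random.gammavariate(alpha, beta) takes shape alpha and scale beta
        x = round(rng.gammavariate(shape, scale))
        if length_min <= x <= length_max:
            lengths.append(x)
    return lengths


if __name__ == "__main__":
    lengths = sample_lengths(20000, 92000, 12000, 75000, 105000, seed=1)
    print(f"mean={statistics.mean(lengths):.0f} "
          f"sd={statistics.stdev(lengths):.0f} "
          f"min={min(lengths)} max={max(lengths)}")
```

Because truncation to [75000, 105000] cuts off both tails, the sampled standard deviation in this sketch comes out well below the requested 12000, which is why raising --length-sd while tightening --length-min/--length-max can still yield a fairly narrow spread of lengths.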

jwalewski commented 4 months ago

Hello! Thanks for your timely response, and my apologies for a comparatively delayed one.

First, I want to say that those parameters resolved that issue, so thank you. However, I seem to have run into a second issue: segmentation faults, which become more numerous at higher coverages.

Is this an issue with my system specs? I have 128 GB of RAM available.

Any further assistance is greatly appreciated!

munmap_chunk(): invalid pointer
./PBSIM3_NANOPORE.Sh: line 62:   672 Aborted                 (core dumped) pbsim --strategy wgs --genome $Refrence_Path$Refrence_Name --depth $Cov_min --qshmm /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model --method qshmm --prefix $PACBIO_INPUT_DIR$OUTPUTNAME --length-mean 92000 --length-sd 12000 --length-min 75000 --length-max 105000
gzip: /mnt/g/Ultimate/silico_data/Nanopore_Input/Athaliana.fa2Xv1.fq.gz already exists; do you wish to overwrite (y or n)? n
        not overwritten
:::: Simulation parameters :::

strategy : wgs
method : qshmm
qshmm : /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model
genome : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa
prefix : /mnt/g/Ultimate/silico_data/Nanopore_Input/Athaliana.fa6Xv1
id-prefix : S
depth : 6.000000
length-mean : 92000.000000
length-sd : 12000.000000
length-min : 75000
length-max : 105000
difference-ratio : 6:55:39
seed : 1708040435
accuracy-mean : 0.850000
pass_num : 1
hp-del-bias : 1.000000

:::: Reference stats ::::

file name : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa

ref.1 (len:30427671) : NC_003070.9 Arabidopsis thaliana chromosome 1 sequence
ref.2 (len:19698289) : NC_003071.7 Arabidopsis thaliana chromosome 2 sequence
ref.3 (len:23459830) : NC_003074.8 Arabidopsis thaliana chromosome 3 sequence
ref.4 (len:18585056) : NC_003075.7 Arabidopsis thaliana chromosome 4 sequence
ref.5 (len:26975502) : NC_003076.8 Arabidopsis thaliana chromosome 5 sequence
ref.6 (len:367808) : NC_037304.1 Arabidopsis thaliana ecotype Col-0 mitochondrion, complete genome
ref.7 (len:154478) : NC_000932.1 Arabidopsis thaliana chloroplast, complete genome

:::: Simulation stats (ref.1) ::::

read num. : 1966
depth : 6.000887
read length mean (SD) : 92875.385554 (7113.349337)
read length min : 76358
read length max : 109556
read accuracy mean (SD) : 0.852036 (0.041505)
substitution rate. : 0.008909
insertion rate. : 0.081616
deletion rate. : 0.057784

./PBSIM3_NANOPORE.Sh: line 62:   682 Segmentation fault      (core dumped) pbsim --strategy wgs --genome $Refrence_Path$Refrence_Name --depth $Cov_min --qshmm /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model --method qshmm --prefix $PACBIO_INPUT_DIR$OUTPUTNAME --length-mean 92000 --length-sd 12000 --length-min 75000 --length-max 105000
:::: Simulation parameters :::

strategy : wgs
method : qshmm
qshmm : /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model
genome : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa
prefix : /mnt/g/Ultimate/silico_data/Nanopore_Input/Athaliana.fa10Xv1
id-prefix : S
depth : 10.000000
length-mean : 92000.000000
length-sd : 12000.000000
length-min : 75000
length-max : 105000
difference-ratio : 6:55:39
seed : 1708040447
accuracy-mean : 0.850000
pass_num : 1
hp-del-bias : 1.000000

:::: Reference stats ::::

file name : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa

ref.1 (len:30427671) : NC_003070.9 Arabidopsis thaliana chromosome 1 sequence
ref.2 (len:19698289) : NC_003071.7 Arabidopsis thaliana chromosome 2 sequence
ref.3 (len:23459830) : NC_003074.8 Arabidopsis thaliana chromosome 3 sequence
ref.4 (len:18585056) : NC_003075.7 Arabidopsis thaliana chromosome 4 sequence
ref.5 (len:26975502) : NC_003076.8 Arabidopsis thaliana chromosome 5 sequence
ref.6 (len:367808) : NC_037304.1 Arabidopsis thaliana ecotype Col-0 mitochondrion, complete genome
ref.7 (len:154478) : NC_000932.1 Arabidopsis thaliana chloroplast, complete genome

./PBSIM3_NANOPORE.Sh: line 62:   689 Segmentation fault      (core dumped) pbsim --strategy wgs --genome $Refrence_Path$Refrence_Name --depth $Cov_min --qshmm /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model --method qshmm --prefix $PACBIO_INPUT_DIR$OUTPUTNAME --length-mean 92000 --length-sd 12000 --length-min 75000 --length-max 105000
:::: Simulation parameters :::

strategy : wgs
method : qshmm
qshmm : /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model
genome : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa
prefix : /mnt/g/Ultimate/silico_data/Nanopore_Input/Athaliana.fa14Xv1
id-prefix : S
depth : 14.000000
length-mean : 92000.000000
length-sd : 12000.000000
length-min : 75000
length-max : 105000
difference-ratio : 6:55:39
seed : 1708040493
accuracy-mean : 0.850000
pass_num : 1
hp-del-bias : 1.000000

:::: Reference stats ::::

file name : /mnt/f/Ultimate/Organismal_Data/Genomes/Athaliana.fa

ref.1 (len:30427671) : NC_003070.9 Arabidopsis thaliana chromosome 1 sequence
ref.2 (len:19698289) : NC_003071.7 Arabidopsis thaliana chromosome 2 sequence
ref.3 (len:23459830) : NC_003074.8 Arabidopsis thaliana chromosome 3 sequence
ref.4 (len:18585056) : NC_003075.7 Arabidopsis thaliana chromosome 4 sequence
ref.5 (len:26975502) : NC_003076.8 Arabidopsis thaliana chromosome 5 sequence
ref.6 (len:367808) : NC_037304.1 Arabidopsis thaliana ecotype Col-0 mitochondrion, complete genome
ref.7 (len:154478) : NC_000932.1 Arabidopsis thaliana chloroplast, complete genome

:::: Simulation stats (ref.1) ::::

read num. : 4597
depth : 14.001088
read length mean (SD) : 92673.592995 (7215.202700)
read length min : 76196
read length max : 109453
read accuracy mean (SD) : 0.851529 (0.042942)
substitution rate. : 0.008936
insertion rate. : 0.081842
deletion rate. : 0.057948

./PBSIM3_NANOPORE.Sh: line 62:   702 Segmentation fault      (core dumped) pbsim --strategy wgs --genome $Refrence_Path$Refrence_Name --depth $Cov_min --qshmm /mnt/c/ultimate/Other_Software/pbsim3-master/data/QSHMM-ONT.model --method qshmm --prefix $PACBIO_INPUT_DIR$OUTPUTNAME --length-mean 92000 --length-sd 12000 --length-min 75000 --length-max 105000
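As a quick sanity check on the logs above, the "Simulation stats (ref.1)" blocks for the 6X and 14X runs are consistent with depth = read count × mean read length / reference length, so ref.1 reached the requested depth before each crash. The formula below is inferred from the reported numbers, not taken from PBSIM3's source, and the function name is hypothetical.

```python
def expected_depth(read_num: int, mean_length: float, ref_length: int) -> float:
    """Depth as total simulated bases divided by reference length (assumed definition)."""
    return read_num * mean_length / ref_length


# Values copied from the "Simulation stats (ref.1)" blocks in the logs above:
# (read num., read length mean, ref.1 length, reported depth)
runs = [
    (1966, 92875.385554, 30427671, 6.000887),
    (4597, 92673.592995, 30427671, 14.001088),
]

for read_num, mean_len, ref_len, reported in runs:
    computed = expected_depth(read_num, mean_len, ref_len)
    print(f"computed {computed:.6f} vs reported {reported:.6f}")
```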
yukiteruono commented 4 months ago

Thank you for reporting the problem. It was caused by a malloc error, which is fixed in v3.0.2; please use that version.

jwalewski commented 4 months ago

This fixed the issue. Thanks so much!

albidgy commented 2 months ago

Hello,

I have the same problem (segmentation fault) when simulating transcripts. Could this be the same error as in WGS mode?

pbsim --strategy trans --method qshmm --qshmm ../pbsim3-3.0.4/data/QSHMM-RSII.model --transcript ./simulated_transcripts/simulated_reads_rep1.transcript --prefix simulated_reads_rep1 

:::: Simulation parameters :::

strategy : trans
method : qshmm
qshmm : ../pbsim3-3.0.4/data/QSHMM-RSII.model
transcript : ./simulated_transcripts/simulated_reads_rep1.transcript
prefix : simulated_reads_rep1
id-prefix : S
length-mean : 9000.000000
length-sd : 7000.000000
length-min : 100
length-max : 1000000
difference-ratio : 6:55:39
seed : 1713793734
accuracy-mean : 0.850000
pass_num : 1
hp-del-bias : 1.000000

:::: transcript stats ::::

file name : ./simulated_transcripts/simulated_reads_rep1.transcript
transcript num : 59893
total expression value : 17297071

Segmentation fault (core dumped)

Thank you!

yukiteruono commented 2 months ago

Thank you for using PBSIM3. As far as I can tell from the parameters and stats, there are no problems. If you send us the input data (simulated_reads_rep1.transcript), we will investigate the cause: yono@k.u-tokyo.ac.jp