hasindu2008 / squigulator

a tool for simulating nanopore raw signal data
https://hasindu2008.github.io/squigulator
MIT License

--full-contigs mode only generates a fraction of the reads #12

Closed denisbeslic closed 8 months ago

denisbeslic commented 8 months ago

Hello!

I have a problem running in the full-contigs mode. My input fastq file consists of 1,000 sequences. I run the following command:

resources/squigulator/squigulator -x dna-r10-prom -t 128 --full-contigs results/zymo-human/passed-1K.fastq -o results/zymo-human/passed-1K.slow5

Output:

[INFO] sim_main: Using random seed: 1707404073
[init_core::INFO] builtin DNA R10 nucleotide model loaded
[INFO] load_ref: Loaded 92 reference sequences with total length 0.212366 Mbases
[INFO] print_model_stat: digitisation: 2048.0; sample_rate: 4000.0; range: 281.3; offset_mean: -127.6; offset_std: 19.4; dwell_mean: 10.0; dwell_std: 4.0
[INFO] sim_main: 92/92 reads done
[main] Version: 0.2.0-dirty
[main] CMD: resources/squigulator/squigulator -x dna-r10-prom -t 128 --full-contigs -o results/zymo-human/passed-1K.slow5 results/zymo-human/passed-1K.fastq
[main] Real time: 1.513 sec; CPU time: 1.900 sec; Peak RAM: 1.270 GB

Squigulator generates only 92 reads out of 1,000. Is there a reason for that, or is there some kind of minimum sequence length?

Thank you!

hasindu2008 commented 8 months ago

Can you run wc -l on the fastq file to confirm that it actually has 1000 reads?

denisbeslic commented 8 months ago

I ran wc -l on the fastq file and it showed 1,000. However, I realized that the sequence information in the fastq spans multiple lines, so the behavior was caused by the fastq itself. Thank you for your help.
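
For anyone hitting the same confusion: wc -l counts lines, not records, so it only tells you the number of reads when every record occupies exactly four lines. When the sequence and quality strings wrap over several lines, the records have to be counted by parsing. Below is a minimal Python sketch of such a counter. It is not part of squigulator; the function name count_fastq_records is hypothetical, and it assumes a well-formed FASTQ in which each record's quality string has the same length as its sequence.

```python
import io


def count_fastq_records(handle):
    """Count records in a FASTQ text stream, allowing wrapped (multi-line) records.

    Hypothetical helper for illustration only; not part of squigulator.
    """
    records = 0
    line = handle.readline()
    while line:
        stripped = line.strip()
        if not stripped:
            # Skip blank lines between records.
            line = handle.readline()
            continue
        if not stripped.startswith("@"):
            raise ValueError(f"expected '@' header, got: {stripped[:20]!r}")

        # Sequence lines run until the '+' separator line.
        seq_len = 0
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_len += len(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError("truncated record: missing '+' separator")

        # Quality lines are consumed by length, not by prefix, because a
        # quality line may legitimately start with '@'.
        qual_len = 0
        while qual_len < seq_len:
            line = handle.readline()
            if not line:
                raise ValueError("truncated record: quality shorter than sequence")
            qual_len += len(line.strip())
        if qual_len != seq_len:
            raise ValueError("quality and sequence lengths differ")

        records += 1
        line = handle.readline()
    return records


if __name__ == "__main__":
    # Two records spread over ten lines; the first one is wrapped and has a
    # quality line that begins with '@'.
    demo = io.StringIO("@r1\nACGT\nACGT\n+\n@III\nIIII\n@r2\nAC\n+\nII\n")
    print(count_fastq_records(demo))  # prints 2
```

On a real file, the function can be called as count_fastq_records(open(path)), where path is the fastq in question. If the result differs from the wc -l count divided by four, the file contains wrapped records.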