Closed hiruna72 closed 3 months ago
On commit 7318deb1ad65796fb72b15392e7:
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$ ./squigulator --seed 100 -x rna-r9-prom /mnt/d/genome//gencode.v40.transcripts.fa -o a.slow5 -n 1
[init_core::INFO] builtin RNA R9 nucleotide model loaded
[INFO] load_ref: Loaded 246624 reference sequences with total length 417.726536 Mbases
[INFO] print_model_stat: digitisation: 2048.0; sample_rate: 3000.0; range: 548.8; offset_mean: -231.9; offset_std: 12.9; dwell_mean: 43.0; dwell_std: 35.0
[INFO] sim_main: 1/1 reads done
[main] Version: 0.3.0-dirty
[main] CMD: ./squigulator --seed 100 -x rna-r9-prom -o a.slow5 -n 1 /mnt/d/genome//gencode.v40.transcripts.fa
[main] Real time: 0.586 sec; CPU time: 0.203 sec; Peak RAM: 0.430 GB
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$ ./squigulator --seed 100 -x rna-r9-prom /mnt/d/genome//gencode.v40.transcripts.fa -o b.slow5 -n 1 -t 1
[init_core::INFO] builtin RNA R9 nucleotide model loaded
[INFO] load_ref: Loaded 246624 reference sequences with total length 417.726536 Mbases
[INFO] print_model_stat: digitisation: 2048.0; sample_rate: 3000.0; range: 548.8; offset_mean: -231.9; offset_std: 12.9; dwell_mean: 43.0; dwell_std: 35.0
[INFO] sim_main: 1/1 reads done
[main] Version: 0.3.0-dirty
[main] CMD: ./squigulator --seed 100 -x rna-r9-prom -o b.slow5 -n 1 -t 1 /mnt/d/genome//gencode.v40.transcripts.fa
[main] Real time: 0.593 sec; CPU time: 0.156 sec; Peak RAM: 0.430 GB
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$ diff a.slow5 b.slow5
Does your diff pass?
yes
Seems like it works for me, even with different options and k-mer models
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$ ./squigulator --seed 100 /mnt/d/genome/gencode.v40.transcripts.fa -o a.slow5 -n 1 --kmer-model ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model
[read_model::WARNING] Reading the model from file ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model. This is an experimental feature. Use with caution At src/model.c:44
[read_model::INFO] k-mer size in file ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model is 5
[INFO] load_ref: Loaded 246624 reference sequences with total length 417.726536 Mbases
[INFO] print_model_stat: digitisation: 2048.0; sample_rate: 4000.0; range: 748.6; offset_mean: -237.4; offset_std: 14.2; dwell_mean: 9.0; dwell_std: 4.0
[INFO] sim_main: 1/1 reads done
[main] Version: 0.3.0-dirty
[main] CMD: ./squigulator --seed 100 -o a.slow5 -n 1 --kmer-model ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model /mnt/d/genome/gencode.v40.transcripts.fa
[main] Real time: 1.372 sec; CPU time: 0.594 sec; Peak RAM: 0.429 GB
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$ ./squigulator --seed 100 /mnt/d/genome/gencode.v40.transcripts.fa -o b.slow5 -n 1 --digitisation 2048 --sample-rate 3000 --range 548.788269 --offset-mean -231.9440589 --offset-std 12.87185278 --median-before-mean 238.5286796 --median-before-std 21.1871794 --dwell-mean 43.0 --dwell-std 35.0 --kmer-model ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model
[check_args::WARNING] Option --dwell-mean is ignored when --sample-rate is provided At src/sim.c:759
[INFO] sim_main: dwell_mean=43.0,sample_rate=3000.0,set bps=sample_rate/dwell_mean=70.0
[INFO] sim_main: bps=70.0,sample_rate=3000.0,set dwell_mean=sample_rate/bps=43.0
[read_model::WARNING] Reading the model from file ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model. This is an experimental feature. Use with caution At src/model.c:44
[read_model::INFO] k-mer size in file ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model is 5
[INFO] load_ref: Loaded 246624 reference sequences with total length 417.726536 Mbases
[INFO] print_model_stat: digitisation: 2048.0; sample_rate: 3000.0; range: 548.8; offset_mean: -231.9; offset_std: 12.9; dwell_mean: 43.0; dwell_std: 35.0
[INFO] sim_main: 1/1 reads done
[main] Version: 0.3.0-dirty
[main] CMD: ./squigulator --seed 100 -o b.slow5 -n 1 --digitisation 2048 --sample-rate 3000 --range 548.788269 --offset-mean -231.9440589 --offset-std 12.87185278 --median-before-mean 238.5286796 --median-before-std 21.1871794 --dwell-mean 43.0 --dwell-std 35.0 --kmer-model ../f5c/test/r9-models/r9.4_70bps.u_to_t_rna.5mer.template.model /mnt/d/genome/gencode.v40.transcripts.fa
[main] Real time: 0.865 sec; CPU time: 0.469 sec; Peak RAM: 0.430 GB
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$ slow5tools skim a.slow5
#read_id read_group digitisation offset range sampling_rate len_raw_signal raw_signal channel_number median_before read_number start_mux start_time
S1_1!ENST00000651234.1|ENSG00000188157.15|OTTHUMG00000040778.7|OTTHUMT00000099680.4|AGRN-209|AGRN|7477|protein_coding|!833!4811!- 0 2048 -260.168371 748.5801 4000 35526 . 0239.864613 0 0 0
[main] cmd: slow5tools skim a.slow5
[main] real time = 0.002 sec | CPU time = 0.000 sec | peak RAM = 0.002 GB
hasindu@hasindu-3660:/mnt/d/hasindu2008.git/squigulator$ slow5tools skim b.slow5
#read_id read_group digitisation offset range sampling_rate len_raw_signal raw_signal channel_number median_before read_number start_mux start_time
S1_1!ENST00000651234.1|ENSG00000188157.15|OTTHUMG00000040778.7|OTTHUMT00000099680.4|AGRN-209|AGRN|7477|protein_coding|!833!4811!- 0 2048 -252.635553 548.788269 3000 182458 . 0268.611434 0 0 0
[main] cmd: slow5tools skim b.slow5
[main] real time = 0.006 sec | CPU time = 0.000 sec | peak RAM = 0.003 GB
As we realised if -x
is not provided the default profile r9-dna
is chosen. this can cause the reads to be different depending on the DNA/RNA specs.
It is a good idea to print the profile to stderr.
On dev branch when I create two slow5 files; one with preset
-x rna-r9-prom
and one without preset but providing all the arguments--digitisation 2048 --sample-rate 3000 --range 548.788269 --offset-mean -231.9440589 --offset-std 12.87185278 --median-before-mean 238.5286796 --median-before-std 21.1871794 --dwell-mean 43.0 --dwell-std 35.0
I get two different simulated regions even though I provide the same--seed 100
. Is this expected?