Closed moa4020 closed 2 months ago
Hello,
samtools view realign_output.bam IntendedSequenceTop_143x51:117535-117570 -F 0x100
Please let me know if you face any issues following the steps.
I'm trying to turn the move tables to a string and I'm encountering this error (move tables emitted from dorado):
squigualiser reform --sig_move_offset 0 --kmer_length 1 -c --bam v02.2_trimmed_aligned_moves.sam -o ${ALIGNMENT}
kmer_length: 1
sig_move_offset: 0
input bam: v02.2_trimmed_aligned_moves.sam
output format: paf
output file: reform_output.paf
Info: Default stride: 5
Info: Found stride to be 6, this value will be used.
Traceback (most recent call last):
File "/home/fs01/moa4020/squigualiser/bin/squigualiser", line 8, in <module>
sys.exit(main())
File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/__init__.py", line 56, in main
args.func(args)
File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/reform.py", line 87, in run
len_seq = len(sam_record.get_forward_sequence()) - kmer_length + 1 # to get the number of kmers
TypeError: object of type 'NoneType' has no len()
Hello,
This is an error not seen before. Thanks for reporting this. I would like to closely analyse the bug. Could you send me a small zip file (a blow5 file with about 5 reads, reference sequence), the dorado version and the basecalling command?
Hello,
any updates on this?
Thanks.
Hi Hiruna,
Unfortunately, i dont know how to separate single reads from a fast5 file. Hence I decided to re run the alignment on guppy.
Mar
From: Hiruna Samarakoon @.> Sent: Sunday, March 17, 2024 8:58:39 PM To: hiruna72/squigualiser @.> Cc: Mohith Reddy Arikatla @.>; Author @.> Subject: [EXTERNAL] Re: [hiruna72/squigualiser] Unable to plot even with right coordinates (Issue #58)
Hello, any updates on this? Thanks. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread. Message ID: hiruna72/squigualiser/issues/58/2002698025@ github. com
Hello,
any updates on this?
Thanks.
— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/hiruna72/squigualiser/issues/58*issuecomment-2002698025__;Iw!!Aer6R9v1Nk4!6O-mbbhJmGqwKls0nfdYq0DmjmU_t55FH7P_Ah9yNJWt2GrTBILZFtRnR7srK5aKTanHXjx7qLEf5D2kyjCyT9we1YnrINFM$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/A5LETJZC7DX6E3ZG3NXWA7DYYY337AVCNFSM6AAAAABETL2YEWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBSGY4TQMBSGU__;!!Aer6R9v1Nk4!6O-mbbhJmGqwKls0nfdYq0DmjmU_t55FH7P_Ah9yNJWt2GrTBILZFtRnR7srK5aKTanHXjx7qLEf5D2kyjCyT9we1anAxcF7$. You are receiving this because you authored the thread.Message ID: @.***>
Sorry, I could have given you the commands before.
Since you anyway need a blow5 file to plot the signals can you convert the fast5 to blow5 like follows?
slow5tools f2s fast5_dir -d blow5_dir
slow5tools merge blow5_dir -o merged.blow5
you can create a small slow5 file as follows,
slow5tools view blow5_dir/one.blow5 | head -n 100 > small.slow5
you can install slow5tools using prebuilt binaries or conda. https://github.com/hasindu2008/slow5tools/releases/tag/v1.1.0
Let me know how guppy basecalling goes. However, it would be great if you can send me the small.slow5 and the dorado version and the command in the meantime.
Perfect! Thanks for the instructions. Here's the zip file. The "IntendedSequenceTop_143x1326.fasta" is the reference I aligned my fast5 file to. The sequences for which I am trying obtain graphs are present in "test.fasta". troubleshoot.zip
Hi @moa4020,
I wrote a script (run.sh). Please change the variables where necessary to get the plots you want. I didn't face above error when I ran dorado. I used the latest dorado 0.5.3
and squigualiser 0.6.1
(installed via pip).
PS - The reads in the slow5 file were with poor minimap2 alignment quality. I had to reduce the alignment score threshold in IGV to get a read pileup on IGV.
Hi, I tried running your script with some modifications. I basecalled again with moves, this time with your code in a separate script. I am encountering this error during the re-reform step with the new k and m values.
(base) bash-4.4$ cat hiruna_run_125707.err Sat Mar 23 17:50:53 EDT 2024 running reform... calculating offsets... running re-reform... Traceback (most recent call last): File "/home/fs01/moa4020/squigualiser/bin/squigualiser", line 8, in <module> sys.exit(main()) File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/__init__.py", line 56, in main args.func(args) File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/reform.py", line 166, in run value = mv[i] IndexError: array index out of range
Hi @moa4020,
How big is your dataset? I ran using the same dorado version (0.5.1) on a dataset that has 500K reads. But I could not reproduce the error.
The error above is generated when the iterator i
is larger than the length of the mv
array.
Hence, on dev branch I added a check which will print the read_name
when i
is larger than the mv
array. Could you build squigualiser from dev branch if you have time? Then we can pinpoint the read_name (raw signal) to take a closer look.
git clone https://github.com/hiruna72/squigualiser.git --branch dev --depth 1
cd squigualiser
python3.8 -m venv venv3
source venv3/bin/activate
pip install --upgrade pip
pip install --upgrade setuptools wheel
export PYSLOW5_ZSTD=1 # if your slow5 file uses zstd compression and you have zstd installed, set
pip install .
squigualiser --help
My dataset is quite big, about 15 million reads.
And I ran the script again with your code.
Here's the error:
kmer_length: 2 sig_move_offset: 1 input bam: /athena/cayuga_0003/scratch/moa4020/8-oxoG/squig/v02.2_trimmed_aligned_moves_pts12.bam output format: paf output file: /athena/cayuga_0003/scratch/moa4020/8-oxoG/squig/OUT/re_reform.paf Info: Default stride: 5 Info: Found stride to be 6, this value will be used. fd0c9707-b653-46a3-a622-f5e83258de7e i >= len(mv)
Cool, thanks for doing this. Could you extract the raw signal and the bam record as follows and send me?
slow5tools index merged.blow5
slow5tools get merged.blow5 "fd0c9707-b653-46a3-a622-f5e83258de7e" -o read.slow5
samtools view v02.2_trimmed_aligned_moves_pts12.bam | grep "fd0c9707-b653-46a3-a622-f5e83258de7e" > move.sam
Thank you.
So, turns out that read is not in the slow5 file. I could try running this on a smaller, more controlled dataset. I'll keep you posted. Sorry about that.
ah ha, it should be a split read then. I am learning about the split functionality in dorado. Could you post the bam record here?
Hi, by bam record do you mean the bam file?
yes,
samtools view v02.2_trimmed_aligned_moves_pts12.bam | grep "fd0c9707-b653-46a3-a622-f5e83258de7e" > move.sam
So, I tried running the entire pipeline with the prescribed method on Option 2. And I'm getting the same error at the final stage.
${squigualiser}/squigualiser plot --file ${REF} --slow5 ${SIGNAL_FILE} --alignment sorted_realigned_m20.bam --output_dir ${OUTPUT_DIR} --region ${REGION} --tag_name "optionA"
sequence file: /athena/cayuga_0003/scratch/moa4020/refDir/8-oxoG_ref/m20_randomer_reference_padded.fasta
alignment file: sorted_realigned_m20.bam
signal file: /athena/cayuga_0003/scratch/moa4020/8-oxoG/rawdata/v02.2/pod5_pts12/m20randomer_pod5/m20_filtered_fast5/slow5/m20_merged.slow5
Info: Signal to reference method using SAM/BAM ...
plot (+) region: TTATGATTA_NR:801-905 read_id: 37d6f91e-7a89-47a2-9248-2a57c1325100
[slow5_idx_get::ERROR] Read ID '37d6f91e-7a89-47a2-9248-2a57c1325100' was not found. At src/slow5_idx.c:522
Traceback (most recent call last):
File "/home/fs01/moa4020/squigualiser/squig_venv/bin/squigualiser", line 8, in <module>
sys.exit(main())
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/__init__.py", line 56, in main
args.func(args)
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/plot.py", line 964, in run
draw_data['y_min'] = np.nanmin(y)
File "<__array_function__ internals>", line 200, in nanmin
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/lib/nanfunctions.py", line 350, in nanmin
res = np.amin(a, axis=axis, out=out, **kwargs)
File "<__array_function__ internals>", line 200, in amin
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 2946, in amin
return _wrapreduction(a, np.minimum, 'min', axis, None, out,
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
So, I extracted the bam record using the command you posted. Here's the file:
$ cat troubleshoot.sam
37d6f91e-7a89-47a2-9248-2a57c1325100 0 TTATGATTA_NR 801 1 185S105M194S * 0 0 ATGTTCTCTGTACTTCGTTCAGTTACGTATTGCTAGTCTCGTGACTGGAGTAAGGAGATGGAAAGACAGGAGTCGCCTACGATCGCGAAGTAATACTAAACGGATTGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTCATCTGCACTTCATGCATTGCGATCGTAGGCGACTCCTGTCTTTCCATCTCCTTACTCCAGTCACGAGTACTGTAGGAGCAGCGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTACACTGCATACAGGTGTTTTTTCTTTTTCGGATGCGTCGCGCATTCCTGTCTTCCATCTCCTTAAGAAGTCACGAGTACTTAGGCGACTCCTGTCTTTCCATCTCCTTACTCCCAGTCACGAGTACTGAAAGCATCGCGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTT&((,+'&&&(*2548668;=B@>>76444:;<99;;<{LGEDDDHHHJJI{FGC:=65558::,+7++++67<<<<>>=><<=<<<<<FFKH{GJLID{G{HRIGQHHNGEHJGG@54455;=?@BCBAA@>=?@HHG{LGADBBCKJGC@@@ACA222''''()+*+++&%$######$$%''((((*+8EEEABAB@?>>:7::<DFI{IMHRHI{{QQJIJGMGFJKEP{GUN{{NL{{HKHF{JGLUKEHGGNLEF>@BFAA@A>A@ACAACBDGTFKII{JFLLGFD?@>?E:998610-,(&%$$%%&%&'&&&&)*)''&)%&$$$&('''((''(+++,,(())),,,-0)&&%%$$%'(/2556>====?ABGEGIKIK{M{E>5>;<ABB?64./01298:DDDBBAADEEHHHIM{OLO43222==@A@>:8666DGGGGHOGIEDEB@@BGEGDFJGKF{HJJJ=8543110 NM:i:4 ms:i:186 AS:i:186 nn:i:0 tp:A:P cm:i:9 s1:i:78 s2:i:78 de:f:0.0381 SA:Z:GGTAGGGTT_NR,801,+,375S105M1I3S,1,6;TTGAGTATT_PR,801,+,118S99M7D267S,1,13;TTATGATAT_PR,801,+,254S103M2I125S,1,18; rl:i:0
37d6f91e-7a89-47a2-9248-2a57c1325100 2048 GGTAGGGTT_NR 801 1 375H31M1I74M3H * 0 0 TAGGCGACTCCTGTCTTTCCATCTCCTTACTCCCAGTCACGAGTACTGAAAGCATCGCGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTT ===?ABGEGIKIK{M{E>5>;<ABB?64./01298:DDDBBAADEEHHHIM{OLO43222==@A@>:8666DGGGGHOGIEDEB@@BGEGDFJGKF{HJJJ=8543 NM:i:6 ms:i:174 AS:i:174 nn:i:0 tp:A:P cm:i:7 s1:i:76 s2:i:76 de:f:0.0566 SA:Z:TTATGATTA_NR,801,+,185S105M194S,1,4;TTGAGTATT_PR,801,+,118S99M7D267S,1,13;TTATGATAT_PR,801,+,254S103M2I125S,1,18; rl:i:0
37d6f91e-7a89-47a2-9248-2a57c1325100 2048 TTGAGTATT_PR 801 1 118H41M1D2M1I3M1D7M1I5M7D39M267H * 0 0 CTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTCATCTGCACTTCATGCATTGCGATCGTAGGCGACTCCTGTCTTTCCATCTCCTTACTC 455;=?@BCBAA@>=?@HHG{LGADBBCKJGC@@@ACA222''''()+*+++&%$######$$%''((((*+8EEEABAB@?>>:7::<DFI{IMHRHI NM:i:13 ms:i:148 AS:i:140 nn:i:0 tp:A:P cm:i:7 s1:i:56 s2:i:56 de:f:0.0686 SA:Z:TTATGATTA_NR,801,+,185S105M194S,1,4;GGTAGGGTT_NR,801,+,375S105M1I3S,1,6;TTATGATAT_PR,801,+,254S103M2I125S,1,18; rl:i:0
37d6f91e-7a89-47a2-9248-2a57c1325100 2048 TTATGATAT_PR 801 1 254H48M1I8M1D7M3I4M1D3M1I5M1I2M1D9M1D13M125H * 0 0 TGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTACACTGCATACAGGTGTTTTTTCTTTTTCGGATGCGTCGCGCATTCCTGTCTTCCATCTCCTTA GGNLEF>@BFAA@A>A@ACAACBDGTFKII{JFLLGFD?@>?E:998610-,(&%$$%%&%&'&&&&)*)''&)%&$$$&('''((''(+++,,(())),,,-0) NM:i:18 ms:i:100 AS:i:98 nn:i:0 tp:A:P cm:i:7 s1:i:48 s2:i:48 de:f:0.1495 SA:Z:TTATGATTA_NR,801,+,185S105M194S,1,4;GGTAGGGTT_NR,801,+,375S105M1I3S,1,6;TTGAGTATT_PR,801,+,118S99M7D267S,1,13; rl:i:0
Is there a way to handle this exception and skip such reads and go ahead with visualisation of the sequences that do exist?
Hello @moa4020,
I meant the bam record of the dorado output's bam file. Could you paste it here?
I suppose it has the sp
and pi
tags to identify if the record is a split read or not. Could you try using the latest dorado https://github.com/nanoporetech/dorado/releases/tag/v0.6.1? There have been number of bug fixes in dorado.
I added a condition to skip split and duplex reads in reform
tool. The signal not found in slow5 file error should be because 37d6f91e-7a89-47a2-9248-2a57c1325100
is a split read. Could you please try again using the dev branch?
git clone https://github.com/hiruna72/squigualiser.git --branch dev --depth 1
cd squigualiser
python3.8 -m venv venv3
source venv3/bin/activate
pip install --upgrade pip
pip install --upgrade setuptools wheel
export PYSLOW5_ZSTD=1 # if your slow5 file uses zstd compression and you have zstd installed, set
pip install .
squigualiser --help
Updated the bam record you asked for in the previous comment. and unfortunately after I installed the new branch of squigualiser, I'm still getting the same error:
$ cat squig_130949.err
[M::mm_idx_gen::1.215*0.00] collected minimizers
[M::mm_idx_gen::1.217*0.01] sorted minimizers
[M::main::1.217*0.01] loaded/built the index for 14 target sequence(s)
[M::mm_mapopt_update::1.217*0.01] mid_occ = 15
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 14
[M::mm_idx_stat::1.217*0.01] distinct minimizers: 71 (66.20% are singletons); average occurrences: 2.789; average spacing: 120.591; total length: 23877
[M::worker_pipeline::38.876*8.95] mapped 229253 sequences
[M::main] Version: 2.27-r1193
[M::main] CMD: /home/fs01/moa4020/miniforge3/envs/minimap2/bin/minimap2 -ax map-ont -t12 --secondary=no -o mapped_m20_basecalls.sam /athena/cayuga_0003/scratch/moa4020/refDir/8-oxoG_ref/m20_randomer_reference_padded.fasta /athena/cayuga_0003/scratch/moa4020/8-oxoG/alignment/m20_filt_moves.fastq
[M::main] Real time: 38.880 sec; CPU: 348.006 sec; Peak RSS: 0.492 GB
[slow5_idx_get::ERROR] Read ID '37d6f91e-7a89-47a2-9248-2a57c1325100' was not found. At src/slow5_idx.c:522
Traceback (most recent call last):
File "/home/fs01/moa4020/squigualiser/squig_venv/bin/squigualiser", line 8, in <module>
sys.exit(main())
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/__init__.py", line 56, in main
args.func(args)
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/plot.py", line 964, in run
draw_data['y_min'] = np.nanmin(y)
File "<__array_function__ internals>", line 200, in nanmin
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/lib/nanfunctions.py", line 350, in nanmin
res = np.amin(a, axis=axis, out=out, **kwargs)
File "<__array_function__ internals>", line 200, in amin
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 2946, in amin
return _wrapreduction(a, np.minimum, 'min', axis, None, out,
File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
Can you paste the dorado basecalling command please?
dorado basecaller ${DORADO}/${MODEL} --emit-moves ${pod5Dir} > ${moves_bam}
Cool. Please paste the output of following too.
samtools view ${moves_bam} | grep "37d6f91e-7a89-47a2-9248-2a57c1325100" > move.sam
So, the read does not show up in the bam file.
Interesting, If it is not in the moves_bam the record cannot propagate through reform and realign 🤔.
It must be frustrating but could you please run again the complete pipeline? This time add greps after basecalling, reform, and realign to check at what stage this culprit record gets in.
It's working! I just forgot to uncomment one of the steps so I've been working with an older file. Sorry.
Awesome! Thanks for reporting this.
Bam file:
Read maps to multiple locations, but I don't think that should be a problem, right?