hiruna72 / squigualiser

Visualise and analyse nanopore (ONT) raw signals
https://hiruna72.github.io/squigualiser/
MIT License
100 stars 1 forks source link

Unable to plot even with right coordinates #58

Closed moa4020 closed 2 months ago

moa4020 commented 3 months ago

Bam file:

$ cat v02.2_trimmed_aligned_moves.sam | grep c914b199-6660-4ebf-b3ef-a946446fb90e
c914b199-6660-4ebf-b3ef-a946446fb90e    0   IntendedSequenceTop_143x51  23155   0   162M1D30M98S    *   00GTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTACACTGCCACTAGAGGTGATTACTTCGCGATCGTAGGCGACTCCTGTCTTTCCATCTCCTTACTCCAGTCACGAGTACTAAGAGAGGGGCGTAATGGTTCAGCTGTGCAGTCTCTCTGTCACTCGTCTCTGTGAGTTGTGGCAGTGTACTACAACTCACAGAGACGAGTGACAGAGACGTGCACAGCTGAACCTCTCGCTTGCGAACAGTACTCGTGAGTAAATGGCAGTGGAAGGG    JJHIHOHHHGKMPMSJQMNJNLQMPOSQSHSIPNJHIKSSOSOSSJMKKIKLOLNIQJFJGA975320.//257::<==DEB@C?=@??BDCCFEKFEAB<2BDDGOJSLN////21145EPGFHHHLMFE;=DGIILJSMJGGFHDLRKJKOOSSIDABA?>;;79555BCDHAAAAAIHLGHELGOMNSIKJHOJ<<<<=????@<<;86413,+++-470./+10))))(.../1*++---,,,++,4'(:;;<<=>=>@8773-*)&%%$$%%####$')*+, 13  0.7964     ns:i:3982    i:412        mx:i:2 i:118        st:Z:2024-03-09T20:37:37.444+00:00   rn:i:771    fn:Z:FAY08556_pass_8d3f01dc_b2ece590_0.pod5       sm:f:-370.376   :0.0100338   sv:Z:pa     RG:Z:b2ece590a09effeeb3aad674970e15e0ca45791b_dna_r10.4.1_e8.2_400bps_sup@v4.3.0   c,6,1,0,0,0,0,0,0,1,1,0,0,0,1,0,1,1,0,0,0,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,1,1,0,1,0,0,1,1,0,0,1,1,0,0,0,0,1,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,0,1,1,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,1,1,0,1,0,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,0,0,1,1,0,0,1,0,1,1,0,0,0,1,0,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,0,1,0,1,1,1,0,0,0,0,1,0,0,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0,1,1,0,0,0,1,0,1,0,1,0,0,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,0,1,1,0,1,1,0,1,1,1,1,0,1,1,1,1,0,1,1,0,1,0,1,1,0,1,0,1,0,1,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,1,1,0,1,0,0,1,0,1,0,0,1,0,1,0,0,0,0,0,1,1,0,1,0,1,1,0,1,0,1,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,1,0,0,0,0,0,1,0,1,0,1,1,0,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,0,1,1,0,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,1,0,1,1,0,0,1,1,1,0,1,1,0,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,0,1,0,0,1,1,0,0,0,1,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,1,1,1,0,1,1,0       NM:i:17 0    AS:i:330    nn:i:16    0880829  tp:A:P  cm:i:13    9    s2:i:119   63N0N0N0N1N0N0N0N60N0N0N0N1N0N0N0N21^T30    rl:i:0
c914b199-6660-4ebf-b3ef-a946446fb90e    256 IntendedSequenceTop_143x51  117535  0   162M1D30M98S    *   00*   * s:i:13  u:f:0.7964  s:i:3982    s:i:412        mx:i:2   h:i:118        st:Z:2024-03-09T20:37:37.444+00:00    rn:i:771         fn:Z:FAY08556_pass_8d3f01dc_b2ece590_0.pod5   :f:-370.376 :f:0.0100338    :Z:pa   :i:0    RG:Z:b2ece590a09effeeb3aad674970e15e0ca45791b_dna_r10.4.1_e8.2_400bps_sup@v4.3.0         mv:B:c,6,1,0,0,0,0,0,0,1,1,0,0,0,1,0,1,1,0,0,0,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,1,1,0,1,0,0,1,1,0,0,1,1,0,0,0,0,1,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,0,1,1,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,1,1,0,1,0,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,0,0,1,1,0,0,1,0,1,1,0,0,0,1,0,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,0,1,0,1,1,1,0,0,0,0,1,0,0,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0,1,1,0,0,0,1,0,1,0,1,0,0,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,0,1,1,0,1,1,0,1,1,1,1,0,1,1,1,1,0,1,1,0,1,0,1,1,0,1,0,1,0,1,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,1,1,0,1,0,0,1,0,1,0,0,1,0,1,0,0,0,0,0,1,1,0,1,0,1,1,0,1,0,1,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,1,0,0,0,0,0,1,0,1,0,1,1,0,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,0,1,1,0,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,1,0,1,1,0,0,1,1,1,0,1,1,0,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,0,1,0,0,1,1,0,0,0,1,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,1,1,1,0,1,1,0 :i:17   :i:330        AS:i:330        nn:i:16   f:0.0880829 A:S i:13    i:119        MD:Z:63N0N0N0N1N0N0N0N60N0N0N0N1N0N0N0N21^T30  i:0 Z:IntendedSequenceTop_143x51,23155,+,192M1D-192S,0,17;

Read maps to multiple locations, but I don't think that should be a problem, right?

$squig/squigualiser plot -f readsOfInterest.fa -r c914b199-6660-4ebf-b3ef-a946446fb90e -s merged.slow5 -a v02.2_trimmed_aligned_moves_sorted.bam --fixed_width --save_svg -o . --region IntendedSequenceTop_143x51:117535-117570
sequence file: readsOfInterest.fa
alignment file: v02.2_trimmed_aligned_moves_sorted.bam
signal file: merged.slow5
Info: Signal to reference method using SAM/BAM ...
Number of plots: 0
Squigualiser only plots reads that span across the specified region entirely. Reduce the region interval and double check with IGV : 0
hiruna72 commented 3 months ago

Hello,

  1. You have to convert the move table to ss string format as outlined here. https://github.com/hiruna72/squigualiser?tab=readme-ov-file#option-2---basecaller-move-table-1
  2. Then use the output bam file to check if has reads within the region you are interested in (using IGV and samtools) samtools view realign_output.bam IntendedSequenceTop_143x51:117535-117570 -F 0x100

Please let me know if you face any issues following the steps.

moa4020 commented 3 months ago

I'm trying to turn the move tables to a string and I'm encountering this error (move tables emitted from dorado):

squigualiser reform --sig_move_offset 0 --kmer_length 1 -c --bam v02.2_trimmed_aligned_moves.sam -o ${ALIGNMENT}
kmer_length: 1
sig_move_offset: 0
input bam: v02.2_trimmed_aligned_moves.sam
output format: paf
output file: reform_output.paf
Info: Default stride: 5
Info: Found stride to be 6, this value will be used.
Traceback (most recent call last):
  File "/home/fs01/moa4020/squigualiser/bin/squigualiser", line 8, in <module>
    sys.exit(main())
  File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/__init__.py", line 56, in main
    args.func(args)
  File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/reform.py", line 87, in run
    len_seq = len(sam_record.get_forward_sequence()) - kmer_length + 1 # to get the number of kmers
TypeError: object of type 'NoneType' has no len()
hiruna72 commented 3 months ago

Hello,

This is an error not seen before. Thanks for reporting this. I would like to closely analyse the bug. Could you send me a small zip file (a blow5 file with about 5 reads, reference sequence), the dorado version and the basecalling command?

hiruna72 commented 3 months ago

Hello,

any updates on this?

Thanks.

moa4020 commented 3 months ago

Hi Hiruna,

Unfortunately, i dont know how to separate single reads from a fast5 file. Hence I decided to re run the alignment on guppy.

Mar


From: Hiruna Samarakoon @.> Sent: Sunday, March 17, 2024 8:58:39 PM To: hiruna72/squigualiser @.> Cc: Mohith Reddy Arikatla @.>; Author @.> Subject: [EXTERNAL] Re: [hiruna72/squigualiser] Unable to plot even with right coordinates (Issue #58)

Hello, any updates on this? Thanks. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread. Message ID: hiruna72/squigualiser/issues/58/2002698025@ github. com

Hello,

any updates on this?

Thanks.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/hiruna72/squigualiser/issues/58*issuecomment-2002698025__;Iw!!Aer6R9v1Nk4!6O-mbbhJmGqwKls0nfdYq0DmjmU_t55FH7P_Ah9yNJWt2GrTBILZFtRnR7srK5aKTanHXjx7qLEf5D2kyjCyT9we1YnrINFM$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/A5LETJZC7DX6E3ZG3NXWA7DYYY337AVCNFSM6AAAAABETL2YEWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBSGY4TQMBSGU__;!!Aer6R9v1Nk4!6O-mbbhJmGqwKls0nfdYq0DmjmU_t55FH7P_Ah9yNJWt2GrTBILZFtRnR7srK5aKTanHXjx7qLEf5D2kyjCyT9we1anAxcF7$. You are receiving this because you authored the thread.Message ID: @.***>

hiruna72 commented 3 months ago

Sorry, I could have given you the commands before.

Since you anyway need a blow5 file to plot the signals can you convert the fast5 to blow5 like follows?

slow5tools f2s fast5_dir -d blow5_dir
slow5tools merge blow5_dir -o merged.blow5

you can create a small slow5 file as follows,

slow5tools view blow5_dir/one.blow5 | head -n 100 > small.slow5

you can install slow5tools using prebuilt binaries or conda. https://github.com/hasindu2008/slow5tools/releases/tag/v1.1.0

Let me know how guppy basecalling goes. However, it would be great if you can send me the small.slow5 and the dorado version and the command in the meantime.

moa4020 commented 3 months ago

Perfect! Thanks for the instructions. Here's the zip file. The "IntendedSequenceTop_143x1326.fasta" is the reference I aligned my fast5 file to. The sequences for which I am trying obtain graphs are present in "test.fasta". troubleshoot.zip

hiruna72 commented 3 months ago

RUN01_.zip

Hi @moa4020,

I wrote a script (run.sh). Please change the variables where necessary to get the plots you want. I didn't face above error when I ran dorado. I used the latest dorado 0.5.3 and squigualiser 0.6.1 (installed via pip).

image

PS - The reads in the slow5 file were with poor minimap2 alignment quality. I had to reduce the alignment score threshold in IGV to get a read pileup on IGV.

moa4020 commented 3 months ago

Hi, I tried running your script with some modifications. I basecalled again with moves, this time with your code in a separate script. I am encountering this error during the re-reform step with the new k and m values.

(base) bash-4.4$ cat hiruna_run_125707.err Sat Mar 23 17:50:53 EDT 2024 running reform... calculating offsets... running re-reform... Traceback (most recent call last): File "/home/fs01/moa4020/squigualiser/bin/squigualiser", line 8, in <module> sys.exit(main()) File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/__init__.py", line 56, in main args.func(args) File "/home/fs01/moa4020/squigualiser/lib/python3.8/site-packages/src/reform.py", line 166, in run value = mv[i] IndexError: array index out of range

slow5.zip

hiruna72 commented 3 months ago

Hi @moa4020,

How big is your dataset? I ran using the same dorado version (0.5.1) on a dataset that has 500K reads. But I could not reproduce the error.

The error above is generated when the iterator i is larger than the length of the mv array. Hence, on dev branch I added a check which will print the read_name when i is larger than the mv array. Could you build squigualiser from dev branch if you have time? Then we can pinpoint the read_name (raw signal) to take a closer look.

git clone https://github.com/hiruna72/squigualiser.git --branch dev --depth 1
cd squigualiser
python3.8 -m venv venv3
source venv3/bin/activate
pip install --upgrade pip
pip install --upgrade setuptools wheel

export PYSLOW5_ZSTD=1 # if your slow5 file uses zstd compression and you have zstd installed, set
pip install .
squigualiser --help
moa4020 commented 3 months ago

My dataset is quite big, about 15 million reads.

And I ran the script again with your code.

Here's the error: kmer_length: 2 sig_move_offset: 1 input bam: /athena/cayuga_0003/scratch/moa4020/8-oxoG/squig/v02.2_trimmed_aligned_moves_pts12.bam output format: paf output file: /athena/cayuga_0003/scratch/moa4020/8-oxoG/squig/OUT/re_reform.paf Info: Default stride: 5 Info: Found stride to be 6, this value will be used. fd0c9707-b653-46a3-a622-f5e83258de7e i >= len(mv)

hiruna72 commented 3 months ago

Cool, thanks for doing this. Could you extract the raw signal and the bam record as follows and send me?

slow5tools index merged.blow5
slow5tools get merged.blow5 "fd0c9707-b653-46a3-a622-f5e83258de7e" -o read.slow5

samtools view v02.2_trimmed_aligned_moves_pts12.bam | grep "fd0c9707-b653-46a3-a622-f5e83258de7e" > move.sam

Thank you.

moa4020 commented 3 months ago

So, turns out that read is not in the slow5 file. I could try running this on a smaller, more controlled dataset. I'll keep you posted. Sorry about that.

hiruna72 commented 3 months ago

ah ha, it should be a split read then. I am learning about the split functionality in dorado. Could you post the bam record here?

moa4020 commented 2 months ago

Hi, by bam record do you mean the bam file?

hiruna72 commented 2 months ago

yes,

samtools view v02.2_trimmed_aligned_moves_pts12.bam | grep "fd0c9707-b653-46a3-a622-f5e83258de7e" > move.sam
moa4020 commented 2 months ago

So, I tried running the entire pipeline with the prescribed method on Option 2. And I'm getting the same error at the final stage.

${squigualiser}/squigualiser plot --file ${REF} --slow5 ${SIGNAL_FILE} --alignment sorted_realigned_m20.bam --output_dir ${OUTPUT_DIR} --region ${REGION} --tag_name "optionA"
sequence file: /athena/cayuga_0003/scratch/moa4020/refDir/8-oxoG_ref/m20_randomer_reference_padded.fasta
alignment file: sorted_realigned_m20.bam
signal file: /athena/cayuga_0003/scratch/moa4020/8-oxoG/rawdata/v02.2/pod5_pts12/m20randomer_pod5/m20_filtered_fast5/slow5/m20_merged.slow5
Info: Signal to reference method using SAM/BAM ...
plot (+) region: TTATGATTA_NR:801-905   read_id: 37d6f91e-7a89-47a2-9248-2a57c1325100
[slow5_idx_get::ERROR] Read ID '37d6f91e-7a89-47a2-9248-2a57c1325100' was not found. At src/slow5_idx.c:522
Traceback (most recent call last):
  File "/home/fs01/moa4020/squigualiser/squig_venv/bin/squigualiser", line 8, in <module>
    sys.exit(main())
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/__init__.py", line 56, in main
    args.func(args)
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/plot.py", line 964, in run
    draw_data['y_min'] = np.nanmin(y)
  File "<__array_function__ internals>", line 200, in nanmin
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/lib/nanfunctions.py", line 350, in nanmin
    res = np.amin(a, axis=axis, out=out, **kwargs)
  File "<__array_function__ internals>", line 200, in amin
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 2946, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity

So, I extracted the bam record using the command you posted. Here's the file:

$ cat troubleshoot.sam
37d6f91e-7a89-47a2-9248-2a57c1325100    0   TTATGATTA_NR    801 1   185S105M194S    *   0   0   ATGTTCTCTGTACTTCGTTCAGTTACGTATTGCTAGTCTCGTGACTGGAGTAAGGAGATGGAAAGACAGGAGTCGCCTACGATCGCGAAGTAATACTAAACGGATTGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTCATCTGCACTTCATGCATTGCGATCGTAGGCGACTCCTGTCTTTCCATCTCCTTACTCCAGTCACGAGTACTGTAGGAGCAGCGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTACACTGCATACAGGTGTTTTTTCTTTTTCGGATGCGTCGCGCATTCCTGTCTTCCATCTCCTTAAGAAGTCACGAGTACTTAGGCGACTCCTGTCTTTCCATCTCCTTACTCCCAGTCACGAGTACTGAAAGCATCGCGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTT&((,+'&&&(*2548668;=B@>>76444:;<99;;<{LGEDDDHHHJJI{FGC:=65558::,+7++++67<<<<>>=><<=<<<<<FFKH{GJLID{G{HRIGQHHNGEHJGG@54455;=?@BCBAA@>=?@HHG{LGADBBCKJGC@@@ACA222''''()+*+++&%$######$$%''((((*+8EEEABAB@?>>:7::<DFI{IMHRHI{{QQJIJGMGFJKEP{GUN{{NL{{HKHF{JGLUKEHGGNLEF>@BFAA@A>A@ACAACBDGTFKII{JFLLGFD?@>?E:998610-,(&%$$%%&%&'&&&&)*)''&)%&$$$&('''((''(+++,,(())),,,-0)&&%%$$%'(/2556>====?ABGEGIKIK{M{E>5>;<ABB?64./01298:DDDBBAADEEHHHIM{OLO43222==@A@>:8666DGGGGHOGIEDEB@@BGEGDFJGKF{HJJJ=8543110    NM:i:4  ms:i:186    AS:i:186    nn:i:0  tp:A:P  cm:i:9  s1:i:78 s2:i:78 de:f:0.0381 SA:Z:GGTAGGGTT_NR,801,+,375S105M1I3S,1,6;TTGAGTATT_PR,801,+,118S99M7D267S,1,13;TTATGATAT_PR,801,+,254S103M2I125S,1,18;  rl:i:0
37d6f91e-7a89-47a2-9248-2a57c1325100    2048    GGTAGGGTT_NR    801 1   375H31M1I74M3H  *   0   0   TAGGCGACTCCTGTCTTTCCATCTCCTTACTCCCAGTCACGAGTACTGAAAGCATCGCGTAATGGTTCAGCTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTT  ===?ABGEGIKIK{M{E>5>;<ABB?64./01298:DDDBBAADEEHHHIM{OLO43222==@A@>:8666DGGGGHOGIEDEB@@BGEGDFJGKF{HJJJ=8543  NM:i:6  ms:i:174    AS:i:174    nn:i:0  tp:A:P  cm:i:7  s1:i:76 s2:i:76 de:f:0.0566 SA:Z:TTATGATTA_NR,801,+,185S105M194S,1,4;TTGAGTATT_PR,801,+,118S99M7D267S,1,13;TTATGATAT_PR,801,+,254S103M2I125S,1,18;  rl:i:0
37d6f91e-7a89-47a2-9248-2a57c1325100    2048    TTGAGTATT_PR    801 1   118H41M1D2M1I3M1D7M1I5M7D39M267H    *   0   0   CTGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTCATCTGCACTTCATGCATTGCGATCGTAGGCGACTCCTGTCTTTCCATCTCCTTACTC 455;=?@BCBAA@>=?@HHG{LGADBBCKJGC@@@ACA222''''()+*+++&%$######$$%''((((*+8EEEABAB@?>>:7::<DFI{IMHRHI NM:i:13 ms:i:148    AS:i:140    nn:i:0  tp:A:P  cm:i:7  s1:i:56 s2:i:56 de:f:0.0686 SA:Z:TTATGATTA_NR,801,+,185S105M194S,1,4;GGTAGGGTT_NR,801,+,375S105M1I3S,1,6;TTATGATAT_PR,801,+,254S103M2I125S,1,18;    rl:i:0
37d6f91e-7a89-47a2-9248-2a57c1325100    2048    TTATGATAT_PR    801 1   254H48M1I8M1D7M3I4M1D3M1I5M1I2M1D9M1D13M125H    *   0   0   TGTGCATGTCTCTCTGTCACTCGTCTCTGTGAGTTGTAGTACACTGCATACAGGTGTTTTTTCTTTTTCGGATGCGTCGCGCATTCCTGTCTTCCATCTCCTTA    GGNLEF>@BFAA@A>A@ACAACBDGTFKII{JFLLGFD?@>?E:998610-,(&%$$%%&%&'&&&&)*)''&)%&$$$&('''((''(+++,,(())),,,-0)   NM:i:18 ms:i:100    AS:i:98 nn:i:0  tp:A:P  cm:i:7  s1:i:48 s2:i:48 de:f:0.1495 SA:Z:TTATGATTA_NR,801,+,185S105M194S,1,4;GGTAGGGTT_NR,801,+,375S105M1I3S,1,6;TTGAGTATT_PR,801,+,118S99M7D267S,1,13; rl:i:0

Is there a way to handle this exception and skip such reads and go ahead with visualisation of the sequences that do exist?

hiruna72 commented 2 months ago

Hello @moa4020,

  1. I meant the bam record of the dorado output's bam file. Could you paste it here?

  2. I suppose it has the sp and pi tags to identify if the record is a split read or not. Could you try using the latest dorado https://github.com/nanoporetech/dorado/releases/tag/v0.6.1? There have been number of bug fixes in dorado.

  3. I added a condition to skip split and duplex reads in reform tool. The signal not found in slow5 file error should be because 37d6f91e-7a89-47a2-9248-2a57c1325100 is a split read. Could you please try again using the dev branch?

git clone https://github.com/hiruna72/squigualiser.git --branch dev --depth 1
cd squigualiser
python3.8 -m venv venv3
source venv3/bin/activate
pip install --upgrade pip
pip install --upgrade setuptools wheel

export PYSLOW5_ZSTD=1 # if your slow5 file uses zstd compression and you have zstd installed, set
pip install .
squigualiser --help
moa4020 commented 2 months ago

Updated the bam record you asked for in the previous comment. and unfortunately after I installed the new branch of squigualiser, I'm still getting the same error:

$ cat squig_130949.err
[M::mm_idx_gen::1.215*0.00] collected minimizers
[M::mm_idx_gen::1.217*0.01] sorted minimizers
[M::main::1.217*0.01] loaded/built the index for 14 target sequence(s)
[M::mm_mapopt_update::1.217*0.01] mid_occ = 15
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 14
[M::mm_idx_stat::1.217*0.01] distinct minimizers: 71 (66.20% are singletons); average occurrences: 2.789; average spacing: 120.591; total length: 23877
[M::worker_pipeline::38.876*8.95] mapped 229253 sequences
[M::main] Version: 2.27-r1193
[M::main] CMD: /home/fs01/moa4020/miniforge3/envs/minimap2/bin/minimap2 -ax map-ont -t12 --secondary=no -o mapped_m20_basecalls.sam /athena/cayuga_0003/scratch/moa4020/refDir/8-oxoG_ref/m20_randomer_reference_padded.fasta /athena/cayuga_0003/scratch/moa4020/8-oxoG/alignment/m20_filt_moves.fastq
[M::main] Real time: 38.880 sec; CPU: 348.006 sec; Peak RSS: 0.492 GB
[slow5_idx_get::ERROR] Read ID '37d6f91e-7a89-47a2-9248-2a57c1325100' was not found. At src/slow5_idx.c:522
Traceback (most recent call last):
  File "/home/fs01/moa4020/squigualiser/squig_venv/bin/squigualiser", line 8, in <module>
    sys.exit(main())
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/__init__.py", line 56, in main
    args.func(args)
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/src/plot.py", line 964, in run
    draw_data['y_min'] = np.nanmin(y)
  File "<__array_function__ internals>", line 200, in nanmin
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/lib/nanfunctions.py", line 350, in nanmin
    res = np.amin(a, axis=axis, out=out, **kwargs)
  File "<__array_function__ internals>", line 200, in amin
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 2946, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/home/fs01/moa4020/squigualiser/squig_venv/lib64/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
hiruna72 commented 2 months ago

Can you paste the dorado basecalling command please?

moa4020 commented 2 months ago

dorado basecaller ${DORADO}/${MODEL} --emit-moves ${pod5Dir} > ${moves_bam}

hiruna72 commented 2 months ago

Cool. Please paste the output of following too.

samtools view ${moves_bam} | grep "37d6f91e-7a89-47a2-9248-2a57c1325100" > move.sam

moa4020 commented 2 months ago

So, the read does not show up in the bam file.

hiruna72 commented 2 months ago

Interesting, If it is not in the moves_bam the record cannot propagate through reform and realign 🤔.

It must be frustrating but could you please run again the complete pipeline? This time add greps after basecalling, reform, and realign to check at what stage this culprit record gets in.

moa4020 commented 2 months ago

It's working! I just forgot to uncomment one of the steps so I've been working with an older file. Sorry.

hiruna72 commented 2 months ago

Awesome! Thanks for reporting this.