icthrm / cwSDTWnano

Here we proposed two novel algorithms, the Direct Subsequence Dynamic Time Warping for nanopore raw signal search (DSDTWnano) and the continuous wavelet Subsequence DTW for nanopore raw signal search (cwSDTWnano), to enable the direct subsequence inquiry and exact mapping in the nanopore raw signal datasets. The proposed algorithms are based on the idea of Subsequence-extended Dynamic Time Warping (SDTW) and directly operates on the raw signals, without any loss of information. DSDTWnano could ensure an output of highly accurate query result and cwSDTWnano is the accelerated version of DSDTWnano, with the help of seeding and multi-scale coarsening of signals that based on continuous wavelet transform (CWT).
10 stars 2 forks source link

Fastquery mode=1 not working #6

Open harisankarsadasivan opened 3 years ago

harisankarsadasivan commented 3 years ago

I'm trying to align a small lambda raw signal to a lambda genomic reference (48Kbp) and it does not seem to work with m=1. Pls find all logs below. The signal is from your drive lambda dataset and reference is downloaded from online. it is a 2-line fasta file. genome2sig.result also seems to have valid numbers.

RAw signal: Lambda_data/Lambda_3000_signal/fa10a452-c965-406e-99b4-9f41e30d8b39.signal (from your dataset) reference genome: https://www.ncbi.nlm.nih.gov/nuccore/J02459.1?report=fasta

./sat-query -i ../../ref.fa -p ../../cwSDTWnano_Dataset_lite/Lambda_data/Lambda_3000_signal/f1ada8b7-8933-477e-aed7-fccc776bc480.signal -m 1 -o out

output: Transform genomes to signal sequence...[Beginning] 2020/10/13 Tue 12:20:25 48502 genomes are readed. Transform genomes to signal sequence...[Finished] 2020/10/13 Tue 12:20:26 (time elapse: 140ms) Short reads inquiry...[Beginning] 2020/10/13 Tue 12:20:26 61255 signals are readed. Fast inquiry...[Beginning] 2020/10/13 Tue 12:20:26 Feedback-->[-1 : -1] in the raw signal Fast inquiry...[Finished] 2020/10/13 Tue 12:20:26 (time elapse: 110ms) Short reads inquiry...[Finished] 2020/10/13 Tue 12:20:26 (time elapse: 140ms)

contents of output file: -1 -1 | 1.92252e-318 2.42844e-318 | 2.61057e-318 2.61057e-318 diff: 0

harisankarsadasivan commented 3 years ago

I debugged a bit further and saw that I was getting 3 seeds, however, legality check was failing at the first out of 3 checks made seed_align[i].back().second < seed_align[i-1].back().second || seed_align[i][0].second < seed_align[i-1][0].second. any hints on how to fix this will be useful.

Tried: tried increasing L to 192, 256 and also s=8,9,10. The same error happens. This should mean that the seed hits are not in order. Why would this be? Am I using the correct reference genome? What should I do to fix this? I can also be contacted at hariss@umich.edu. Looking forward to having this fixed.

I can see from your running_command file that you are trying out k=1 with default 128bp. Does k>1 not work?

icthrm commented 3 years ago

according to your output, you searched a 48502 bp ATCG genome on a 61255 long signal (about 61255/8 bp genome). Therefore, there will be definitely no match signal. Under this condition, we will output -1.

harisankarsadasivan notifications@github.com 于2020年10月14日周三 上午12:27写道:

I'm trying to align a small lambda raw signal to a lambda genomic reference (48Kbp) and it does not seem to work. Pls find all logs below.

./sat-query -i ../../ref.fa -p ../../cwSDTWnano_Dataset_lite/Lambda_data/Lambda_3000_signal/f1ada8b7-8933-477e-aed7-fccc776bc480.signal -m 1 -o out

output: Transform genomes to signal sequence...[Beginning] 2020/10/13 Tue 12:20:25 48502 genomes are readed. Transform genomes to signal sequence...[Finished] 2020/10/13 Tue 12:20:26 (time elapse: 140ms) Short reads inquiry...[Beginning] 2020/10/13 Tue 12:20:26 61255 signals are readed. Fast inquiry...[Beginning] 2020/10/13 Tue 12:20:26 Feedback-->[-1 : -1] in the raw signal Fast inquiry...[Finished] 2020/10/13 Tue 12:20:26 (time elapse: 110ms) Short reads inquiry...[Finished] 2020/10/13 Tue 12:20:26 (time elapse: 140ms)

contents of output file: -1 -1 | 1.92252e-318 2.42844e-318 | 2.61057e-318 2.61057e-318 diff: 0

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/icthrm/cwSDTWnano/issues/6, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2IPE6DHEFSBNN6DFNNJ6DSKR5VPANCNFSM4SPGWHBA .

harisankarsadasivan commented 3 years ago

@icthrm I see, how do I search the other way around? My intention is to search for the raw nanopore signal in the bigger 48502bp long genome.

harisankarsadasivan commented 3 years ago

@icthrm , So it seems you dont support searching for raw nanopore signal on the bigger genome. I tried swapping reference and operated peer to make this happen and seems the seeds dont pass linear regression. Hence, I do not think cwdsdwt works well for the purpose stated. Let me know otherwise.