hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Error in slow5tools get #113

Closed loganylchen closed 2 weeks ago

loganylchen commented 1 month ago

Hi there,

I am trying to use slow5tools get to extract the signals for a subset of reads. I prepared a list of read names and ran `slow5tools get results/merged.blow5 -l results/splited_data/150/1.0_0/control_reads.txt`, but it failed with `[slow5_idx_get::ERROR] Read ID 'db7631a3-273c-4f9b-8e60-5069cf88f40' was not found. At src/slow5_idx.c:522`

However, the read ID in my list is 'db7631a3-273c-4f9b-8e60-5069cf88f406', which ends with an extra '6'. I don't know why slow5tools only recognized part of the read ID.

Can you help me with this issue? (slow5tools version: 1.1.0)

Best, Logan

hasindu2008 commented 1 month ago

Could you please run grep "db7631a3-273c-4f9b-8e60-5069cf88f40" results/splited_data/150/1.0_0/control_reads.txt and paste what you get?

Psy-Fer commented 1 month ago

Also double check that the readIDs have the same Parent ID, otherwise it's a split read, and you won't find its readID in the raw data

loganylchen commented 1 month ago

> Could you please run grep "db7631a3-273c-4f9b-8e60-5069cf88f40" results/splited_data/150/1.0_0/control_reads.txt and paste what you get?

I did it.

`grep 'db7631a3-273c-4f9b-8e60-5069cf88f40' results/splited_data/150/1.0_0/control_reads.txt`

`db7631a3-273c-4f9b-8e60-5069cf88f406`

loganylchen commented 1 month ago

I am now trying the pyslow5 package to do the same thing.

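import pyslow5

s5 = pyslow5.Open('results/merged.blow5', 'r')
# Read the requested IDs, one per line, dropping surrounding whitespace
with open('results/splited_data/150/1.0_0/control_reads.txt') as f:
    reads = [line.strip() for line in f]
# get_read_list yields one entry per requested ID, or None if it is not found
selected_reads = s5.get_read_list(reads)

for r, read in zip(reads, selected_reads):
    if read is not None:
        print(r, read['read_id'])
    else:
        print(r, "read not found")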

All the reads could be found.

hasindu2008 commented 1 month ago

This is weird. Can you share results/splited_data/150/1.0_0/control_reads.txt with me?

loganylchen commented 1 month ago

control_reads.txt

Yes, here it is.

hasindu2008 commented 1 month ago

Ohh, could you try the following file now? It should work, I guess. control_reads.txt

loganylchen commented 1 month ago

Yes, it works. May I ask what's the difference?

hasindu2008 commented 1 month ago

The last read ID was not followed by a newline, so I added one. But this is a good catch, I should fix slow5tools to handle this case.
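To illustrate how a missing final newline can produce exactly this symptom, here is a minimal Python sketch. It assumes a reader that always drops the last character of each line on the expectation that it is a newline. This is only an assumption for illustration, not the actual slow5tools code, and `naive_read_ids` and the short IDs are made-up names.

```python
def naive_read_ids(text):
    # Hypothetical reader: assumes every line ends with a newline
    # and blindly removes the final character of each line.
    ids = []
    for line in text.splitlines(keepends=True):
        ids.append(line[:-1])
    return ids


with_newline = "aaaa-0001\nbbbb-0002\n"
without_newline = "aaaa-0001\nbbbb-0002"

print(naive_read_ids(with_newline))     # both IDs intact
print(naive_read_ids(without_newline))  # last ID loses its final character
```

With the trailing newline present both IDs come back whole; without it, the last ID is cut short by one character, just like the 'f406' to 'f40' truncation reported above.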

loganylchen commented 1 month ago

It is weird. I tried adding another '\n' to the last line, but that raised an error because an empty string was then recognized as another read ID. But when I just open the file in Vim and save it without changing anything (vim the file, then :wq), it runs OK.
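Vim writes a final newline by default when saving a file that lacks one, which is likely why opening and saving was enough. As a rough sketch, the same repair can be scripted in Python. The helper name `ensure_trailing_newline` is hypothetical and is not part of slow5tools or pyslow5; the demo uses a temporary file rather than the real read list.

```python
import os
import tempfile


def ensure_trailing_newline(path):
    """Append one newline if the file is non-empty and lacks a final one.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False          # empty file: nothing to fix
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False          # already ends with a newline
        f.seek(0, os.SEEK_END)
        f.write(b"\n")            # add exactly one newline, never a blank line
        return True


# Demo on a temporary file whose last ID has no newline after it
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"aaaa-0001\nbbbb-0002")
demo_path = tmp.name

first_call = ensure_trailing_newline(demo_path)   # modifies the file
second_call = ensure_trailing_newline(demo_path)  # no further change
with open(demo_path, "rb") as f:
    fixed = f.read()
os.remove(demo_path)
print(first_call, second_call, fixed)
```

Because it only appends when the last byte is not already a newline, running it twice does not create the empty extra line that caused the second error.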

hasindu2008 commented 1 month ago

@loganylchen I will handle these cases in slow5tools soon, for example by ignoring empty lines.
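In the meantime, a tolerant way to load the list on the Python side might look like the sketch below. It is an illustration of the intended behaviour (accept a missing final newline, skip empty lines), not the slow5tools implementation, and `parse_read_ids` is a hypothetical name.

```python
def parse_read_ids(text):
    """Return the non-empty read IDs in text, one per line.

    str.splitlines() copes with a missing final newline and with
    Windows-style line endings; blank lines are skipped.
    """
    ids = []
    for line in text.splitlines():
        read_id = line.strip()
        if read_id:
            ids.append(read_id)
    return ids


print(parse_read_ids("aaaa-0001\nbbbb-0002"))          # no final newline
print(parse_read_ids("aaaa-0001\n\nbbbb-0002\n\n"))    # blank lines
print(parse_read_ids("aaaa-0001\r\nbbbb-0002\r\n"))    # CRLF endings
```

All three inputs give the same two IDs, so the resulting list could be passed on to something like `get_read_list` without a truncated or empty entry.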

loganylchen commented 1 month ago

@hasindu2008 Thanks

hasindu2008 commented 2 weeks ago

This has been fixed in the dev branch now. It will be included in the next release. Thanks for the find!