hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Problem merging files #115

Closed MarioRinBarr closed 3 months ago

MarioRinBarr commented 4 months ago

Hello,

For an analysis I want to perform, I am trying to generate a blow5 file containing only a specific set of records that I need. So I have used the get command to create several separate files:

while read p; do slow5tools get all.blow5 "$p" -or "$p".blow5; done < record_list.txt

and then I wanted to merge them all together, using the merge command:

slow5tools merge separated_records/ -o selected_data.blow5

However, when I run this last command I get an error:

[list_all_items] Looking for '*.slow5' files in separated_records/

[merge_main] 197768 files found - took 0.206s

[merge_main] Allocating new read group numbers - took 4.601s

[slow5_get_next_mem::ERROR] Malformed blow5 record. Failed to read the record size. Missing blow5 end of file marker. At src/slow5.c:3236

What could be causing this last command to fail?

Thank you very much

Translated with DeepL.com (free version)

hasindu2008 commented 4 months ago

Hello,

What is the record_list.txt like? I am wondering why you are not giving the list directly like:

slow5tools get all.blow5 --list record_list.txt -o all_selected_reads.blow5

I am asking because that will be much faster and more efficient than using a bash loop. The bash loop spawns a separate slow5tools process for every single read ID, which means the index has to be loaded every single time. I would not be surprised if processing the 197768 reads took days instead of minutes.
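To be clear about what --list expects: record_list.txt should simply be a plain text file with one read ID per line, which is what your while-read loop already implies. A minimal sketch of the single-invocation workflow (the read IDs below are just placeholders):

record_list.txt:
00213403-4297-4f03-8412-3cc8b9cb845a
0021916c-449e-4a89-ae47-6db5cc742660

slow5tools get all.blow5 --list record_list.txt -o all_selected_reads.blow5

This loads the index once, writes a single blow5 file, and removes the need for a separate merge step.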

If you still want to stick to the bash loop method, can you replace -or "$p".blow5 with -o "$p".blow5 in your bash loop and see if the error persists?
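For example, the corrected loop (everything else identical to your original command) would be:

while read p; do slow5tools get all.blow5 "$p" -o "$p".blow5; done < record_list.txt

followed by the same merge command on the directory containing the per-read files.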

hasindu2008 commented 3 months ago

Has this issue been addressed?

hasindu2008 commented 3 months ago

Closing this issue for now. If you are still having trouble, feel free to reopen.