Closed KavinduJayas closed 1 year ago
Hey,
Thanks for the detailed information.
Could you give me the Python version and pyslow5 version you are using?
I'll have a look in the morning to see if I can reproduce this, but any extra info would be very helpful.
Cheers, James.
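For anyone else gathering the same details, here is a minimal sketch that collects both versions in one go. It uses only the standard library (`platform` and `importlib.metadata`); the helper name `report_versions` is made up for illustration, and it assumes pyslow5 was installed as a regular package (for example with pip) so that its metadata is visible.

```python
import platform
from importlib import metadata


def report_versions(packages=("pyslow5",)):
    """Return the Python version and the installed version of each package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package has no installed metadata in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

A value of `None` means the package metadata could not be found, which is worth mentioning in the report as well.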
Hey James,
Thanks for the quick response.
I am using a Google Colab notebook; the Python version is 3.10.12 and the pyslow5 version is pyslow5-1.0.0.
If you need more details please let me know.
Best regards, Kavindu.
Hey,
Okay, that's helpful. I'll be sure to also do some tests in Google Colab.
I'll get back to you soon.
James
Hello,
Ahh so I see the problem.
There are actually 2 problems.
1. Your code/example is missing the header writing step. This is important: calling it doesn't actually write anything when you run it; the header gets written on the first record write. This is so I can do a number of type checks and other things before actually writing data. I'll add an updated example for you below.
2. ONT have added a 6th end_reason to their data: data_service_unblock_mux_change is a new value. While we dynamically catch this, and you can write it with Python, my helper function for getting the values is hard-coded with only 5 values. This was to help those writing files from scratch. I think my favourite part of this is that they didn't add it to the end of the end_reason list, they added it to the middle. So the signal_positive and signal_negative values are now wrong based on integer order for all other file versions. Another reason we wrote slow5 was to get away from the insanity that is ONT file scheme handling. You can just set your own end_reason data labels, so not too bad.
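To make the ordering problem concrete, here is a small sketch in plain Python with no pyslow5 calls. The new label list is the one used in the example further down. The older list is an assumption: it is simply the new list with data_service_unblock_mux_change removed. The helper `remap_end_reason` is hypothetical and not part of pyslow5.

```python
NEW_LABELS = ['unknown', 'partial', 'mux_change', 'unblock_mux_change',
              'data_service_unblock_mux_change', 'signal_positive',
              'signal_negative']

# Assumption: the older list is the new one without the added label.
OLD_LABELS = [label for label in NEW_LABELS
              if label != 'data_service_unblock_mux_change']


def remap_end_reason(value, src_labels, dst_labels):
    """Translate an integer end_reason between label lists by name."""
    return dst_labels.index(src_labels[value])


# The integer stored for signal_positive under the older ordering.
old_value = OLD_LABELS.index('signal_positive')

# Decoding that integer directly with the new list gives the wrong label.
naive_label = NEW_LABELS[old_value]

# Remapping by name first recovers the intended label.
fixed_label = NEW_LABELS[remap_end_reason(old_value, OLD_LABELS, NEW_LABELS)]

print(old_value, naive_label, fixed_label)
```

Because the new value sits in the middle of the list, every label after it shifts by one, which is why passing explicit end_reason labels when writing the header avoids relying on integer order.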
This has given me some things to sort out in the python library, so thank you very much for posting the issue.
The code example below should solve your problems as things are now.
Cheers, James
import pyslow5
import numpy as np

# Open the source file for reading and the destination file for writing
F = pyslow5.Open('reads.blow5','r', DEBUG=1)
W = pyslow5.Open('modified_reads.blow5','w', DEBUG=1)

# Start from an empty header and copy over every populated attribute
header, end_reason_labels = F.get_empty_header(aux=True)
header_original = F.get_all_headers()
new_end_reason_labels = ['unknown', 'partial', 'mux_change', 'unblock_mux_change', 'data_service_unblock_mux_change', 'signal_positive', 'signal_negative']
for i in header_original:
    if i in header:
        if header_original[i] is None:
            continue
        else:
            header[i] = header_original[i]

# The header step must come before any record is written
ret = W.write_header(header, end_reason_labels=new_end_reason_labels)
print("ret: write_header(): {}".format(ret))

# Collect the primary and aux fields of every read, keyed by read ID
reads = F.seq_reads(aux='all')
records = {}
auxs = {}
for read in reads:
    record, aux = F.get_empty_record(aux=True)
    # record = F.get_empty_record()
    for i in read:
        if i == "read_id":
            readID = read[i]
        if i in record:
            record[i] = read[i]
        if i in aux:
            aux[i] = read[i]
    records[readID] = record
    auxs[readID] = aux
print(records)
print(auxs)

# Write all records in one batch, then close both files
ret = W.write_record_batch(records, threads=8, batchsize=200, aux=auxs)
print("ret: write_record(): {}".format(ret))
F.close()
W.close()
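The header-copy loop in the example above can also be pulled out into a small helper, which makes it easy to check without a BLOW5 file at hand. This is only a sketch: `merge_header` is a hypothetical name, it assumes both headers behave like plain dicts (as the indexing in the example suggests), and the keys and values below are placeholders rather than real data.

```python
def merge_header(template, original):
    """Return a copy of template filled with populated values from original.

    Keys that are not in template, and values that are None, are skipped,
    mirroring the loop in the example above.
    """
    merged = dict(template)
    for key, value in original.items():
        if key in merged and value is not None:
            merged[key] = value
    return merged


# Placeholder headers for illustration only.
template = {"run_id": None, "asic_id": None, "flow_cell_id": None}
original = {"run_id": "run_0", "asic_id": None, "extra_field": "ignored"}

merged = merge_header(template, original)
print(merged)
```

Returning a new dict instead of modifying the template in place keeps the empty header reusable if several output files are written in one session.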
Hey James,
I tested the revised code you provided, and I'm happy to inform you that it is now working correctly. I also appreciate the detailed explanation, thank you for your assistance.
Best regards, Kavindu
Issue Description: I encountered an error while using the write_record() function in the pyslow5 library. The error occurs when attempting to initialize the aux_meta fields. The error message indicates that the initialization of several aux_meta fields failed for each record, leading to the inability to set them in the C s5.header.aux_meta struct. This error persists for all aux fields.
Steps to Reproduce:
Get the dataset:
wget https://slow5.page.link/hg2_prom_subsub
Untar:
tar -xf /hg2_prom_subsub
Run the code:
Error message:
I would greatly appreciate any guidance or solution to resolve this issue. Thank you in advance for your assistance!