nanoporetech / pod5-file-format

Pod5: a high performance file format for nanopore reads.
https://pod5-file-format.readthedocs.io/

Pyarrow error #135

Open Psy-Fer opened 2 months ago

Psy-Fer commented 2 months ago

Hey George,

I have a user getting a strange error. I've attached the issue below, where you can also see some more context.

Any ideas what the issue here might be?

Cheers,
James


Dear James,

Thank you for the update. I just tried the newer version and I am getting an error related to the pyarrow package. Trace:

   len(batch.signal[batch_row_index].as_buffer()),
AttributeError: 'pyarrow.lib.LargeListScalar' object has no attribute 'as_buffer'

Originally posted by @lborcard in https://github.com/Psy-Fer/blue-crab/issues/12#issuecomment-2208307232

0x55555555 commented 2 months ago

Based on the error, I suspect the file is uncompressed (and is hitting an error case we hadn't accounted for). I'm not sure how it's possible to end up with an uncompressed file. How were the files created?

I'll keep digging on my side.

lborcard commented 2 months ago

If I may intervene, I am the user with the error. The pod5 files were generated using Icarust (https://github.com/LooseLab/Icarust). They are compatible with dorado (I used it to basecall them).

0x55555555 commented 2 months ago

OK, I'm not familiar with how Icarust writes pod5 files, but I've finished investigating the pod5 source: the error is caused by a bug in the Python pod5 bindings when reading uncompressed pod5 files.

I have a fix internally that will resolve the issue, and I'll get it out asap.
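
For readers hitting the same traceback, the failure mode can be sketched without pod5 itself. In pyarrow, binary scalars expose `as_buffer()`, while list scalars such as `LargeListScalar` expose `values` instead, so code that assumes compressed (binary) signal rows breaks on uncompressed ones. The snippet below is a minimal duck-typed sketch of a fallback, not the actual fix shipped in pod5. The helper name `signal_byte_count`, the stand-in classes, and the assumption of 2 bytes per sample (int16 signal) are all illustrative.

```python
def signal_byte_count(scalar, bytes_per_sample=2):
    """Return the stored size in bytes of one signal row.

    Hypothetical helper: handles both a binary-like scalar (has as_buffer)
    and a list-like scalar (has values), assuming int16 samples by default.
    """
    if hasattr(scalar, "as_buffer"):
        # Compressed rows: the scalar wraps an opaque byte buffer.
        return len(scalar.as_buffer())
    # Uncompressed rows: the scalar wraps a sequence of samples.
    return len(scalar.values) * bytes_per_sample


# Stand-ins for the two pyarrow scalar shapes, so the sketch runs
# without pyarrow installed. These classes are not part of any library.
class FakeBinaryScalar:
    def __init__(self, payload):
        self._payload = payload

    def as_buffer(self):
        return self._payload


class FakeListScalar:
    def __init__(self, samples):
        self.values = samples


compressed = FakeBinaryScalar(b"\x00\x01\x02")
uncompressed = FakeListScalar([10, 20, 30, 40])

print(signal_byte_count(compressed))    # 3
print(signal_byte_count(uncompressed))  # 8
```

With real pyarrow objects the same attribute check applies, since the original error is precisely that `LargeListScalar` has no `as_buffer` attribute.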

Psy-Fer commented 2 months ago

This makes me ask the obvious question as well. Is pore_type still not used by nanopore software?

I was under the impression that MinKNOW had started using it. Is this something Icarust has decided to set even though the field is not actually used yet?

0x55555555 commented 2 months ago

No, sequencing runs on the current MinKNOW software do not set the pore type.

Psy-Fer commented 2 months ago

Hmmm okay. Thanks.

Adoni5 commented 2 months ago

Ahh okay - @Psy-Fer I'm happy to change the Icarust code to set the Pore Type to "not-set" if that would be useful.

Psy-Fer commented 2 months ago

Please make it specifically not_set with an underscore to match that of the current pod5 output.

Feel free to use the test scripts in blue-crab as boilerplate to test if your files are correct.

I'll leave in the R10.4.1 exception for pore_type so users of older versions of Icarust can still convert their files if they like.

James
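
The rule described above (accept `not_set` with an underscore, plus an `R10.4.1` exception for files from older Icarust versions) can be sketched as a simple membership check. This is an illustration only, not blue-crab's actual code; the names `ACCEPTED_PORE_TYPES` and `pore_type_is_accepted` are hypothetical.

```python
# Values taken from the discussion: the current pod5 output uses "not_set",
# and "R10.4.1" is tolerated for files written by older Icarust versions.
ACCEPTED_PORE_TYPES = {"not_set", "R10.4.1"}


def pore_type_is_accepted(pore_type):
    """Return True if the pore_type string matches an accepted value exactly."""
    return pore_type in ACCEPTED_PORE_TYPES


for candidate in ("not_set", "not-set", "R10.4.1"):
    print(candidate, pore_type_is_accepted(candidate))
```

The comparison is exact on purpose, so the hyphenated `not-set` is rejected rather than silently normalised.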

0x55555555 commented 2 months ago

I'm in the process of deploying 0.3.12, which contains a fix for the issue of opening raw data from uncompressed pod5 files.

Thanks,
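
A quick way to check whether an environment already has the fixed release is to compare the installed version against 0.3.12. The sketch below uses only the standard library and assumes the package is installed under the distribution name `pod5` and uses plain dotted numeric versions; the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# First release stated in this thread to contain the fix.
FIXED_IN = (0, 3, 12)


def parse_release(version_string):
    """Turn '0.3.12' (or '0.3.12.post1') into a tuple of three integers."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_uncompressed_fix(version_string):
    """Return True if the given version is 0.3.12 or newer."""
    return parse_release(version_string) >= FIXED_IN


def installed_pod5_has_fix():
    """Check the installed pod5 distribution; False if it is not installed."""
    try:
        return has_uncompressed_fix(version("pod5"))
    except PackageNotFoundError:
        return False


print(has_uncompressed_fix("0.3.11"))  # False
print(has_uncompressed_fix("0.3.12"))  # True
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "0.3.2" as newer than "0.3.12".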

Adoni5 commented 2 months ago

Thanks George.