Open addyblanch opened 3 years ago
Hey @addyblanch
The converter expects a top-level key Reads.
Can you check your file in Python with:
>>> import h5py
>>> f = h5py.File('HQ_mapped.hdf5', 'r')
>>> list(f.keys())
['Reads']
Hi @iiSeymour, did you mean the printed output from those commands?
This is what I get:
>>> import h5py
>>> f = h5py.File('HQ_mapped.hdf5', 'r')
>>> list(f.keys())
['Batches', 'read_ids']
Was HQ_mapped.hdf5 output by prepare_mapped_reads.py? I ask because you should have ended up with an HDF5 file like this.
Yes it was, but I did the alignment step with minimap2 rather than guppy_aligner. Would that cause this issue?
Which version of Taiyaki are you using @addyblanch?
Based on the changelog in the directory, v5.3.0.
I'm also having this issue with v5.3.0 and have the same top level keys: ['Batches', 'read_ids']
Hopefully they are working on a solution.
@addyblanch I was able to downgrade to Taiyaki 5.0.0 and it worked. The issue seems to stem from this Taiyaki change in 5.2 linked to by @iiSeymour
The batched variant of the HDF5 mapped signal format was introduced in version 5.2. This variant replaces the Reads group with a Batches group. Each group within the Batches group contain the same set of attributes and datasets listed in the table above, but these values for a set of reads are concatenated together into one dataset per batch.
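A minimal sketch of telling the two layouts apart from the top-level keys, assuming only what this thread shows: a `Reads` group in the per-read layout and a `Batches` group in the batched one. The function names and the returned labels are hypothetical, chosen for illustration, and are not part of Taiyaki or bonito.

```python
def mapped_signal_layout(top_level_keys):
    """Classify a mapped-signal file from its top-level key names.

    The labels 'per-read', 'batched' and 'unknown' are illustrative only.
    """
    keys = set(top_level_keys)
    if "Reads" in keys:
        return "per-read"
    if "Batches" in keys:
        return "batched"
    return "unknown"


def layout_of_file(path):
    """Open an HDF5 file read-only and classify its layout."""
    import h5py  # third-party; imported here so the helper above works without it

    with h5py.File(path, "r") as f:
        return mapped_signal_layout(f.keys())
```

With the keys reported above, `mapped_signal_layout(['Batches', 'read_ids'])` gives `'batched'`, which is the layout the converter does not accept.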
I might take a stab at trying to fix it this weekend and will send a fork along if I manage to get it working before @iiSeymour
Thanks @jackwadden for the heads up, that would be a great short-term workaround.
This turned out to be a small bug in Taiyaki, and was an easy fix. I've submitted a pull request with the fix here.
That's amazing, thanks @jackwadden! I've made the edit on my end and set it to rerun. Fingers crossed.
I'm having issues with bonito now that were resolved by downgrading back to Taiyaki 5.0.0. The specific error was thrown by parasail. Let me know if you get a similar error. I'm back in the territory where it's most likely a problem with my code, but would be nice to know if you run into something similar.
Hi @jackwadden, unfortunately no dice. Same error as before, minus the last line:
KeyError: "Unable to open object (object 'Reads' doesn't exist)"
Is there a fix in the works, @iiSeymour? If not, I'll downgrade Taiyaki and try again soon.
@addyblanch the fact that the 'Reads' directory doesn't exist means that Taiyaki (probably) still isn't emitting the non-batched version. Are you seeing the same output from list(f.keys())? You might have to re-install Taiyaki. Maybe pop a print("changed") in main() to see if your changes are actually being adopted.
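Another way to check whether an edited copy of a package is the one Python actually picks up is to look at where the module resolves from. This is a minimal sketch using only the standard library; the helper name `module_location` is hypothetical, and `taiyaki` as the importable package name is an assumption.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin
```

Calling `module_location("taiyaki")` in the same environment used for the run should point at the source tree that was edited. If it points somewhere else, such as an older copy in site-packages, the changes are not being used.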
Another option might be to just use bonito end-to-end. I don't know what your use-case is, but you might be able to use this method to prepare reads and train a model. Just omit the --pretrained <model> option when you train.
Good luck.
Hi @jackwadden, yes, same output from list(f.keys()). I will have a go with version 5.0.0 in the coming weeks.
I have tried the end-to-end bonito model training but it didn't solve our issues (it made the assemblies slightly worse), so I was following this (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03856-0) as they seemed to have some success. I work on Streptococcus, and any genome we try to sequence seems to end up inflated in size and includes an awful number of pseudogenes (we suspect due to errors causing erroneous start and stop codons).
I've been through the Taiyaki pipeline to create an HDF5 file which I plan to convert into a Bonito model. I seem to have hit a snag. Any suggestions on what the issue is?