marcus1487 / nanoraw

Genome-guided re-segmentation and visualization for raw nanopore sequencing data.
https://pypi.python.org/pypi/nanoraw

R7.3 #32

Open JohnUrban opened 7 years ago

JohnUrban commented 7 years ago

Hi,

This tool seems great. Is there any chance it could be used with R7.3 data (SQK-MAP006, r7.3_e6_70bps_6mer, MkI)?

best,

John

marcus1487 commented 7 years ago

Currently the code does not work with R7 data. The algorithm is certainly applicable, but the file format is the biggest issue. If support were added, it would likely involve converting the base-called fast5 files to R9 format (including the raw data, which for R7 is currently stored in a separate file). The converted files could then easily be used with the rest of the nanoraw functions. I don't plan to add this functionality at the moment, but could look into it at some point.

JohnUrban commented 7 years ago

Thanks for getting back to me.

I have a bunch of Python-based tools in place for working with pre-R9 fast5 files -- and they seem to still work for the most part with R9: https://github.com/JohnUrban/fast5tools

I will take the time to add the functionality to fast5tools that will enable the conversion of base-called r7.3 files to R9 format. Fortunately, I believe I saved all of the raw data from these runs.

It would be extremely helpful if I could ask questions here about the minimal set of HDF5 paths that need to be present/accessed by nanoraw -- that way I can focus on what is important instead of a full conversion.

marcus1487 commented 7 years ago

That sounds great. All of the slots that are accessed should be in the get_read_data function (line 420) in the correct_raw.py script. I do remember having to write a script to match up the raw and base-called files. You have to match the files up by the read_id slot instead of by file name. Let me know if you run into any other issues though. It would be great to have a command or just a helper script for this conversion!
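
A rough sketch of the matching step, for illustration only (this is not the script mentioned above, and the function name pair_by_read_id is hypothetical). It assumes the read_id of each file has already been extracted into a dict of file path -> read_id; how that value is read from an R7.3 or R9 fast5 file is deliberately left out, since the exact HDF5 paths are not confirmed in this thread.

```python
from collections import defaultdict


def pair_by_read_id(raw_ids, called_ids):
    """Pair raw and base-called files whose read_id values agree.

    raw_ids, called_ids: dicts mapping file path -> read_id string
    (assumed to have been extracted from the files beforehand).
    Returns (pairs, unmatched_raw, unmatched_called).
    """
    # Index raw files by read_id; keep a list so duplicates are detectable.
    raw_by_id = defaultdict(list)
    for path, read_id in raw_ids.items():
        raw_by_id[read_id].append(path)

    pairs = []
    unmatched_called = []
    used_raw = set()
    for called_path, read_id in sorted(called_ids.items()):
        candidates = raw_by_id.get(read_id, [])
        if len(candidates) == 1:
            # Exactly one raw file carries this read_id: accept the pair.
            pairs.append((candidates[0], called_path))
            used_raw.add(candidates[0])
        else:
            # No raw file, or an ambiguous read_id: leave it for manual review.
            unmatched_called.append(called_path)

    unmatched_raw = sorted(p for p in raw_ids if p not in used_raw)
    return pairs, unmatched_raw, unmatched_called
```

File names play no part in the pairing; they are only used to sort the output so the result is reproducible. Anything left in the two unmatched lists would need to be checked by hand before conversion.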