uwmisl / poretitioner

https://misl.cs.washington.edu

Create specification for storing results in fast5 files #20

Open kdoroschak opened 4 years ago

kdoroschak commented 4 years ago

Rather than producing many intermediate files, record results in the fast5 files themselves, similar to what is done for DNA.

Objectives:

Investigate best practices for file format specification documentation. I don't necessarily want/need to follow this to a T, but best to know what's out there.

Would like to have this for 1.0.0, as it greatly simplifies the workflow & data management overhead associated with the tool/pipeline.

Potential follow-up enabled by this change: poretitioner could copy the config from an existing fast5 file. Maybe something like --copy-config-from fast5_fname_here.fast5?
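
A minimal sketch of how such a flag might be wired up with Python's argparse. Only the flag name --copy-config-from comes from the comment above; the parser layout and the helper names build_parser and resolve_config_source are hypothetical and not poretitioner's actual CLI. Reading the config out of the fast5 file is deliberately left out, since the layout is still being specified.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser; only the flag name is taken from the issue."""
    parser = argparse.ArgumentParser(prog="poretitioner")
    parser.add_argument(
        "--copy-config-from",
        metavar="FAST5",
        type=Path,
        default=None,
        help="Reuse the configuration stored in an existing fast5 file.",
    )
    return parser


def resolve_config_source(argv):
    """Return the fast5 path to copy the config from, or None if the flag is absent."""
    args = build_parser().parse_args(argv)
    # argparse maps "--copy-config-from" to the attribute "copy_config_from".
    return args.copy_config_from
```

Defaulting to None keeps the flag optional, so a run without it would fall back to whatever config source the tool already uses.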

kdoroschak commented 4 years ago

Removed the milestone since this should happen ASAP, not at some point in the future.

kdoroschak commented 4 years ago

First stab: https://docs.google.com/spreadsheets/d/1fzKfsN0126TVkGrpQaeVFpW5TjtUZtTpgeIxT8omJOs/edit

kdoroschak commented 4 years ago

Q: Should we carry over information in /UniqueGlobalKey/ like operating_system? Maybe that should be left in the bulk file?

This would be easy to add back in later if needed, but also pretty annoying.

kdoroschak commented 4 years ago

Draft version 0.1 saved for posterity: FAST5 specification.xlsx

kdoroschak commented 4 years ago

Added segmenter & classification model versioning details

FAST5_specification_0.1.1.xlsx

kdoroschak commented 4 years ago

Made some changes while doing the core rewrite of segment.py. This is version 0.2 and should be final (or pretty close to it) for everything up through the segmenter (NOT post-segment filtering or classification).

Version control for this is currently handled by the Google Doc (plus named versions in its edit history). Not sure if or where to put it in this repo.

FAST5 specification_0.2.xlsx