For each run, the method currently does the following:
- Identify the capture_file (contains information about the starts/stops of peptide captures) by string-formatting the run name.
- Identify the raw_file (purpose unclear; probably pre-saved captures) by string-formatting the run name.
- Find the fast5 file by searching a specified fast5 directory.
- Retrieve the voltage changes.
- Apply a basic length filter.
- For each capture:
  - Apply the feature filters.
  - Classify the capture.
  - Store the capture information: if the capture did not classify, increment a counter; if it did, append it to a dictionary (key: class #, value: list of captures).
- For each of the predicted classes (note: there is a potential bug here, because the code enumerates over the capture dict instead of retrieving the keys directly, and the enumeration index is not guaranteed to match the class number):
  - Create a dataframe from the captures.
  - Save the classification results to a csv file.
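The enumeration bug noted above can be made concrete. A minimal sketch (the class numbers and capture values are hypothetical, not from the actual code):

```python
# Hypothetical captures grouped by predicted class number (key: class #,
# value: list of captures), in whatever order the method inserted them.
captures_by_class = {2: ["cap_a"], 0: ["cap_b", "cap_c"], 1: ["cap_d"]}

# Buggy pattern: enumerate() yields a positional index, not the dict key,
# so results are written under the wrong class number whenever insertion
# order differs from the numeric order of the keys.
buggy = {i: caps for i, caps in enumerate(captures_by_class.values())}

# Correct pattern: iterate over items() and use the actual key.
fixed = {class_num: caps for class_num, caps in captures_by_class.items()}
```

Here `buggy` maps class 0 to the captures that actually belong to class 2, while `fixed` preserves the true key-to-captures association.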
This is way too much functionality in one method; it should be separated into multiple steps.
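One possible decomposition is sketched below. All names here (the file-naming scheme, `feature_filters`, `classifier`, and the return conventions) are assumptions for illustration, not the actual API:

```python
from collections import defaultdict
from pathlib import Path


def locate_run_files(run_name, fast5_dir):
    """Resolve the per-run input paths (naming scheme assumed)."""
    capture_file = f"{run_name}_captures.csv"  # assumed naming convention
    raw_file = f"{run_name}_raw.csv"           # assumed naming convention
    fast5_matches = sorted(Path(fast5_dir).glob(f"{run_name}*.fast5"))
    return capture_file, raw_file, fast5_matches


def classify_captures(captures, feature_filters, classifier):
    """Filter and classify captures, grouping them by predicted class.

    Returns the class -> captures dict and a count of captures that
    failed to classify (classifier assumed to return None on failure)."""
    by_class = defaultdict(list)
    unclassified = 0
    for cap in captures:
        if not all(f(cap) for f in feature_filters):
            continue
        label = classifier(cap)
        if label is None:
            unclassified += 1
        else:
            by_class[label].append(cap)
    return dict(by_class), unclassified


def save_results(by_class, out_dir):
    """Write one csv of classification results per predicted class.

    Iterating over items() (not enumerate()) avoids the key-ordering bug."""
    for class_num, caps in by_class.items():
        out_path = Path(out_dir) / f"class_{class_num}.csv"
        # build a dataframe from caps and write it to out_path here
```

Each step is then independently testable, and the top-level per-run loop reduces to three calls.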