uwmisl / poretitioner

https://misl.cs.washington.edu

Refactor classify.py #63

Closed: kdoroschak closed this issue 4 years ago

kdoroschak commented 4 years ago

Workflow is currently something like this:

Get all the parameters for filtering
Load the classifier
For each run:
1. Identify the capture_file (contains info about starts/stops of peptide captures) by string formatting based on the run name.
2. Identify the raw_file (not exactly sure what this is, probably pre-saved captures) by string formatting based on the run name.
3. Find the fast5 file by looking in a specified fast5 directory
4. Retrieve the voltage changes
5. Apply a basic length filter
6. For each capture:
  1. Apply the feature filters
  2. Classify the capture
  3. Store capture information. If it didn't classify, increment a counter. If it did classify, store the capture in a dictionary (key: class #, value: list of captures).
7. For each of the predicted classes (note that there is a potential bug here: the code enumerates through this capture dict instead of retrieving the key directly, so there is no guarantee the index matches the class #):
  1. Create a dataframe from the captures
  2. Save the classification results to a csv file.
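
To illustrate the potential bug noted in step 7, here is a minimal sketch. The dictionary contents and variable names are made up for illustration and are not taken from classify.py; the point is only that the position produced by `enumerate` is not the class # stored as the key.

```python
# Hypothetical data: predicted class # -> list of capture records.
captures_by_class = {
    3: [{"capture_id": "a", "confidence": 0.91}],
    0: [
        {"capture_id": "b", "confidence": 0.88},
        {"capture_id": "c", "confidence": 0.75},
    ],
}

# Fragile: enumerate() yields a positional index (0, 1, ...), which only
# matches the class # if the keys happen to be 0..n-1 and inserted in order.
positional = [(i, len(caps)) for i, caps in enumerate(captures_by_class.values())]

# Safer: iterate over items() so each list stays paired with its own key.
# Sorting is optional and only makes the output order deterministic.
keyed = [(label, len(caps)) for label, caps in sorted(captures_by_class.items())]

print(positional)  # [(0, 1), (1, 2)]  index 0 is really class 3
print(keyed)       # [(0, 2), (3, 1)]
```

With the keyed version, each per-class dataframe and csv file would be labeled with the class # it actually belongs to.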

This is way too much functionality for one method. It should be separated into multiple steps.
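
One possible shape for the split, as a rough sketch only: the function names, the `Capture` record, and the callables passed in are all hypothetical and do not reflect the existing code. It pulls steps 5 and 6 out into small functions that can be tested on their own. Whether captures that fail the feature filters should count as unclassified is not stated above, so this sketch simply skips them.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Capture = Dict[str, float]  # hypothetical stand-in for a capture record


def apply_length_filter(captures: Iterable[Capture], min_length: float) -> List[Capture]:
    """Step 5: keep only captures that are at least min_length long."""
    return [c for c in captures if c["length"] >= min_length]


def classify_captures(
    captures: Iterable[Capture],
    passes_filters: Callable[[Capture], bool],
    classify: Callable[[Capture], Optional[int]],
) -> Tuple[Dict[int, List[Capture]], int]:
    """Step 6: filter, classify, and group captures by predicted class #.

    Returns the grouped captures and the count that did not classify.
    Assumption: captures rejected by the feature filters are skipped
    and are not included in that count.
    """
    by_class: Dict[int, List[Capture]] = defaultdict(list)
    unclassified = 0
    for capture in captures:
        if not passes_filters(capture):
            continue
        label = classify(capture)
        if label is None:
            unclassified += 1
        else:
            by_class[label].append(capture)
    return dict(by_class), unclassified


# Usage with stub filters and a stub classifier (all values are made up).
captures = [
    {"length": 5.0, "mean": 0.2},
    {"length": 50.0, "mean": 0.3},
    {"length": 80.0, "mean": 0.9},
    {"length": 60.0, "mean": 0.5},
]
kept = apply_length_filter(captures, min_length=10.0)
by_class, unclassified = classify_captures(
    kept,
    passes_filters=lambda c: c["mean"] < 0.8,
    classify=lambda c: 1 if c["mean"] < 0.4 else None,
)
```

File lookup (steps 1 to 4) and writing the csv files (step 7) could be separated the same way, leaving the per-run loop as a short sequence of calls.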