uwmisl / poretitioner

https://misl.cs.washington.edu

Refactor classify.py #63

Closed by kdoroschak 4 years ago

kdoroschak commented 4 years ago

The workflow is currently something like this:

  1. Get all the parameters for filtering
  2. Load the classifier
  3. For each run:

    1. Identify the capture_file (contains info about starts/stops of peptide captures) by string formatting based on the run name.

    2. Identify the raw_file (unclear what this contains; probably pre-saved captures) by string formatting based on the run name.

    3. Find the fast5 file by looking in the specified fast5 directory.

    4. Retrieve the voltage changes.

    5. Apply a basic length filter.

    6. For each capture:

      1. Apply the feature filters.
      2. Classify the capture.
      3. Store capture information. If it didn't classify, increment a counter. If it did classify, store the capture in a dictionary (key: class #, value: list of captures).
    7. For each predicted class (note: there is a potential bug here, because this step enumerates through the capture dict instead of retrieving each key directly, so there is no guarantee of order):

      1. Create a dataframe from the captures.
      2. Save the classification results to a csv file.
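The per-class loop in the last step can sidestep the ordering problem by iterating over the dictionary's (class, captures) pairs, so each output file is named from the class key itself rather than from an enumeration index. Below is a minimal sketch under stated assumptions: each capture has already been turned into a dict of columns, and the function name `save_class_csvs`, the column names, and the `{run_name}_classNN.csv` file pattern are all hypothetical. The standard-library `csv` module stands in for the DataFrame the current code builds.

```python
import csv
from pathlib import Path
from typing import Dict, List, Sequence


def save_class_csvs(
    by_class: Dict[int, List[Dict[str, object]]],
    out_dir: Path,
    run_name: str,
    fieldnames: Sequence[str],
) -> Dict[int, Path]:
    """Write one CSV per predicted class (hypothetical helper).

    Files are named from the class key itself, never from a loop position,
    so the class number in the filename always matches the rows inside it.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: Dict[int, Path] = {}
    # Sort by class key for deterministic output; keys are unique, so the
    # capture lists are never compared.
    for label, captures in sorted(by_class.items()):
        path = out_dir / f"{run_name}_class{label:02d}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(fieldnames))
            writer.writeheader()
            writer.writerows(captures)
        written[label] = path
    return written
```

Returning a mapping from class key to path keeps that association explicit for any caller that needs to find a particular class's results later.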

This is way too much functionality in one method. Separate it into multiple steps.
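One possible way to split the method is sketched below. It rests on several assumptions: each capture is a plain sequence of current samples, the classifier and feature filters are passed in as callables, and captures rejected by a filter are dropped rather than counted as unclassified. All names (`FilterParams`, `length_filter`, `classify_captures`, `process_run`, and so on) are hypothetical, and locating the capture_file, raw_file, and fast5 file, as well as saving results, would each live in their own functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical types: one capture is a sequence of current samples.
Signal = Sequence[float]
FeatureFilter = Callable[[Signal], bool]  # True if the capture should be kept
Classifier = Callable[[Signal], Optional[int]]  # class label, or None if unclassified


@dataclass
class FilterParams:
    """All filtering parameters, gathered once up front."""

    min_length: int
    max_length: int
    feature_filters: List[FeatureFilter] = field(default_factory=list)


@dataclass
class RunResult:
    """Per-run output: captures grouped by class, plus the unclassified count."""

    by_class: Dict[int, List[Signal]]
    n_unclassified: int


def length_filter(captures: List[Signal], params: FilterParams) -> List[Signal]:
    """Basic length filter: keep captures with min_length <= len <= max_length."""
    return [c for c in captures if params.min_length <= len(c) <= params.max_length]


def passes_feature_filters(capture: Signal, params: FilterParams) -> bool:
    """A capture is kept only if every feature filter accepts it."""
    return all(f(capture) for f in params.feature_filters)


def classify_captures(captures: List[Signal], classifier: Classifier) -> RunResult:
    """Classify each capture and group it under its predicted class key."""
    by_class: Dict[int, List[Signal]] = {}
    n_unclassified = 0
    for capture in captures:
        label = classifier(capture)
        if label is None:
            n_unclassified += 1
        else:
            by_class.setdefault(label, []).append(capture)
    return RunResult(by_class, n_unclassified)


def process_run(
    captures: List[Signal], params: FilterParams, classifier: Classifier
) -> RunResult:
    """Compose the small steps for one run; file I/O stays in separate functions."""
    kept = length_filter(captures, params)
    kept = [c for c in kept if passes_feature_filters(c, params)]
    return classify_captures(kept, classifier)
```

Because each step receives its inputs as arguments instead of reading files itself, the length filter, feature filters, and classification can be unit-tested with toy signals and a stand-in classifier, without any fast5 files on disk.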