grchristensen / avpd

Apache License 2.0

TODO: (Notebooks) Profile and Feature Extractor Refactor #9

Open grchristensen opened 3 years ago

grchristensen commented 3 years ago

Currently, style profiles are fed raw text and score it directly. They take a feature extractor as a dependency in order to process the text before scoring it. This dependency is unnecessary and leads to a restrictive design, because feature extraction and profiling should be able to run at different stages for efficiency. Profiles also handle edge cases poorly, which has led to hard-to-find bugs when benchmarking and using the profiles.

Profiles should be refactored so that they only work with numpy arrays representing extracted features. If the previous behavior is desired, profiles and feature extractors can be composed into a coordinator class that takes both as dependencies, while still allowing profiles/extractors to be composed in different ways. This is not needed for now, though. Also, the flagging responsibility of the profiles should be handed off to another class, Thresholder, since there is no general threshold that applies to all feature extractors. This class should be trainable, and the algorithmic complexity of training should be documented with the class.
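A rough sketch of what a trainable Thresholder could look like. Only the class name comes from this issue; the `fit`/`flag` methods, the `quantile` parameter, and the convention that a higher score means "less like the profile" are assumptions made for illustration. The complexity note in the docstring is the kind of documentation meant above.

```python
import numpy as np


class Thresholder:
    """Hypothetical trainable thresholder (API assumed, not the final design).

    Training complexity: one sort of n scores, so O(n log n) time
    and O(n) extra memory for the sorted copy.
    """

    def __init__(self, quantile=0.95):
        if not 0.0 <= quantile <= 1.0:
            raise ValueError("quantile must be in [0, 1]")
        self.quantile = quantile
        self.threshold_ = None  # set by fit()

    def fit(self, scores):
        """Learn a threshold from scores of texts known to match the profile."""
        scores = np.asarray(scores, dtype=float)
        if scores.ndim != 1 or scores.size == 0:
            raise ValueError("scores must be a non-empty 1-D array")
        ordered = np.sort(scores)
        # Lower empirical quantile, no interpolation between neighbours.
        index = int(self.quantile * (ordered.size - 1))
        self.threshold_ = float(ordered[index])
        return self

    def flag(self, scores):
        """Return a boolean array: True where a score exceeds the threshold."""
        if self.threshold_ is None:
            raise RuntimeError("call fit() before flag()")
        return np.asarray(scores, dtype=float) > self.threshold_
```

Because the thresholder only sees scores, one instance can be trained per feature extractor without the profile knowing about it.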

Feature extractors should retain their interface, but they should internally be decomposed into smaller classes, in case a similar refactor for them is needed in the future.
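To illustrate the internal decomposition, here is a minimal sketch. The existing extractor interface is not shown in this issue, so the `extract(text)` method, the class names, and the function-word vocabulary are all hypothetical. The point is only that the public class delegates to small, separately testable pieces.

```python
import re

import numpy as np


class Tokenizer:
    """Hypothetical helper: lowercases text and splits it into word tokens."""

    def tokenize(self, text):
        return re.findall(r"[a-z']+", text.lower())


class RelativeFrequencyCounter:
    """Hypothetical helper: relative frequency of each vocabulary word."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)

    def count(self, tokens):
        total = len(tokens)
        if total == 0:
            # Empty text: return zeros instead of dividing by zero.
            return np.zeros(len(self.vocabulary))
        return np.array([tokens.count(word) / total for word in self.vocabulary])


class FunctionWordExtractor:
    """Public interface stays a single extract() call (assumed interface)."""

    def __init__(self, vocabulary=("the", "of", "and")):
        self._tokenizer = Tokenizer()
        self._counter = RelativeFrequencyCounter(vocabulary)

    def extract(self, text):
        return self._counter.count(self._tokenizer.tokenize(text))
```

Callers keep using `extract`, so a later refactor of the pieces would not touch the profiles or the benchmarking code.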

The benchmarking code should take advantage of this refactor by preprocessing the BAWE dataset beforehand for each feature extractor and then only utilizing the profiles/thresholders in the benchmarking process.
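One way the preprocessing step could work is to extract features once per extractor, save them as a `.npy` file, and load that file on later benchmark runs. The sketch below assumes this caching approach; `cached_features`, its parameters, and the idea of passing the extractor as a plain callable are illustrative names, not existing code in this repo.

```python
from pathlib import Path

import numpy as np


def cached_features(texts, extract, cache_path):
    """Return a (n_texts, n_features) array, extracting only on a cache miss.

    `extract` is any callable mapping one text to a 1-D feature array
    (assumed interface). `cache_path` must end in ".npy".
    """
    cache_path = Path(cache_path)
    if cache_path.suffix != ".npy":
        # np.save would silently append ".npy", and the cache check would miss.
        raise ValueError("cache_path must end in .npy")
    if cache_path.exists():
        return np.load(cache_path)
    features = np.vstack([extract(text) for text in texts])
    np.save(cache_path, features)
    return features
```

With one cache file per feature extractor, the benchmark loop itself would only touch profiles and thresholders.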

grchristensen commented 3 years ago

Current progress: The new profile interface is decided and a EuclideanProfile class is almost complete. Happy-path testing is done and edge cases still need tests, but the class is technically ready for use.
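For reference, a minimal sketch of what a profile working only on numpy arrays might look like. This is not the actual EuclideanProfile code: the `fit`/`score` methods, scoring by Euclidean distance to the mean training vector, and the specific edge cases checked are all assumptions.

```python
import numpy as np


class EuclideanProfile:
    """Illustrative profile: score = distance to the mean training vector."""

    def __init__(self):
        self.centroid_ = None  # set by fit()

    def fit(self, features):
        features = self._validate(features)
        self.centroid_ = features.mean(axis=0)
        return self

    def score(self, features):
        if self.centroid_ is None:
            raise RuntimeError("call fit() before score()")
        features = self._validate(features)
        if features.shape[1] != self.centroid_.shape[0]:
            raise ValueError("feature count differs from the fitted profile")
        # One distance per row; larger means less similar to the profile.
        return np.linalg.norm(features - self.centroid_, axis=1)

    @staticmethod
    def _validate(features):
        # Accept a single 1-D vector by promoting it to one row.
        features = np.atleast_2d(np.asarray(features, dtype=float))
        if features.ndim != 2 or features.size == 0:
            raise ValueError("features must be a non-empty 1-D or 2-D array")
        if not np.isfinite(features).all():
            raise ValueError("features must not contain NaN or infinity")
        return features
```

The checks in `_validate` are the kind of edge cases that still need tests: empty input, a single vector, mismatched feature counts, and non-finite values.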

Need to work on: Feature extractor refactor, thresholder, and benchmarking code.

Plan to finish by Wednesday night.