Open NickleDave opened 3 years ago
extract
out of features
features
as a namespace, with a matching entry point so that user-defined features could be installed as plug-insfeatures
sub-package into a separate extract
sub-packagefunctional
extract moduleFeatureExtractor
as a separate class that just uses functions from the functional
module for its methods There should be a config
sub-package with the current parseconfig
module inside of it.
Currently there's a parse
sub-package -- naming not quite explicit enough.
edit: would prefer a sub-package named config
with a module named parse
having a function parse_yaml_config
A lot of the logic around configuration parsing could be moved into this module so that other functions just do straight iteration over some sort of attrs
class / dataclass instance, e.g. a class called Config
with an attribute todolist
that is literally a list of instances called TodoItem
or something, each item having spect_params
, seg_params
etc.
If an item in the .yml file declares its own spect_params
and those need to take precedence over top-level spect params, that can be determined during config parsing to produce the TodoList
class as expected, instead of what currently happens where each function internally determines whether there are spect_params
for the current item in the to-do list that need to take precedence over global spect_params
Some of the "private" helper functions for parsing configs might be useful to a general user.
Of course trade-off is those would then definitely need to be unit-tested and generally better maintained.
Is there key functionality that's worth factoring out?
E.g., I am using hvc.parse.extract._validate_feature_group_and_convert_to_list
for the tweetynet
paper to just get the list of svm
features so I can iterate over them.
Seems like having a general model_name_to_features_list
function could be useful, especially under a framework where a "model" is specified in a way that includes its set of features.
core code in FeatureExtractor
is basically unreadable, in particular the method _from_file
.
All the weird stuff I did with if x in locals
and np.concatenate
really obscures what's going on.
It also prevents any kind of optimization that uses parallelization, e.g. dask
.
I think the easiest thing to do would be re-write this in a possibly slow but readable way, make sure I didn't break it, and then optimize from there.
starting this issue where I will make notes on how to possible refactor
hvc
codebase so it is less spaghetti code. May break some comments out into separate issues