This release includes extensive refactoring and cleanup across the entire codebase, which changes how some components are named and used.
Refactored NLUdataset
Renamed NLUdataset to NluDataset
removed out_of_scope argument to constructor; intents list will never include
OUT_OF_SCOPE_TOKEN
removed seed argument to constructor as seed should be controllable with more
flexibility
some previously public properties are now more controlled (private or read-only)
to_records() is deprecated, use to_json() instead
refactored method to_json
removed argument records (as it was probably for a very specific use case we do
not have anymore)
returns a list of dicts (like to_records()) instead of a JSON string now
refactored method dataloaders.utils.from_json to match the logic of to_json(). One
feature that is not symmetric is that only utterance, intent, and entities will be
imported. This is to avoid having changing arguments depending on what properties a
dataset can have. Also, we will probably never use this method to load data with more
attributes.
NluDataset.__getitem__() returns a dataset instead of a record from _data when key is
a numbers.Integral
NluDataset.sample(): Removed default value for size, default for random_state set to
None, argument stratification renamed to stratify for consistency with
train_test_split and sklearn
removed argument shuffle from method subsample_by_intent_frequency. Use
ds.shuffle().subsample_by_intent_frequency() instead. The previous implementation never
did anything useful with the shuffle argument.
changed default values of method train_test_split() (now inherited from sklearn as None)
Refactored vendors
removed scikit-learn methods like fit(), predict(), score()
removed Sklearn superclasses (this feature was never really tested)
made alias an instance attribute
made all vendor attributes private (previously attributes were generally public even if meant for internal use)
Renamed classifiers
Renamed some modules
datasets -> nlu_dataset
vendors.vendors -> vendors.vendor
modules with vendor implementations renamed to match class names
Moved dataset loaders to nlubridge.dataloaders subpackage
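
Two of the NluDataset changes above affect calling code directly: indexing with an integer now yields a dataset rather than a raw record, and to_json() now yields a list of dicts rather than a JSON string. The sketch below illustrates both behaviours with a hypothetical stand-in class, ToyDataset. It is not the nlubridge implementation, and the record keys (text, intent, entities) are assumed for illustration only.

```python
import json
import numbers


class ToyDataset:
    """Hypothetical stand-in for NluDataset; not the nlubridge implementation."""

    def __init__(self, records):
        # Kept private, mirroring the note that properties are now more controlled.
        self._data = list(records)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        # An integer key yields a one-element dataset rather than a raw record.
        if isinstance(key, numbers.Integral):
            return ToyDataset([self._data[key]])
        # Any other key (here: a slice) also yields a dataset.
        return ToyDataset(self._data[key])

    def to_json(self):
        # Returns a list of dicts, not a JSON string.
        return [dict(record) for record in self._data]


# Assumed record layout, for illustration only.
ds = ToyDataset([
    {"text": "book a flight", "intent": "book_flight", "entities": []},
    {"text": "what's the weather", "intent": "get_weather", "entities": []},
])

first = ds[0]                            # a ToyDataset of length 1
records = first.to_json()                # a list containing one dict
json_string = json.dumps(ds.to_json())   # serialize explicitly when a string is needed
```

Code that previously relied on to_json() returning a string would, under these assumptions, need an explicit json.dumps() call as in the last line.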