model: Expose more information

intel / dffml

The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.

https://intel.github.io/dffml/main/

MIT License


model: Expose more information #1143

Open pdxjohnny opened 3 years ago

pdxjohnny commented 3 years ago

We know we need to expose the following before stable API release:

[ ] Model type
- Classifier, Regressor, Cluster, NLP, etc. We need to enumerate all the types we have right now
[ ] Model trained/not-trained status
- Boolean value which we need to know before attempting to assess accuracy or make a prediction
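A minimal sketch of the trained/not-trained flag, assuming a boolean property that is checked before prediction. `SketchModel`, `ModelNotTrainedError`, and `is_trained` are hypothetical names for illustration, not DFFML's API.

```python
class ModelNotTrainedError(RuntimeError):
    """Raised when a prediction is requested before training (hypothetical)."""


class SketchModel:
    """Hypothetical stand-in for a model; not DFFML's Model class."""

    def __init__(self):
        # Learned state; stays None until train() has run.
        self._mean = None

    @property
    def is_trained(self) -> bool:
        # The boolean status callers can check before accuracy or prediction.
        return self._mean is not None

    def train(self, values):
        values = list(values)
        self._mean = sum(values) / len(values)

    def predict(self):
        if not self.is_trained:
            raise ModelNotTrainedError("train() must be called before predict()")
        return self._mean


model = SketchModel()
status_before = model.is_trained
model.train([1.0, 2.0, 3.0])
status_after = model.is_trained
```

Deriving the flag from the learned state, rather than storing a separate boolean, keeps the two from drifting apart.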

programmer290399 commented 2 years ago

I am picking this up for this week

programmer290399 commented 2 years ago

What would be the correct way of defining the Model type? Shall we keep it as a property of the model that is set manually while defining the model class?
Or should it be a config property which is immutable, with an exhaustive set of types to choose from?
In both cases I would need an exhaustive set of types for models.
Another way to deal with this could be to return the model object's class name, by adding a classmethod that returns the class name of the object.
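The class-name option could look roughly like the sketch below. `SketchModel`, `SketchLinearRegression`, and `model_name` are hypothetical names, not existing DFFML classes or methods.

```python
class SketchModel:
    """Hypothetical base class; not DFFML's Model."""

    @classmethod
    def model_name(cls) -> str:
        # cls is the class the method was called on, so subclasses
        # report their own name without overriding anything.
        return cls.__name__


class SketchLinearRegression(SketchModel):
    pass


name_from_class = SketchLinearRegression.model_name()
name_from_instance = SketchLinearRegression().model_name()
```

This needs no manual bookkeeping, but the name only identifies the implementation. Whether the model is a classifier or a regressor would still have to be inferred from a naming convention.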

pdxjohnny commented 2 years ago

Pinging @mHash1m, @yashlamba, @0dust, @sakshamarora1, @sk-ip for more discussion on this

pdxjohnny commented 2 years ago

What would be the correct way of defining the Model type? Shall we keep it as a property of the model that is set manually while defining the model class?

We should probably implement model type as a plugin type. type is also an overloaded term, we might want to consider other options. @base_entry_point("dffml.model.usage", "usage") comes to mind. Then the usage might be classification, regression, nlp, etc.
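To illustrate the plugin idea without depending on DFFML, here is a sketch that uses an in-process registry as a stand-in for entry point loading. `register_usage`, `USAGE_REGISTRY`, `load_usages`, and the usage classes are all hypothetical; a real implementation would presumably go through DFFML's entry point machinery instead.

```python
# In-process registry standing in for installed entry points (assumption).
USAGE_REGISTRY = {}


def register_usage(label):
    """Class decorator that records a usage plugin under the given label."""

    def decorator(cls):
        if label in USAGE_REGISTRY:
            raise ValueError(f"usage {label!r} is already registered")
        USAGE_REGISTRY[label] = cls
        return cls

    return decorator


class ModelUsage:
    """Hypothetical base class for usage plugins."""


@register_usage("classification")
class ClassificationUsage(ModelUsage):
    pass


@register_usage("regression")
class RegressionUsage(ModelUsage):
    pass


def load_usages():
    # Return a copy so callers cannot mutate the registry by accident.
    return dict(USAGE_REGISTRY)
```

Because usages are registered rather than hard-coded, a third-party package could add its own label without changing the base class.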

pdxjohnny commented 2 years ago

Or should it be a config property which is immutable, with an exhaustive set of types to choose from?

Config properties are for runtime specific values. Do we have any situations, or can we think of any, where we might want to set the usage at runtime? The only one I can think of off the top of my head is treating a classification model's output as if it was a regression model.

pdxjohnny commented 2 years ago

Putting it in the config, with a default value set, would be a convenient place for it.
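A sketch of the config-with-default idea, using a frozen standard library dataclass as a stand-in for a DFFML config class. `SketchClassifierConfig` and its fields are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchClassifierConfig:
    """Hypothetical config; frozen so instances are immutable."""

    predict: str
    # Default usage for this model; callers only set it to override.
    usage: str = "classification"


# Most callers never mention usage and get the default.
default_config = SketchClassifierConfig(predict="label")

# Runtime override: treat the classifier's output as a regression value.
as_regression = replace(default_config, usage="regression")
```

With a default in place the common case needs no extra configuration, while the runtime override mentioned above stays possible.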

pdxjohnny commented 2 years ago

Another way to deal with this could be to return the model object's class name, by adding a classmethod that returns the class name of the object.

We should look at pros and cons of this

mhash1m commented 2 years ago

Okay so, I'll talk here about enumerating the types.

From the types mentioned above, one may be a sub-type of the other... For example, there are nlp and cv models that can be classified under 'classifiers' and vice versa.

Models can be classified into:

- supervised and unsupervised
- classifiers, regressors, and clustering models
- linear models, naive bayes, decision trees, ensemble methods, neural networks, etc.
- NLP, CV, etc.
- and maybe more

Every NLP and CV model out there could be tagged with a type from each of the lists above.

Say, a dog classifier on Pytorch could be supervised, a classifier, a CV model, and a neural network model. The approach really depends on the amount of detail we want to go into.

Also, will we be using this to later filter models in the UI? If yes, we'd want to tag each model in every way we can (having multiple tags) so that we have more options for filtering later.
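The multi-tag idea can be sketched as one tag per axis plus a filter over any combination of axes. `ModelTags`, `CATALOG`, `filter_models`, and every catalog entry are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTags:
    """One tag per classification axis from the lists above."""

    supervision: str
    task: str
    family: str
    domain: str


# Made-up catalog entries; the first mirrors the dog classifier example.
CATALOG = {
    "dog_classifier": ModelTags("supervised", "classifier", "neural network", "CV"),
    "spam_filter": ModelTags("supervised", "classifier", "naive bayes", "NLP"),
    "price_estimator": ModelTags("supervised", "regressor", "linear model", "tabular"),
}


def filter_models(catalog, **wanted):
    """Return sorted names of models whose tags match every requested axis."""
    return sorted(
        name
        for name, tags in catalog.items()
        if all(getattr(tags, axis) == value for axis, value in wanted.items())
    )


supervised_classifiers = filter_models(
    CATALOG, supervision="supervised", task="classifier"
)
```

Keeping the axes separate means a UI could offer one filter per axis instead of a single flat list of types.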

programmer290399 commented 2 years ago

Exactly @mHash1m !! We can differentiate between models in a lot of ways, and we need to make sure that we're on the same page about this, so we should definitely pick this up in the next weekly sync. Until then, I think I can make a PR implementing the Model trained/not-trained status, as it is pretty simple and there's no confusion around it.

pdxjohnny commented 2 years ago

Let's make a metadata property which is a class variable: Model.METADATA.
- All baseconfigurable

@base_entry_point()
class SupervisionModelMetaData:
    pass

@entrypoint()
class SupervisedSupervisionModelMetaData(SupervisionModelMetaData):
    pass

@entrypoint()
class UnsupervisedSupervisionModelMetaData(SupervisionModelMetaData):
    pass

>>> SupervisionModelMetaData.load()
SupervisedSupervisionModelMetaData
UnsupervisedSupervisionModelMetaData
PartiallySupervisedSupervisionModelMetaData

@config
class ModelMetaData:
    supervision: SupervisionModelMetaData

class MyModel(Model):
    METADATA = ModelMetaData(
        supervision=SupervisedSupervisionModelMetaData,
    )

We could make a helper to define what is essentially an enum for plugins

@dffml.entrypoint.enum("dffml.model.metadata.supervision", "supervision")
class SupervisionModelMetaData:
    SupervisedSupervisionModelMetaData: str = "supervised"
    UnsupervisedSupervisionModelMetaData: str = "unsupervised"
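For comparison, the same two values expressed with the standard library `Enum`. This is not the proposed `dffml.entrypoint.enum` helper, which does not exist yet as far as this thread shows; `Supervision` is a hypothetical name.

```python
from enum import Enum


class Supervision(Enum):
    """Closed set of supervision labels (hypothetical)."""

    SUPERVISED = "supervised"
    UNSUPERVISED = "unsupervised"


# Look up a member from its string label, e.g. when parsing a config value.
member = Supervision("supervised")
labels = [item.value for item in Supervision]
```

A plain `Enum` is closed: once it has members it cannot be subclassed to add more, so a third-party package could not contribute a new value. That is the gap an entry-point-backed enum helper would be meant to fill.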