Closed vsoch closed 2 years ago
Hey 👋
So one thing is that you shouldn't be using regression metrics in conjunction with classification models - they're incompatible by design. Each metric has a `works_with` method to help you check if a model is compatible with a metric.
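To make the idea concrete, here's a rough stdlib-only sketch of that kind of compatibility check. The class names below are stand-ins, not River's real hierarchy - in River itself you'd call `metric.works_with(model)` directly:

```python
# Stand-ins for base model types (illustrative only, not River's classes).
class Classifier: ...
class Regressor: ...

class Accuracy:
    """A classification metric: only compatible with classifiers."""
    def works_with(self, model) -> bool:
        return isinstance(model, Classifier)

class MAE:
    """A regression metric: only compatible with regressors."""
    def works_with(self, model) -> bool:
        return isinstance(model, Regressor)

class LogisticRegression(Classifier): ...

model = LogisticRegression()
print(Accuracy().works_with(model))  # True
print(MAE().works_with(model))       # False
```

A server could run this check when a metric is attached to a model and reject incompatible pairs up front.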
And then a follow up question - are there any river datasets / examples for multiclass? Thank you!
Yes, I suggest checking out the `multiclass` module :)
Huh, that must be a bug then, because I do create the model as a binary type and not regression. I can look into this one evening this week / this weekend. I refactored the design a bit so the metrics are namespaced with the model name - as long as the original model's flavor is captured when it's added, they should follow suit - but I must have a bug.
Yes, I suggest checking out the multiclass module :)
Gah now I just feel silly, I was searching the datasets looking for hints of multiclass... sorry nothing to see here!
I'll close this issue after I investigate the potential bug - I'm still developing a lot pretty quickly and things will be more stable after I add the testing suite.
Gah now I just feel silly
Don't! Even I get lost in all the River modules we have 😅
I'll close this issue after I investigate the potential bug - I'm still developing a lot pretty quickly and things will be more stable after I add the testing suite.
Sounds good! The work you're doing is really cool.
Yep, it was indeed a stupid bug - I forgot the "f" prefix on a format string when setting the flavor, so it was literally setting "flavor/{name}". So this:

```diff
 def init_metrics(name: str):
     db = get_db()
     try:
-        flavor = db["flavor/{name}"]
+        flavor = db[f"flavor/{name}"]
     except KeyError:
         raise exceptions.FlavorNotSet
     db[f"metrics/{name}"] = flavor.default_metrics()
```
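As a runnable sanity check of that lookup pattern with `shelve` - the `FlavorNotSet` exception and key layout come from the snippet above, while the db setup and the stand-in for `default_metrics()` are assumptions for illustration:

```python
import os
import shelve
import tempfile

class FlavorNotSet(Exception):
    """Raised when a model was added without a flavor being recorded."""

def init_metrics(db, name: str):
    try:
        # Without the "f" prefix this key would be the literal string "flavor/{name}".
        flavor = db[f"flavor/{name}"]
    except KeyError:
        raise FlavorNotSet
    # Stand-in for flavor.default_metrics() in the real code.
    db[f"metrics/{name}"] = f"default metrics for {flavor}"

path = os.path.join(tempfile.mkdtemp(), "db")
with shelve.open(path) as db:
    db["flavor/mymodel"] = "binary"
    init_metrics(db, "mymodel")
    result = db["metrics/mymodel"]

print(result)  # -> default metrics for binary
```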
Thanks again for the help! I'm definitely making good progress - it's more fun and addictive than "real" work haha. :)
It's cool that you're using the `shelve` library too, I like it a lot.
it's more fun and addictive than "real" work haha
Story of my life haha.
Just thinking out loud: one of the things I would like this kind of platform to support is non-HTTP traffic. For instance, being able to handle WebSockets and/or SSE would be cool. Indeed, I think that persistent connections play nicely with online learning and sensors etc. I had my eyes on using FastAPI for this reason. Anyway, just some food for thought! I'm sure the same can be done with Django.
Django can handle web sockets, but FastAPI is definitely faster than Django! And I agree that would be neat. The way I've designed Django River ML is to have a pretty generic internal client that I could easily plug into another application backend (e.g., FastAPI) with only tweaks to setting things up, middleware, and how the API endpoints are created, interacted with via a common terminal client to make it easy (see https://github.com/vsoch/riverapi - sorry, no pretty docs there yet but coming soon!).
And these things will make it easy to plug into whatever other Python frameworks we would be interested in! And I'd be happy to make one in FastAPI too, although I'd like to finish Django River ML first because I'm pretty excited to test it out for my use case :)
And that reminds me, I was starting to brainstorm what front end views we might provide for basic functionality. It's a plugin, so we can't take over the entire design of an app, but I think some basic demos or views (even to include elsewhere) would be neat. I'm going to sleep now, but it's something to think about - let me know what ideas you have!
a pretty generic internal client that I could easily plug into another application backend (e.g., FastAPI) with only tweaks to setting things up and middleware and how the api endpoints are created
Exactly! In fact it should be able to do this online learning dance without any connection to the internet. For instance, when running a model on a closed-off device where the loop is self-contained. What matters here is to create the right abstractions via interfaces.
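A minimal sketch of what such an abstraction could look like - a client interface that knows nothing about HTTP, so the same loop can sit behind Django, FastAPI, or run fully offline on a closed device. All names here are hypothetical, not the actual django-river-ml API:

```python
from abc import ABC, abstractmethod

class ModelClient(ABC):
    """Transport-agnostic interface: any backend (a Django view, a FastAPI
    route, or a self-contained loop on a device) talks to a model through this."""

    @abstractmethod
    def learn(self, features: dict, label) -> None: ...

    @abstractmethod
    def predict(self, features: dict): ...

class InMemoryClient(ModelClient):
    """Toy backend: predicts the most frequently seen label so far."""
    def __init__(self):
        self.counts = {}

    def learn(self, features, label):
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, features):
        return max(self.counts, key=self.counts.get) if self.counts else None

# A self-contained online-learning loop with no network connection at all:
client = InMemoryClient()
for features, label in [({"x": 1}, "sky"), ({"x": 2}, "sky"), ({"x": 3}, "path")]:
    client.learn(features, label)
print(client.predict({"x": 4}))  # -> sky
```

The point of the interface is that swapping `InMemoryClient` for an HTTP-backed client shouldn't change the calling code at all.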
interacted with via a common terminal client to make it easy (see https://github.com/vsoch/riverapi - sorry, no pretty docs there yet but coming soon!)
Agreed! You need a client to interact with the "platform".
And I'd be happy to make one in FastAPI too, although I'd like to finish Django River ML first because I'm pretty excited to test it out for my use case :)
Enjoy :)
And that reminds me, I was starting to brainstorm what front end views we might provide for basic functionality. It's a plugin, so we can't take over the entire design of an app, but I think some basic demos or views (even to include elsewhere) would be neat.
Yes, I agree with that. My experience here is that the interface should be read-only: you can't use the interface to add/remove models or whatnot. It's just a view.
A follow-up question! I'm testing the multiclass example, and here is my model:
```python
In [18]: model
Out[18]:
Pipeline (
  StandardScaler (
    with_std=True
  ),
  OneVsOneClassifier (
    classifier=LogisticRegression (
      optimizer=SGD (
        lr=Constant (
          learning_rate=0.01
        )
      )
      loss=Log (
        weight_pos=1.
        weight_neg=1.
      )
      l2=0.
      intercept_init=0.
      intercept_lr=Constant (
        learning_rate=0.01
      )
      clip_gradient=1e+12
      initializer=Zeros ()
    )
  )
)
```
And it has the function, but I suspect it just raises NotImplementedError:

```python
model.predict_
predict_many()        predict_proba_many()
predict_one()         predict_proba_one()
```
So the question for the server: the standard case is always predicting one, but that doesn't seem to work here for this multiclass flavor:
```python
~/anaconda3/envs/river/lib/python3.10/site-packages/river/base/classifier.py in predict_proba_one(self, x)
     49         # method that each classifier has to implement. Instead, we raise an exception to indicate
     50         # that a classifier does not support predict_proba_one.
---> 51         raise NotImplementedError
     52
     53     def predict_one(self, x: dict) -> base.typing.ClfTarget:

NotImplementedError:
```
I think this happened because we are checking for the functions as attributes on the model, but since the abstract class implements it, it is technically there:

```python
def check_model(self, model):
    for method in ("learn_one", "predict_proba_one"):
        if not hasattr(model, method):
            return False, f"The model does not implement {method}."
    return True, None
```

but not actually implemented there (looks like we have predict_one and learn_one?) https://github.com/online-ml/river/blob/ec1cf318310add301afe12160cebc66eaebcec2c/river/multiclass/ovo.py#L74-L97
Anyhoo - so my questions are:
I'm also noticing that these datasets are under multiclass but labeled as binary, with a note that "a multiclass should work too!"
That alone might be the issue - maybe we aren't expected to use multiclass? But I was hoping to find a dataset / example that uses it (and I would like the server to support it!)
I think this happened because we are checking for the functions as attributes on the model, but since the abstract class implements it, it is technically there
This is specific to that multi-class model; it doesn't support outputting probabilities. It implements `predict_one`, but not `predict_proba_one`. This is very much an edge case, because you should expect every classifier (regardless of binary or multi-class) to implement `predict_proba_one`.
How should we handle this check method?
I'm not sure how to check a method is implemented with the current way River's base classes are set up.
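One possible hedge (just an idea, not River's official approach): instead of `hasattr`, compare the class's attribute against the base class's attribute, which tells you whether a subclass actually overrode the method. The stub classes below mimic the situation described in the thread:

```python
class BaseClassifier:
    """Stub mimicking River's base classifier."""
    def predict_proba_one(self, x):
        # Default implementation: signals the method isn't supported.
        raise NotImplementedError
    def predict_one(self, x):
        return None

class OvO(BaseClassifier):
    """Stub mimicking OneVsOneClassifier: overrides predict_one only."""
    def predict_one(self, x):
        return "label"
    # note: does NOT override predict_proba_one

def implements(model, method: str, base=BaseClassifier) -> bool:
    """True if the model's class overrides `method` relative to `base`."""
    return getattr(type(model), method, None) is not getattr(base, method, None)

model = OvO()
print(implements(model, "predict_one"))        # True
print(implements(model, "predict_proba_one"))  # False
```

This identity check works because an inherited method resolves to the very same function object as the base class's; a caveat is that it only compares against one base, so deep hierarchies need the right `base` argument.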
What prediction/learn functions should the multiclass flavor use?
It should always be `predict_proba_one` and `learn_one`. But I would try/except the `predict_proba_one` and use `predict_one` if the former raises an implementation error.
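That suggestion might look something like this - a stub model stands in here, but the method names match River's; the real flavor would call the pipeline's methods of the same names:

```python
class StubModel:
    """Implements predict_one but not predict_proba_one, like OneVsOneClassifier."""
    def predict_proba_one(self, x):
        raise NotImplementedError
    def predict_one(self, x):
        return "path"

def get_prediction(model, x):
    # Prefer probabilities; fall back to a hard label if unsupported.
    try:
        return model.predict_proba_one(x)
    except NotImplementedError:
        return model.predict_one(x)

print(get_prediction(StubModel(), {"x": 1}))  # -> path
```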
I'll try that! And eventually there should be some clarity for the user about which models will work, and which won't.
I tried changing to predict_one to avoid the implementation error (that worked!), but the ground truth and prediction I get back are both strings, and when that hits the update_metrics function it falls into the last if/else:

```python
In [5]: prediction
Out[5]: 'path'

In [6]: ground_truth
Out[6]: 'sky'
```
and results in this error in river:
```python
In [4]: metric.update(y_true=ground_truth, y_pred=prediction)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/Desktop/Code/django-river-ml/django_river_ml/client.py in <module>
----> 1 metric.update(y_true=ground_truth, y_pred=prediction)

~/anaconda3/envs/river/lib/python3.10/site-packages/river/metrics/base.py in update(self, y_true, y_pred, sample_weight)
    424
    425     def update(self, y_true, y_pred, sample_weight=1.0):
--> 426         self._mean.update(x=self._eval(y_true, y_pred), w=sample_weight)
    427         return self
    428

~/anaconda3/envs/river/lib/python3.10/site-packages/river/metrics/cross_entropy.py in _eval(self, y_true, y_pred)
     46
     47     def _eval(self, y_true, y_pred):
---> 48         return optim.losses.CrossEntropy()(y_true, y_pred)

~/anaconda3/envs/river/lib/python3.10/site-packages/river/optim/losses.py in __call__(self, y_true, y_pred)
    246         total = 0
    247
--> 248         for label, proba in y_pred.items():
    249             if y_true == label:
    250                 total += self.class_weight.get(label, 1.0) * math.log(

AttributeError: 'str' object has no attribute 'items'
```
Might it be the case that the "predict_one" is returning the wrong format? Or are we in another case of a model being incorrectly matched with the metrics? For metrics I have:
```python
[Accuracy: 0.00%,
 CrossEntropy: 0.,
 MacroPrecision: 0.,
 MacroRecall: 0.,
 MacroF1: 0.,
 MicroPrecision: 0.,
 MicroRecall: 0.,
 MicroF1: 0.]
```
model flavor:
```python
Pipeline (
  StandardScaler (
    with_std=True
  ),
  OneVsOneClassifier (
    classifier=LogisticRegression (
      optimizer=SGD (
        lr=Constant (
          learning_rate=0.01
        )
      )
      loss=Log (
        weight_pos=1.
        weight_neg=1.
      )
      l2=0.
      intercept_init=0.
      intercept_lr=Constant (
        learning_rate=0.01
      )
      clip_gradient=1e+12
      initializer=Zeros ()
    )
  )
)
```
I've never used logistic regression in a multiclass case - I'm used to getting a value between 0 and 1 and applying some threshold for two classes. For reference, I was looking here: https://riverml.xyz/latest/api/multiclass/OneVsOneClassifier/ which groups it under multiclass (hence why I'm probably using it incorrectly here!)
And I'm thinking about the design of model "flavors" - if it's hard to generalize models into these flavors, perhaps it would make sense to have a direct lookup of "for model X use this base class, metrics, etc.", but maybe we can still get it working with the more general flavors already here.
Thanks for your help! I should be able to make some more time this weekend to work on river - had a busy end of the week!
Mmm, I'm not sure I fully understand what you're doing.
Classification metrics expect labels or dictionaries with a probability for each label. Each classification metric has a `requires_labels` property indicating this. Does that help?
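To illustrate the two input shapes with a stdlib sketch (not River's implementation): a label-based metric like Accuracy compares hard labels, while CrossEntropy expects a dict mapping each label to a probability - which is exactly why passing the string 'path' blew up inside losses.py in the traceback earlier:

```python
import math

def accuracy_update(y_true, y_pred_label):
    # requires_labels == True: compares hard labels.
    return 1.0 if y_true == y_pred_label else 0.0

def cross_entropy_update(y_true, y_pred_proba: dict):
    # requires_labels == False: needs a probability per label.
    eps = 1e-15  # guard against log(0)
    return -math.log(max(y_pred_proba.get(y_true, 0.0), eps))

print(accuracy_update("sky", "path"))                          # 0.0
print(cross_entropy_update("sky", {"sky": 0.7, "path": 0.3}))  # ~0.357
# cross_entropy_update("sky", "path") would fail: a string has no .get
```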
Sorry for not being clear, let me try to better walk through it.
```python
[Accuracy: 0.00%,
 CrossEntropy: 0.,
 MacroPrecision: 0.,
 MacroRecall: 0.,
 MacroF1: 0.,
 MicroPrecision: 0.,
 MicroRecall: 0.,
 MicroF1: 0.]
```
and this is correct - it's at least also reflected in Chantilly here
So now we have:

```python
In [26]: ground_truth
Out[26]: 'sky'

In [27]: prediction
Out[27]: 'path'
```
And calling this method:

```python
metric.update(y_true=ground_truth, y_pred=prediction)
```

results in the error above - the prediction is a string and not a dict.
If we trace back up to where the original error was - to recall our previous discussion, the flavor is multiclass (it comes from the multiclass examples):

```python
In [30]: flavor
Out[30]: <django_river_ml.flavors.MultiClassFlavor at 0x7fa1b8e11900>
```

so the default pred_func is `predict_proba_one`, which fails with the NotImplementedError. So I fall back to `predict_one`, and that is the reason we return a string.
So I think what I'm hearing is that I'm not allowed to provide a str to this metric. So here are some options for moving forward:
In any case, I do think this feels a little buggy and we should get to the bottom of it - let me know what questions you have!
Ok I think I understand. Thanks for the details.
Your problem is that `OneVsOneClassifier` doesn't implement `predict_proba_one`, so you can't produce probabilities. This means that the metrics which require probability estimates, namely `CrossEntropy`, should not be updated, period.
How I would handle this:

- Call `predict_proba_one` first.
- If it raises a `NotImplementedError`, fall back to `predict_one`.
- When you only have a label, skip the metrics where `metric.requires_labels == False`.
How does that sound?
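Putting those steps together, the update loop could look roughly like this. The `requires_labels` flag mirrors River's ClassificationMetric property; the stub metric class and the `have_probas` flag are assumptions for illustration:

```python
class StubMetric:
    """Stand-in for a River metric with a requires_labels flag."""
    def __init__(self, name: str, requires_labels: bool):
        self.name = name
        self.requires_labels = requires_labels
        self.updates = 0
    def update(self, y_true, y_pred):
        self.updates += 1

def update_metrics(metrics, y_true, y_pred, have_probas: bool):
    for metric in metrics:
        # With only a hard label available (predict_one fallback), skip
        # probability-based metrics (requires_labels == False), e.g. CrossEntropy.
        if not have_probas and not metric.requires_labels:
            continue
        metric.update(y_true=y_true, y_pred=y_pred)

metrics = [StubMetric("Accuracy", True), StubMetric("CrossEntropy", False)]
update_metrics(metrics, y_true="sky", y_pred="path", have_probas=False)
print([m.updates for m in metrics])  # [1, 0]
```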
Okay, that worked! For a regression model with mean absolute error (MAE) there simply wasn't a requires_labels attribute, and it hit the last statement because it's not a ClassificationMetric (this is for a regression model). So instead of checking for that attribute, I'm going to do the dumb Pythonic thing and just try/except: if the metric can be updated, great; if not, it's probably not intended for the case, or (if it's a bug) we will find it eventually :)
I'm done adding the multiclass; next I'm going to do the new label endpoint, write some tests, and then do a PR - will post here when it's in!
Okay, tiny progress! https://github.com/vsoch/django-river-ml/pull/7 This is great - we have label and a multiclass example, and I think next I'm going to give a KMeans model (or unsupervised) a shot. Thanks again for your help - I suspect it's late in your time zone, so I will not ping further today!
Looks cool! Keep it up :)
(or unsupervised)
Yes I believe there is a lot of interest in supporting anomaly detection.
heyo! So I am using the same logic as chantilly: after I've hit the learn endpoint I want to update metrics. Here is the basic logic refactored into its own function:
This works fine for the test.py regression case, but when I was testing a binary model, I found that one of the metrics (I think mean squared error, MSE?) wouldn't enter the first if case because it isn't a classification metric, and then it would hit the else and fail because at that point the prediction was either an empty dict or a filled dict (which cannot be given to that function). So when I inspected, I'd see something like:
So I think what was happening is that for the first binary model prediction, it returns an empty dict because it doesn't know anything yet. And then after that, I think the dict has keys that are ground truths and values that are the predictions? So first I added this:
And that worked for my debugging session while I had a model already established, but when I restarted from scratch I got a KeyError - it turns out the ground truth wasn't a value returned by the prediction. So I changed to:
This felt a little funky to me, so I wanted to double check the correct way to go about this. This is a binary model, as defined here in chantilly. I suspect this will be tweaked further when I update to allow learning without a true label (e.g., the unsupervised cases I asked about in your talk!)
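For reference, one guarded way to handle that first empty prediction is to treat "no label yet" explicitly. This sketch assumes `predict_proba_one` returns a `{label: probability}` dict that can be empty before the model has seen any data, as described above; the helper name is hypothetical:

```python
def label_from_probas(probas: dict):
    """Pick the most probable label, or None if the model hasn't learned
    anything yet (an empty probability dict)."""
    if not probas:
        return None
    # max over the dict's keys, ranked by their probability values
    return max(probas, key=probas.get)

print(label_from_probas({}))                         # None
print(label_from_probas({True: 0.8, False: 0.2}))    # True
print(label_from_probas({"sky": 0.4, "path": 0.6}))  # path
```

This avoids both the KeyError (no hard-coded label lookup) and any assumption about which labels the model has seen so far.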