Hi Julia - this was super useful and a wonderful tutorial, thanks! Do you know when you may have time to do further tutorials showing how to deploy the model for real-time predictions?
Noel
@dempseynoel I'm working on some material on model monitoring right now so look for that! I'll look into model deployment as well, but in the meantime, I really like the resources at Put R in Prod (https://putrinprod.com/; more Docker than I have used personally, but excellent), this talk from Alex (https://youtu.be/SwjlcYC_Iqw), and anything you can find by James Blair on plumber + modeling (https://youtu.be/znHEW5Q6plw).
Hi Julia - Thanks for these, they look great!
Hi Julia,
I wanted to know if you could explain why resampling was performed after splitting the data into train/test sets. My assumption was that we first fit the model and observe the prediction results, and only then resample. Is it because the model will memorize the training set and in turn give near-perfect results? If so, should resampling always be done for every type of analysis that uses train/test data? When is it appropriate to use either fit() or fit_resamples()? I have seen that cross-validation is also used in time-series forecasting.
Thank you for a great analysis.
The main purpose of resampling is to estimate how well a model is performing, so I guess it doesn't make a huge difference which order you fit() to the training data / fit_resamples() to resampled folds of the training data. The predictions on the training set as a whole don't really help you much for many kinds of models, though, for the reasons you mention. You can read more about "spending your data budget" and resampling to measure performance.
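A minimal sketch of the two calls discussed above, assuming the tidymodels packages are installed. The `mtcars` data, the formula, and all object names are illustrative assumptions, not the model from the post.

```r
library(tidymodels)

set.seed(123)

# Illustrative data only: mtcars ships with R
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# A simple linear regression workflow (hypothetical formula)
car_wf <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg())

# fit_resamples(): fit on each set of folds of the training data
# to estimate how well the model performs
car_folds <- vfold_cv(car_train, v = 5)
resampled <- fit_resamples(car_wf, resamples = car_folds)
collect_metrics(resampled)

# fit(): fit once on the whole training set
car_fit <- fit(car_wf, data = car_train)

# Metrics computed on the same data used for fitting tend to be optimistic,
# so they are not a reliable performance estimate
augment(car_fit, new_data = car_train) %>%
  rmse(truth = mpg, estimate = .pred)
```

The resampled metrics are the ones to use for judging or comparing models; the test set stays untouched until the end.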
Why save the model metrics? What do you do with them? Is there an example?
@Steviey The main reason to save the model metrics from training is to be able to track and monitor how the model performs compared to when it was trained.
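As a rough sketch of that idea in base R: store the metrics at training time, then compare them with metrics computed later on new data. Every number, file name, and object name here is a placeholder assumption for illustration, not a result from the post.

```r
# Placeholder values for illustration only; not results from the post
metrics_at_training <- data.frame(
  .metric   = c("accuracy", "roc_auc"),
  .estimate = c(0.80, 0.85)
)

# Save the metrics alongside the model at training time
# (hypothetical file name, written to a temporary directory)
metrics_path <- file.path(tempdir(), "training_metrics.rds")
saveRDS(metrics_at_training, metrics_path)

# Later: metrics computed on newly collected data (placeholders again)
metrics_now <- data.frame(
  .metric   = c("accuracy", "roc_auc"),
  .estimate = c(0.74, 0.79)
)

# Compare current performance with the training-time baseline
baseline <- readRDS(metrics_path)
comparison <- merge(
  baseline, metrics_now,
  by = ".metric", suffixes = c("_training", "_now")
)
comparison$change <- comparison$.estimate_now - comparison$.estimate_training
comparison
```

A sustained negative `change` would be a signal that the model may need retraining.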
Hi Julia, thank you for sharing this valuable information. I am looking forward to learning more about how to do this: "how to publish this model as an API and how to monitor its performance". I noticed that you mentioned "more soon" in your article. Can you please provide an update on when we can expect to see this information? I am eager to put this knowledge into practice as soon as possible. Thank you!
@ssna60 Yes, check out our documentation here:
Hi Julia,
This is a really helpful video, thank you very much. Can you explain how I can extract interpretable decision rules from this model?
@HASSAw You may want to check out:
Predicting injuries for Chicago traffic crashes | Julia Silge
Download up-to-date city data from Chicago's open data portal and predict whether a traffic crash involved an injury with a bagged tree model.
https://juliasilge.com/blog/chicago-traffic-model/