juliasilge / juliasilge.com

My blog, built with blogdown and Hugo
https://juliasilge.com/

Predicting injuries for Chicago traffic crashes | Julia Silge #20

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Predicting injuries for Chicago traffic crashes | Julia Silge

Download up-to-date city data from Chicago's open data portal and predict whether a traffic crash involved an injury with a bagged tree model.

https://juliasilge.com/blog/chicago-traffic-model/

dempseynoel commented 3 years ago

Hi Julia - this was super useful and a wonderful tutorial, thanks! Do you know when you may have time to do further tutorials showing how to deploy the model for real-time predictions?

Noel

juliasilge commented 3 years ago (21 Apr 2021)

@dempseynoel I'm working on some material on model monitoring right now, so look for that! I'll look into model deployment as well, but in the meantime, I really like the resources at Put R in Prod (https://putrinprod.com/; more Docker than I have used personally, but excellent), this talk from Alex (https://youtu.be/SwjlcYC_Iqw), and anything you can find by James Blair on plumber + modeling (https://youtu.be/znHEW5Q6plw).

dempseynoel commented 3 years ago

Hi Julia, thanks for these; they look great!


ghost commented 3 years ago

Hi Julia,

I wanted to know if you could explain why resampling was performed right after splitting the data into train/test sets. I assumed this step would come after the first fit, once we can observe prediction results, and that we would resample only then. Is it because the model will memorize the training set and give near-perfect results on it? If so, should resampling always be used for any analysis that relies on train/test data? When is it appropriate to use fit() versus fit_resamples()? I have seen cross-validation used in time-series forecasting as well.

Thank you for a great analysis.

juliasilge commented 3 years ago

The main purpose of resampling is to estimate how well a model is performing, so I guess it doesn't make a huge difference whether you first fit() to the whole training set or first fit_resamples() to resampled folds of the training set. For many kinds of models, though, predictions on the training set as a whole don't really help you much, for the reasons you mention: the model has already seen those observations, so its performance on them is overly optimistic. You can read more about "spending your data budget" and using resampling to measure performance.
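To make the memorization point concrete, here is a minimal Python sketch (not tidymodels code; the 1-nearest-neighbour "model", the toy data, and all function names are hypothetical). A model that memorizes its training data scores perfectly when evaluated on that same data, while k-fold resampling within the training set gives a more honest estimate. In tidymodels, that resampling role is played by `vfold_cv()` together with `fit_resamples()`.

```python
import random

def nn1_predict(train_x, train_y, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of evaluation points whose predicted label matches the true label."""
    hits = sum(nn1_predict(train_x, train_y, x) == y for x, y in zip(eval_x, eval_y))
    return hits / len(eval_y)

def vfold(n, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, then average over folds."""
    scores = []
    for fold in vfold(len(xs), k, seed):
        held_out = set(fold)
        fit_rows = [i for i in range(len(xs)) if i not in held_out]
        scores.append(accuracy([xs[i] for i in fit_rows], [ys[i] for i in fit_rows],
                               [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / len(scores)

# Hypothetical noisy training data: label is 1 when x > 0.5, with ~20% of labels flipped.
rng = random.Random(42)
train_x = [rng.random() for _ in range(200)]
train_y = [int(x > 0.5) ^ int(rng.random() < 0.2) for x in train_x]

resub = accuracy(train_x, train_y, train_x, train_y)  # scored on data the model memorized
cv = cross_validate(train_x, train_y, k=5)            # scored only on held-out folds
print(f"training-set accuracy: {resub:.3f}")
print(f"5-fold CV accuracy:    {cv:.3f}")
```

The training-set accuracy comes out as 1.000 because every point is its own nearest neighbour, while the cross-validated figure is lower and closer to what the model would achieve on new data.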

Steviey commented 3 years ago

Why save the model metrics? What do you do with them? Is there an example?

juliasilge commented 3 years ago

@Steviey The main reason to save the model metrics from training is to be able to track and monitor how the model performs compared to when it was trained.
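As a rough illustration of that idea, here is a Python sketch (not the tooling used in the post; the file name, metric names, the 0.05 tolerance, and the numbers are all made up). It stores the training-time metrics and later flags any metric that has dropped noticeably when you recompute it on newer labelled data.

```python
import json
import os
import tempfile
from pathlib import Path

def save_metrics(metrics, path):
    """Persist metrics estimated at training time (e.g. from resampling) as JSON."""
    Path(path).write_text(json.dumps(metrics, indent=2))

def check_degradation(baseline_path, new_metrics, tolerance=0.05):
    """Return metrics that fell more than `tolerance` below the saved baseline.

    Assumes higher is better for every metric (true for accuracy and ROC AUC).
    """
    baseline = json.loads(Path(baseline_path).read_text())
    flagged = {}
    for name, trained in baseline.items():
        now = new_metrics.get(name)
        if now is not None and trained - now > tolerance:
            flagged[name] = {"trained": trained, "now": now}
    return flagged

# Usage with made-up numbers: save once at training time, re-check as new labelled data arrives.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "training_metrics.json")
    save_metrics({"accuracy": 0.85, "roc_auc": 0.90}, path)
    flagged = check_degradation(path, {"accuracy": 0.84, "roc_auc": 0.80})
    print(flagged)  # only roc_auc dropped by more than the tolerance
```

Comparing against the saved baseline, rather than against the previous check, keeps the reference point fixed at "how the model performed when it was trained".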

ssna60 commented 1 year ago

Hi Julia, thank you for sharing this valuable information. I am looking forward to learning more about "how to publish this model as an API and how to monitor its performance." I noticed that you mentioned 'more soon' in your article; can you please provide an update on when we can expect to see this information? I am eager to put this knowledge into practice as soon as possible. Thank you!

juliasilge commented 1 year ago

@ssna60 Yes, check out our documentation here:

HASSAw commented 4 months ago

Hi Julia,

This is a really helpful video, thank you very much. Can you explain how I can extract interpretable decision rules from this model?

juliasilge commented 4 months ago

@HASSAw You may want to check out: