juliasilge / juliasilge.com

My blog, built with blogdown and Hugo

https://juliasilge.com/


Text predictors for #TidyTuesday chocolate ratings | Julia Silge #62

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

Text predictors for #TidyTuesday chocolate ratings | Julia Silge

A data science blog

https://juliasilge.com/blog/chocolate-ratings/

albolea commented 2 years ago

Julia, excellent work as always! I love learning from your tutorials. Quick question: can you give me some pointers on how to include step_lemma on most_memorable_characteristics? I'm getting the following error: "Error in bake(): ! most_memorable_characteristics doesn't have a lemma attribute. Make sure the tokenization step includes lemmatization."

chocolate_recipe <- recipe(rating ~ most_memorable_characteristics + country_of_bean_origin,
                           data = chocolate_train) %>%
  step_tokenize(most_memorable_characteristics) %>%
  step_lemma(most_memorable_characteristics) %>%
  step_tokenfilter(most_memorable_characteristics, max_tokens = 100) %>%
  step_tfidf(most_memorable_characteristics) %>%
  step_tokenize(country_of_bean_origin) %>%
  step_tokenfilter(country_of_bean_origin, max_tokens = 20) %>%
  step_tfidf(country_of_bean_origin)

Thank you for your work and for your time!

Best, Renato Albolea

juliasilge commented 2 years ago

@albolea You'll need to use a tokenization engine that supports lemmas, such as engine = "spacyr". Check out the examples here to see how that will work.
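As a minimal sketch of that suggestion (not taken from the blog post): the only change to the tokenization step is `engine = "spacyr"`, which attaches lemmas to the tokens so that `step_lemma()` has something to extract. The `toy_train` data frame below is a hypothetical stand-in for `chocolate_train`, and actually running `prep()` assumes the spacyr package and a working spaCy installation are available.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data standing in for chocolate_train
toy_train <- data.frame(
  rating = c(3.25, 3.5, 2.75, 3.75),
  most_memorable_characteristics = c(
    "rich cocoa, fatty, bready",
    "nutty, roasted, sweet",
    "sour, chalky, burnt",
    "creamy, fruity berries"
  ),
  stringsAsFactors = FALSE
)

lemma_recipe <- recipe(rating ~ most_memorable_characteristics, data = toy_train) %>%
  # The spacyr engine produces tokens that carry a lemma attribute
  step_tokenize(most_memorable_characteristics, engine = "spacyr") %>%
  # Replace each token with its lemma
  step_lemma(most_memorable_characteristics) %>%
  step_tokenfilter(most_memorable_characteristics, max_tokens = 100) %>%
  step_tfidf(most_memorable_characteristics)

# spaCy is only invoked when the recipe is estimated (assumes spaCy is installed):
# lemma_prepped <- prep(lemma_recipe)
```

With the default tokenizer the tokens have no lemma attribute, which is what the error message in the question is pointing at.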

hareshsuppiah commented 2 years ago

Hi Julia, thanks for this.

Out of curiosity, would an SVM model work on repeated data? For example, a reflection diary by an athlete with keywords to describe successes of the day, paired with a rating value of how well they would rate that day's (training) activities.

Greatly appreciate your time.

juliasilge commented 2 years ago

@hareshsuppiah I believe most folks would use a multilevel (i.e. mixed effects or hierarchical) model with that kind of data, like what multilevelmod supports.
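A rough sketch of what that could look like with multilevelmod, assuming the diary text has already been turned into a numeric predictor. The `diary` data, the `n_success_keywords` column, and the `athlete` grouping variable are all hypothetical and simulated purely for illustration; the random intercept `(1 | athlete)` is what accounts for repeated entries from the same person.

```r
library(tidymodels)
library(multilevelmod)

set.seed(123)

# Hypothetical repeated-measures data: 6 athletes, 10 diary entries each
diary <- data.frame(
  athlete = factor(rep(paste0("athlete_", 1:6), each = 10)),
  n_success_keywords = rpois(60, lambda = 3)
)
athlete_effect <- rnorm(6, sd = 0.5)
diary$rating <- 5 + 0.4 * diary$n_success_keywords +
  athlete_effect[as.integer(diary$athlete)] + rnorm(60, sd = 0.5)

# Linear mixed-effects model through the lme4 "lmer" engine
lmer_spec <- linear_reg() %>%
  set_engine("lmer")

# Random intercept per athlete handles the repeated measurements
lmer_fit <- fit(lmer_spec, rating ~ n_success_keywords + (1 | athlete), data = diary)
lmer_fit
```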

hareshsuppiah commented 2 years ago

Thank you, @juliasilge !

zabeelbasheer commented 2 years ago

Great tutorials, @juliasilge! As I was following the code, I got an error while evaluating the models. The error says "All models failed. See the '.notes' column."

zabeelbasheer commented 2 years ago

When I checked with collect_notes(), it gives the note: "Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_tokenize', 'step')""

juliasilge commented 2 years ago

@zabeelbasheer It sounds like either you have very old versions of recipes and/or textrecipes, or that perhaps textrecipes isn't loaded or similar? If you keep having problems, I recommend that you create a reprex (a minimal reproducible example) for this. The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of modeling questions. Thanks! 🙌

zabeelbasheer commented 2 years ago

Thank you, @juliasilge! I am excited that I am learning yet another tidyverse function - reprex. I will check with the RStudio community later.

Thank you!

neuflaneur commented 1 year ago

Hi Julia, Thank you for these tutorials as well as for your book with Emil! The book is an excellent explanation! This said, I have one question. Is there an 'easy' way to get the outputs from keras-based models in the book into a package like IML to calculate global feature importance? I am 'stuck' so any guidance would be appreciated!

juliasilge commented 1 year ago

@neuflaneur For models built with keras that don't have direct model-based global feature importance, I would suggest using something like DALEX for model-agnostic explainability. You can read more in this chapter of Tidy Modeling with R.
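To illustrate the model-agnostic idea, here is a small sketch of permutation-based global feature importance with DALEX. It deliberately uses an `lm()` fit as a stand-in rather than a keras model, and the simulated `features` and `outcome` are hypothetical. For a keras model, the assumption is that you would pass the fitted model in the same way and adapt `pred_fun` so that it converts `newdata` to whatever input format the network expects and returns a plain numeric vector.

```r
library(DALEX)

set.seed(123)

# Hypothetical numeric features standing in for a preprocessed predictor set
features <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
outcome <- 2 * features$x1 - features$x2 + rnorm(200, sd = 0.5)

# Stand-in model; a fitted keras model would take its place
fitted_model <- lm(outcome ~ ., data = cbind(features, outcome = outcome))

# Prediction wrapper: DALEX only needs a function(model, newdata) -> numeric vector
pred_fun <- function(model, newdata) as.numeric(predict(model, newdata))

explainer <- DALEX::explain(
  fitted_model,
  data = features,
  y = outcome,
  predict_function = pred_fun,
  label = "stand-in model",
  verbose = FALSE
)

# Permutation-based variable importance (the default for model_parts())
importance <- model_parts(explainer)
plot(importance)
```

Because DALEX only talks to the model through the prediction function, the same pattern applies to any model that can produce predictions for a data frame of features.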

neuflaneur commented 1 year ago

Hi Julia,

Thank you!

Dean Neu
