juliasilge / juliasilge.com

My blog, built with blogdown and Hugo
https://juliasilge.com/

Which #TidyTuesday post offices are in Hawaii? | Julia Silge #19

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Use tidymodels to predict post office location with subword features and a support vector machine model.

https://juliasilge.com/blog/hawaii-post-offices/

Heseraj commented 3 years ago

Hello Julia, thank you very much for the great screencasts. I am a new learner who avidly follows your screencasts. While running the bake code, I ran into the following error. I would appreciate your help with how to deal with this problem.

post_rec %>%
  prep() %>%
  bake(new_data = NULL)
Error: Not enough observations of 'MI/OH' to perform SMOTE.
Run rlang::last_error() to see where the error occurred.

And the output of rlang::last_error():

<error/rlang_error>
Not enough observations of 'MI/OH' to perform SMOTE.
Backtrace:
  1. post_rec %>% prep() %>% bake(new_data = NULL)
  2. recipes:::prep.recipe(.)
  3. themis:::bake.step_smote(x$steps[[i]], new_data = training)
  4. themis:::smote(...)
Run rlang::last_trace() to see the full context.

Thank you again.

juliasilge commented 3 years ago

@Heseraj I suspect you didn't set up a new state variable that is only Hawaii/Other, and that your state variable still has all the original values in it. I set that up with this code:

po_split <- post_offices %>%
  mutate(state = case_when(
    state == "HI" ~ "Hawaii",
    TRUE ~ "Other"
  )) %>%
  select(name, state) %>%
  initial_split(strata = state)

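
A quick way to check for the situation described above is to count the outcome levels before and after recoding. The following is a minimal sketch, not code from the blog post: `toy_offices` and its rows are made up for illustration, and only the `case_when()` recoding mirrors the snippet above. If the count after recoding shows anything other than the two levels, a step like SMOTE will still see rare classes such as 'MI/OH'.

```r
library(dplyr)

# Hypothetical toy rows standing in for the real post_offices data
toy_offices <- tibble(
  name  = c("HONOLULU", "KAILUA", "DETROIT", "TOLEDO", "AKRON", "STATE LINE"),
  state = c("HI", "HI", "MI", "OH", "OH", "MI/OH")
)

# Before recoding: many levels, some with very few observations
toy_offices %>% count(state)

# Same recoding as in the comment above: collapse everything to Hawaii/Other
recoded <- toy_offices %>%
  mutate(state = case_when(
    state == "HI" ~ "Hawaii",
    TRUE ~ "Other"
  ))

# After recoding: only two levels should remain
state_counts <- recoded %>% count(state)
state_counts
```
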
Rlopezra commented 3 years ago

Great post! I always learn something new from your screencasts. I didn't know I could use expressions in group_by().
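
For readers who have not seen this pattern, here is a small sketch of an expression inside `group_by()`. The data and the grouping expression (`first_letter`, the first character of the name) are made up for illustration and are not necessarily what the screencast used.

```r
library(dplyr)

# Hypothetical toy rows, not the real post office data
toy_offices <- tibble(
  name  = c("HONOLULU", "KAILUA", "HILO", "DETROIT", "TOLEDO"),
  state = c("HI", "HI", "HI", "MI", "OH")
)

# The grouping variable is computed inside group_by(),
# so no separate mutate() step is needed beforehand
by_initial <- toy_offices %>%
  group_by(first_letter = substr(name, 1, 1)) %>%
  summarise(n = n(), .groups = "drop")

by_initial
```

The named expression becomes a new column in the result, just as if it had been created with `mutate()` first.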

conlelevn commented 2 years ago

@juliasilge Hi Julia, at the end of this screencast some words have positive estimates and others have negative estimates. I would think that the more positive the estimate, the more likely it is that a post office with that word in its name is from Hawaii, but your comment says the opposite. Could you explain this in more detail?

juliasilge commented 2 years ago

@conlelevn Hmmmm, I'm pretty sure I got it right, based on the results we see (more from Hawaii starting with H, K, containing ALE, etc). The sign on these estimates depends on which level (Hawaii vs. Other) is considered the base level (or first level, or positive case) by the model.
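
To illustrate how the sign can depend on the ordering of the outcome levels, here is a minimal sketch. It uses base R logistic regression via `glm()` as a stand-in, not the support vector machine from the post, and the data and the `starts_k` feature are made up. `glm()` treats the first factor level as "failure" and models the probability of the other level, so swapping the level order flips the sign of the coefficient while the fitted relationship stays the same. Which convention the model in the post follows is not shown here.

```r
# Hypothetical toy data: starts_k is 1 when the name starts with "K"
toy <- data.frame(
  starts_k = c(1, 1, 1, 0, 1, 0, 0, 0, 1, 0),
  state    = c("Hawaii", "Hawaii", "Hawaii", "Hawaii", "Other",
               "Other", "Other", "Other", "Hawaii", "Other")
)

# "Other" is the first level, so the model estimates the log-odds of "Hawaii"
toy$state_a <- factor(toy$state, levels = c("Other", "Hawaii"))
fit_a <- glm(state_a ~ starts_k, data = toy, family = binomial)

# "Hawaii" is the first level, so the model estimates the log-odds of "Other"
toy$state_b <- factor(toy$state, levels = c("Hawaii", "Other"))
fit_b <- glm(state_b ~ starts_k, data = toy, family = binomial)

coef_a <- unname(coef(fit_a)["starts_k"])  # positive: K names lean Hawaii
coef_b <- unname(coef(fit_b)["starts_k"])  # same magnitude, opposite sign

c(other_first = coef_a, hawaii_first = coef_b)
```
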