r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org

Error in `vctrs::vec_locate_matches()`: #1910

Open jedarojas opened 5 months ago

jedarojas commented 5 months ago

I am trying to do a left_join, which contains the same column names and the same data types, but I get the following error.

This is the code:

BD_DESTINOS_ESTIMADOS_2 <- BD_DESTINOS_ESTIMADOS %>%
  left_join(
    eventos_mes_tip3 %>%
      select(TARJETA_NUMERO, F_evento, TIP_DIA, H_evento, VARIANTE_CODIGO, AGRUPACION),
    by = c("TARJETA_NUMERO", "F_evento", "TIP_DIA", "H_evento", "VARIANTE_CODIGO")
  )

and this is the error message:

Error in vctrs::vec_locate_matches():
! x and y should have the same type.
ℹ In file match-joint.c at line 239.
ℹ This is an internal error that was detected in the vctrs package.
  Please report it at https://github.com/r-lib/vctrs/issues with a reprex and the full backtrace.
Backtrace:
     ▆
  1. ├─BD_DESTINOS_ESTIMADOS %>% ...
  2. ├─dplyr::left_join(...)
  3. ├─dplyr:::left_join.data.frame(...)
  4. │ └─dplyr:::join_mutate(...)
  5. │   └─dplyr:::join_rows(...)
  6. │     └─dplyr:::dplyr_locate_matches(...)
  7. │       ├─base::withCallingHandlers(...)
  8. │       └─vctrs::vec_locate_matches(...)
  9. └─rlang:::stop_internal_c_lib(...)
 10.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

DavisVaughan commented 5 months ago

This looks interesting, and is likely a bug on our end, but we need more information from you to help.

Could you please turn this into a self-contained reprex (short for minimal reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page.

You can install reprex by running (you may already have it, though, if you have the tidyverse package installed):

install.packages("reprex")

Thanks
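
A possible starting point for such a reprex, sketched here under stated assumptions: when the real tables cannot be shared, a few rows of only the join columns can be dumped with dput() and pasted into a script. The data frame `events` and its column names below are hypothetical stand-ins, not the reporter's data, and a small slice may well fail to reproduce the error.

```r
# Hypothetical stand-in for one of the real tables; only the join columns matter.
events <- data.frame(
  card_id = c(101L, 102L, 103L, 104L),
  day_type = c("weekday", "weekday", "saturday", "sunday"),
  stringsAsFactors = FALSE
)

# Keep a few rows of the join columns only.
key_cols <- c("card_id", "day_type")
small <- head(events[key_cols], 3)

# dput() prints R code that recreates the object, so the text can be
# pasted at the top of a reprex script.
dumped <- paste(capture.output(dput(small)), collapse = "\n")
cat(dumped, "\n")

# Sanity check: evaluating the dumped text gives back the same columns.
rebuilt <- eval(parse(text = dumped))
```

The printed structure(...) call, followed by the join that fails, is the kind of script that reprex::reprex() is meant to render for posting.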

GeeseAndQuack commented 4 months ago

Hi!

I encountered the same bug the other day, also when using a left_join. I am trying to create a reprex, but so far I have been unable to recreate the bug with fake data.

I did however manage to find a work-around that may help illuminate the issue:

In the previous line I had used rbind to combine two time series dataframes with the same columns across different time periods. When I switched from "rbind" to "bind_rows" the left_join in the following lines no longer produced this error.

Not sure if this will aid a bug fix but thought it might provide a lead of sorts until I can reproduce the issue with fake data.

Thanks,

Jack
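
Since swapping rbind for bind_rows made the error go away, one thing that may be worth checking is whether the join columns really carry the same classes and attributes in both tables. The sketch below is only a diagnostic, not a confirmed explanation of the bug; `compare_key_types()`, `left`, and `right` are hypothetical names introduced for illustration, and the toy data is invented.

```r
# Hypothetical helper: report class and attribute differences for the join columns.
compare_key_types <- function(x, y, by) {
  describe <- function(col) paste(class(col), collapse = "/")
  data.frame(
    column = by,
    x_class = vapply(by, function(nm) describe(x[[nm]]), character(1), USE.NAMES = FALSE),
    y_class = vapply(by, function(nm) describe(y[[nm]]), character(1), USE.NAMES = FALSE),
    same_attributes = vapply(
      by,
      function(nm) identical(attributes(x[[nm]]), attributes(y[[nm]])),
      logical(1),
      USE.NAMES = FALSE
    ),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Invented toy tables whose key columns look alike but differ in type.
left <- data.frame(id = 1:3, day = as.Date("2024-01-01") + 0:2)
right <- data.frame(
  id = c(1, 2, 3),
  day = c("2024-01-01", "2024-01-02", "2024-01-03"),
  stringsAsFactors = FALSE
)

report <- compare_key_types(left, right, c("id", "day"))
print(report)

# Columns that differ in class or attributes between the two tables.
mismatched <- report$column[report$x_class != report$y_class | !report$same_attributes]
```

Running the same check on the table produced by rbind and on the one produced by bind_rows could show which column changed between the two approaches.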

DavisVaughan commented 4 months ago

If you can further track down any kind of reprex, that would definitely be the most helpful thing for us!

ewahlstedt commented 1 month ago

Hi, I had what I think is the same error today. Hoping this helps:

I tried to run this code (edx is a very large dataset with about 9 million rows and 8 variables):

encoded_genres <- edx %>% separate_rows(genres, sep = "\\|") %>% 
  mutate(genre_indicator = 1) %>% 
  pivot_wider(names_from = genres, values_from = genre_indicator, values_fill = 0)

edx_experiment <- inner_join(edx, encoded_genres, by = "movieId")

Error in vctrs::vec_locate_matches():
! Match procedure results in an allocation larger than 2^31-1 elements. Attempted allocation size was 61081760571.
ℹ In file match.c at line 2644.
ℹ Install the winch package to get additional debugging info the next time you get this error.
ℹ This is an internal error that was detected in the vctrs package.
  Please report it at https://github.com/r-lib/vctrs/issues with a reprex and the full backtrace.
Backtrace:
    ▆
 1. ├─dplyr::inner_join(edx, encoded_genres, by = "movieId")
 2. ├─dplyr:::inner_join.data.frame(edx, encoded_genres, by = "movieId")
 3. │ └─dplyr:::join_mutate(...)
 4. │   └─dplyr:::join_rows(...)
 5. │     └─dplyr:::dplyr_locate_matches(...)
 6. │       ├─base::withCallingHandlers(...)
 7. │       └─vctrs::vec_locate_matches(...)
 8. └─rlang:::stop_internal_c_lib(...)
 9.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

ewahlstedt commented 1 month ago

(I tried with left_join() as well and got the same error as above)

DavisVaughan commented 1 month ago

@ewahlstedt this is not the same error. You likely are missing a variable in by as you are trying to perform a join that would result in 61 billion rows, and we can't handle that. If you update dplyr and restart R you should also get a better error about this (from dplyr rather than vctrs). But regardless, this is a different error than the original one reported.
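
To see why a join key that is too coarse blows up the result, the number of rows can be estimated before joining: each key value contributes (matches in x) times (matches in y) rows. The sketch below uses base R only and is not how dplyr or vctrs compute it; `estimate_join_rows()` and the toy tables are hypothetical.

```r
# Hypothetical helper: rows an inner join on `by` would produce.
estimate_join_rows <- function(x, y, by) {
  # Collapse the key columns into one string per row.
  key_of <- function(df) do.call(paste, c(unname(as.list(df[by])), sep = "\r"))
  x_n <- table(key_of(x))
  y_n <- table(key_of(y))
  # Only keys present on both sides produce output rows.
  shared <- intersect(names(x_n), names(y_n))
  sum(as.numeric(x_n[shared]) * as.numeric(y_n[shared]))
}

# Invented toy tables where movieId repeats on both sides.
ratings <- data.frame(movieId = c(1, 1, 2, 3), rating = c(4, 5, 3, 2))
genres <- data.frame(
  movieId = c(1, 1, 1, 2),
  genre = c("Comedy", "Drama", "Action", "Comedy"),
  stringsAsFactors = FALSE
)

# movieId 1 contributes 2 * 3 rows, movieId 2 contributes 1 * 1.
expected_rows <- estimate_join_rows(ratings, genres, "movieId")

# Cross-check against an actual base R inner join.
actual_rows <- nrow(merge(ratings, genres, by = "movieId"))
```

If the estimate is far larger than either input, the key columns probably do not identify rows uniquely enough, and adding another column to by is one way to bring the count down.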

ewahlstedt commented 1 month ago

Ah I see that now. Sorry, I’m fairly new to R!

DavisVaughan commented 1 month ago

No problem!