r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org

Error in `vctrs::vec_locate_matches()` #1908

Closed kn1g closed 5 months ago

kn1g commented 6 months ago

I am just reporting the error because the message asked for it. Unfortunately, I cannot provide a minimal reproducible example. The dataset is huge, which might be the problem, as the error shows the allocation size. Still, if you have questions, I am happy to help and provide as much info as I can.

OS: Linux Manjaro

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
CPU family: 6
Model: 142
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 10
CPU(s) scaling MHz: 32%
CPU max MHz: 4000.0000
CPU min MHz: 400.0000
BogoMIPS: 3984.00

       total
Mem:   48190
Swap:  17525

Error in `vctrs::vec_locate_matches()`:
! Match procedure results in an allocation larger than 2^31-1 elements. Attempted allocation size was 5905060440.
ℹ In file match.c at line 2658.
ℹ Install the winch package to get additional debugging info the next time you get this error.
ℹ This is an internal error that was detected in the vctrs package.
  Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex and the full backtrace.
Backtrace:
     ▆
  1. ├─OriPatternsMini %>% group_by(Datensatz, Jahr) %>% ...
  2. ├─dplyr::left_join(., MVonly3, by = "Jahr")
  3. ├─dplyr:::left_join.data.frame(., MVonly3, by = "Jahr")
  4. │ └─dplyr:::join_mutate(...)
  5. │   └─dplyr:::join_rows(...)
  6. │     └─dplyr:::dplyr_locate_matches(...)
  7. │       ├─base::withCallingHandlers(...)
  8. │       └─vctrs::vec_locate_matches(...)
  9. └─rlang:::stop_internal_c_lib(...)
 10.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

DavisVaughan commented 6 months ago

Do you truly expect over 5 billion rows back (5,905,060,440)?

If so, this is going to be impossible in R.

If not, you likely have an issue with your join key, i.e. `by = "Jahr"`. You are probably missing an additional join key there that would make the matches more specific.
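
One way to check which case applies before running the join is to estimate the result size from the key counts. The following is a minimal sketch in base R, not the reporter's code: `x` and `y` are hypothetical toy stand-ins for `OriPatternsMini` and `MVonly3`, since the real data was not shared. It relies on the fact that each key value shared by both tables contributes (occurrences in `x`) times (occurrences in `y`) rows to the join.

```r
# Hypothetical toy data; the real tables from the report are not available.
x <- data.frame(Jahr = c(2000, 2000, 2001, 2001, 2001, 2002), id = 1:6)
y <- data.frame(Jahr = c(2000, 2000, 2000, 2001, 2001, 2003),
                value = letters[1:6])

# Count how often each key value occurs on each side.
x_counts <- table(x$Jahr)
y_counts <- table(y$Jahr)

# Each shared key value contributes n_x * n_y rows.
# as.numeric() avoids integer overflow when the counts are large.
shared <- intersect(names(x_counts), names(y_counts))
matched_rows <- sum(as.numeric(x_counts[shared]) * as.numeric(y_counts[shared]))

# A left join also keeps every unmatched row of x exactly once.
unmatched_rows <- sum(!x$Jahr %in% y$Jahr)

expected_rows <- matched_rows + unmatched_rows
expected_rows  # 13 for this toy data

# Compare against the limit named in the error message.
expected_rows > 2^31 - 1
```

If the estimate is far larger than intended, the key values with the largest products of `x_counts` and `y_counts` show where the join key is not specific enough, which is where an additional key column would help.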