Closed. AdrianAntico closed this issue 3 years ago.
Thanks for the report. I believe I know how to fix the issue. Could you help me run the example?
Running the model I get "Error in catboost::catboost.train(learn_pool = TrainPool, test_pool = TestPool, : catboost/private/libs/algo/tensor_search_helpers.cpp:455: No groups in dataset. Please disable sampling or use per object sampling"
(Also I switched to CPU if that makes a difference)
@traversc sorry about that. I forgot to NULL out the GroupVariables argument...
# Load data
data <- data.table::fread("https://www.dropbox.com/s/2str3ek4f4cheqi/walmart_train.csv?dl=1")
# Set negative numbers to 0
data <- data[, Weekly_Sales := data.table::fifelse(Weekly_Sales < 0, 0, Weekly_Sales)]
# Remove IsHoliday column
data[, IsHoliday := NULL]
# Change data types
data[, ":=" (Store = as.character(Store), Dept = as.character(Dept))]
# Fill gaps
data <- RemixAutoML::TimeSeriesFill(
data,
DateColumnName = "Date",
GroupVariables = c("Store","Dept"),
TimeUnit = "weeks",
FillType = "maxmax",
MaxMissingPercent = 0.25,
SimpleImpute = TRUE)
# Shrink data for example
data <- data[Store %in% c(1:3)]
# Shrink data rows
data <- data[Date < "2012-03-09"]
# Build model
TestModel <- RemixAutoML::AutoCatBoostCARMA(
# data args
data = data,
TimeWeights = 0.9999,
TargetColumnName = "Weekly_Sales",
DateColumnName = "Date",
HierarchGroups = NULL,
GroupVariables = NULL,
TimeUnit = "weeks",
TimeGroups = c("weeks","months"),
# Production args
TrainOnFull = FALSE,
SplitRatios = c(1 - 10 / 110, 10 / 110),
PartitionType = "random",
FC_Periods = 33,
TaskType = "GPU",
NumGPU = 1,
Timer = TRUE,
DebugMode = TRUE,
# Target variable transformations
TargetTransformation = FALSE,
Methods = c("YeoJohnson", "BoxCox", "Asinh", "Log", "LogPlus1", "Sqrt", "Asin", "Logit"),
Difference = FALSE,
NonNegativePred = TRUE,
RoundPreds = FALSE,
# Calendar-related features
CalendarVariables = c("week","wom","month","quarter"),
HolidayVariable = c("USPublicHolidays"),
HolidayLags = c(1,2,3),
HolidayMovingAverages = c(2,3),
# Lags, moving averages, and other rolling stats
Lags = list("weeks" = c(1,2,3,4,5,8,9,12,13,51,52,53), "months" = c(1,2,6,12)),
MA_Periods = list("weeks" = c(2,3,4,5,8,9,12,13,51,52,53), "months" = c(2,6,12)),
SD_Periods = NULL,
Skew_Periods = NULL,
Kurt_Periods = NULL,
Quantile_Periods = NULL,
Quantiles_Selected = NULL,
# Bonus features
AnomalyDetection = NULL,
XREGS = NULL,
FourierTerms = 0,
TimeTrendVariable = TRUE,
ZeroPadSeries = NULL,
DataTruncate = FALSE,
# ML grid tuning args
GridTune = FALSE,
PassInGrid = NULL,
ModelCount = 5,
MaxRunsWithoutNewWinner = 50,
MaxRunMinutes = 60*60,
# ML evaluation output
PDFOutputPath = NULL,
SaveDataPath = NULL,
NumOfParDepPlots = 0L,
# ML loss functions
EvalMetric = "RMSE",
EvalMetricValue = 1,
LossFunction = "RMSE",
LossFunctionValue = 1,
# ML tuning args
NTrees = 50L,
Depth = 6L,
L2_Leaf_Reg = NULL,
LearningRate = NULL,
Langevin = FALSE,
DiffusionTemperature = 10000,
RandomStrength = 1,
BorderCount = 254,
RSM = NULL,
GrowPolicy = "SymmetricTree",
BootStrapType = "Bayesian",
ModelSizeReg = 0.5,
FeatureBorderType = "GreedyLogSum",
SamplingUnit = "Group",
SubSample = NULL,
ScoreFunction = "Cosine",
MinDataInLeaf = 1)
# Save output (Error on this step)
qs::qsave(TestModel, file = file.path(getwd(), "Insights.Rdata"))
# Comparison (this works)
save(TestModel, file = file.path(getwd(), "Insights.Rdata"))
Hi Adrian, I think I fixed the issue. Could you try it out?
devtools::install_github("traversc/qs@5e29db0db2a2c605dd878d18f9e6fe55e7a4027c")
Then run your example.
Hi @traversc I just tested it out and it worked!
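Beyond the absence of an error, one way to confirm a fix like this is to read the file back and compare it with the original. The following is only a minimal sketch: `obj` is a hypothetical stand-in for `TestModel`, built from base R types plus a NULL element, and it uses only `qs::qsave()` and `qs::qread()`. The guard skips the check when qs is not installed.

```r
# Hypothetical stand-in for TestModel (assumption): a mixed list with a
# data.frame, a NULL element, and plain vectors.
obj <- list(
  tbl    = data.frame(id = 1:3, value = c(0.5, 1.5, 2.5)),
  empty  = NULL,
  labels = c("a", "b", "c")
)

path <- tempfile(fileext = ".qs")
round_trip_ok <- NA

if (requireNamespace("qs", quietly = TRUE)) {
  qs::qsave(obj, path)            # write with qs
  restored <- qs::qread(path)     # read it back
  round_trip_ok <- identical(restored, obj)
  unlink(path)
}

print(round_trip_ok)
```

A real model object may hold external pointers or environments, so an exact `identical()` match is not guaranteed there; in that case comparing selected components is a more realistic check.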
@traversc Here's a benchmark on the example I posted (just the saving-to-file part):
Unit: milliseconds
expr                                                                     min        lq      mean    median        uq       max neval
qs::qsave(TestModel, file = file.path(getwd(), "Insights.Rdata"))   527.2452  535.0064  608.3629  548.0963  699.4605  710.1596    30
save(TestModel, file = file.path(getwd(), "Insights.Rdata"))       3854.6459 3899.4460 3901.8063 3903.3161 3906.4496 3917.0938    30
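The layout of that output matches what `microbenchmark::microbenchmark()` prints, so a comparison of this kind could be set up roughly as follows. This is a sketch under assumptions: `obj` is a small hypothetical stand-in rather than the real `TestModel`, the temp-file paths are illustrative, and the timing part only runs when both qs and microbenchmark are installed. The last lines just work out the ratio of the two medians posted above.

```r
# Hypothetical stand-in object (assumption): the real TestModel comes from
# RemixAutoML::AutoCatBoostCARMA and is much larger.
obj <- list(tbl = data.frame(x = rnorm(1e5), y = runif(1e5)), note = NULL)
qs_path  <- tempfile(fileext = ".qs")
rda_path <- tempfile(fileext = ".Rdata")

if (requireNamespace("qs", quietly = TRUE) &&
    requireNamespace("microbenchmark", quietly = TRUE)) {
  timings <- microbenchmark::microbenchmark(
    qs::qsave(obj, file = qs_path),
    save(obj, file = rda_path),
    times = 30
  )
  print(timings)
}

# Ratio of the median timings posted above (milliseconds).
median_qsave <- 548.0963
median_save  <- 3903.3161
speedup <- median_save / median_qsave
print(round(speedup, 2))
```

On the posted numbers, qs::qsave() comes out roughly 7 times faster than save() at the median for this particular object; other objects and compression settings will give different ratios.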
Hi qs team,
I'm looking to save a list of 14 elements and I'm running into this error: "Error in c_qsave(x, file, preset, algorithm, compress_level, shuffle_control, : bad binding access"
The list contains several data.tables, a model object, a list of plots, individual plots, and at times even NULL elements. When I use save(), it saves without issue. If I'm using qs::qsave() inappropriately, then my apologies ahead of time. Below are my computer specs and some code to recreate the error. Let me know if you need anything else to help troubleshoot.
PS Great package!
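For a list like the one described above, one way to narrow down which element triggers the error is to serialize each top-level element on its own. The helper below, `find_failing_elements()`, is hypothetical (it is not part of qs or RemixAutoML) and takes the save function as an argument, so the same loop works with `qs::qsave()`, `saveRDS()`, or anything else. The example list is an assumption standing in for the real 14-element object.

```r
# Hypothetical helper: try to save each top-level element separately and
# return the names (or positions) of the elements whose save call errors.
find_failing_elements <- function(x, save_fun) {
  labels <- names(x)
  if (is.null(labels)) labels <- rep("", length(x))
  blank <- !nzchar(labels)
  labels[blank] <- paste0("[[", which(blank), "]]")  # label unnamed slots by index
  failed <- vapply(seq_along(x), function(i) {
    tmp <- tempfile()
    on.exit(unlink(tmp), add = TRUE)
    inherits(try(save_fun(x[[i]], tmp), silent = TRUE), "try-error")
  }, logical(1))
  labels[failed]
}

# Example list (assumption): a table, a NULL element, and a fitted model.
obj <- list(
  tbl   = data.frame(x = 1:3),
  empty = NULL,
  fit   = lm(dist ~ speed, data = cars)
)

if (requireNamespace("qs", quietly = TRUE)) {
  print(find_failing_elements(obj, function(el, path) qs::qsave(el, path)))
}
```

Passing the save function in keeps the helper independent of any one serializer, which also makes it easy to compare qs::qsave() against saveRDS() on the same elements.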
I'm working on a Windows machine, and here is the session info:
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] timeDate_3043.102
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 compiler_4.0.3 pillar_1.4.6
[4] qs_0.23.5 iterators_1.0.12 tools_4.0.3
[7] catboost_0.24.3 digest_0.6.25 viridisLite_0.3.0
[10] lubridate_1.7.9 jsonlite_1.7.2 lifecycle_0.2.0
[13] tibble_3.0.4 gtable_0.3.0 lattice_0.20-41
[16] pkgconfig_2.0.3 rlang_0.4.7 Matrix_1.2-18
[19] foreach_1.5.0 rstudioapi_0.11 crosstalk_1.1.0.1
[22] yaml_2.2.1 parallel_4.0.3 RemixAutoML_0.3.3
[25] httr_1.4.1 dplyr_1.0.2 generics_0.0.2
[28] arules_1.6-6 vctrs_0.3.2 htmlwidgets_1.5.1
[31] grid_4.0.3 tidyselect_1.1.0 RApiSerialize_0.1.0
[34] glue_1.4.1 data.table_1.13.2 R6_2.4.1
[37] plotly_4.9.2.1 farver_2.0.3 tidyr_1.1.2
[40] ggplot2_3.3.2 purrr_0.3.4 magrittr_1.5
[43] scales_1.1.1 codetools_0.2-16 ellipsis_0.3.1
[46] htmltools_0.5.0 colorspace_1.4-1 labeling_0.3
[49] stringfish_0.14.2 RcppParallel_5.0.2 lazyeval_0.2.2
[52] doParallel_1.0.15 munsell_0.5.0 crayon_1.3.4
Code to recreate the error is below: