nfj1380 / mrIML

Multivariate (multi-response) ensemble learning
https://nfj1380.github.io/mrIML/

Test dataset is showing issue in number of rows #9

Open Skraberg opened 3 months ago

Skraberg commented 3 months ago

Getting the below error:

    # Remove NAs from the feature/predictor data.
    FeaturesnoNA <- Features[complete.cases(Features), ]
    X <- FeaturesnoNA # For simplicity

    # For more efficient testing for interactions (more variables, more interacting pairs)
    X <- FeaturesnoNA[c(1:3)] # Three features only

    yhats <- mrIMLpredicts(X = X,                   # Features/predictors
                           Y = Y,                   # Response data
                           Model = model1,          # Specify your model
                           balance_data = 'no',     # Choose how to balance your data
                           mode = 'classification', # Choose your mode (classification versus regression)
                           seed = 120)              # Set seed

    in vfold_cv(data_train, v = k) : The number of rows is less than v = 10
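The error says the training data has fewer rows than the 10 folds requested, so it can help to count how many rows survive the NA filter before fitting. A minimal base-R sketch, using a made-up `Features` data frame (the values below are hypothetical, not the test dataset from this issue):

```r
# Hypothetical toy predictors; NOT the data from the issue.
Features <- data.frame(
  f1 = c(1, 2, NA, 4, 5, 6),
  f2 = c("a", "b", "c", NA, "e", "f"),
  f3 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Same filtering step as in the report above.
FeaturesnoNA <- Features[complete.cases(Features), ]

n_before <- nrow(Features)
n_after <- nrow(FeaturesnoNA)
folds_requested <- 10 # the v reported in the error message

cat("Rows before NA removal:", n_before, "\n")
cat("Rows after NA removal: ", n_after, "\n")
if (n_after < folds_requested) {
  cat("Fewer complete rows than folds; a smaller k is needed.\n")
}
```

Any train/test split applied afterwards shrinks the row count further, so the number that matters is the size of the training portion, not of the full table.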

nfj1380 commented 3 months ago

Thanks for finding this bug. Try adding k=5 for the k-fold validation and setting prop to 0.5, i.e. mrIMLpredicts(X, Y, k=5, prop=0.5...). This is needed because it's quite a small dataset.
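As a rough illustration of why `k` matters on a small dataset: the error is raised when the training set has fewer rows than folds. The sketch below assumes the training set holds about `floor(n_rows * prop)` rows, which is a common way to size a proportion-based split but has not been checked against mrIML's internals. The helper `folds_fit` and the row count of 16 are hypothetical.

```r
# Hypothetical helper: TRUE when the (assumed) training set has at least k rows.
# Assumption: training rows = floor(n_rows * prop); mrIML may size it differently.
folds_fit <- function(n_rows, prop, k) {
  n_train <- floor(n_rows * prop)
  n_train >= k
}

n_rows <- 16 # made-up dataset size, for illustration only

fits_k10 <- folds_fit(n_rows, prop = 0.5, k = 10) # 8 training rows vs 10 folds
fits_k5 <- folds_fit(n_rows, prop = 0.5, k = 5)   # 8 training rows vs 5 folds

print(fits_k10) # FALSE
print(fits_k5)  # TRUE
```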

Skraberg commented 3 months ago

Awesome, thanks Nick. I’ll have another go at it this week.


nfj1380 commented 3 months ago

The other issue with this small dataset is that it's too small for racing autotuning (the default method) to work, so add racing=FALSE to the command:

    yhats_rf <- mrIMLpredicts(X = X, Y = Y,
                              Model = model_rf,
                              balance_data = 'no',
                              mode = 'classification',
                              k = 5,
                              tune_grid_size = 5,
                              seed = sample.int(1e8, 1),
                              racing = FALSE)

    # No need for a tuning grid: there are no parameters to tune for logistic regression
    yhats_glm <- mrIMLpredicts(X = X, Y = Y,
                               Model = model_glm,
                               balance_data = 'no',
                               mode = 'classification',
                               k = 5,
                               tune_grid_size = 1,
                               seed = sample.int(1e8, 1),
                               racing = FALSE)
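Note that `seed = sample.int(1e8, 1)` draws a fresh random seed on every call, so two runs will generally differ. If a run needs to be repeatable, one option is to draw the seed once, record it, and pass the stored value. A small base-R sketch (the name `run_seed` is hypothetical, and the commented-out `mrIMLpredicts` call only repeats arguments from the comment above):

```r
# Draw one seed up front and keep it so the run can be repeated later.
run_seed <- sample.int(1e8, 1)
message("Using seed: ", run_seed)

# The stored value would then be passed on, e.g.:
# yhats_rf <- mrIMLpredicts(X = X, Y = Y, Model = model_rf,
#                           balance_data = 'no', mode = 'classification',
#                           k = 5, tune_grid_size = 5,
#                           seed = run_seed, racing = FALSE)

# Base-R check that reusing the stored seed reproduces the same random draws.
set.seed(run_seed)
draw_a <- runif(3)
set.seed(run_seed)
draw_b <- runif(3)
same_draws <- identical(draw_a, draw_b)
print(same_draws)
```

Logging the seed next to the results is what makes a later rerun with the same value possible.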

Skraberg commented 3 months ago

Cool thank you!
