liuxiaodong008008 closed this issue 4 years ago
`split` returns `(dataTrain, dataTest)`. `validationPrepare` does data balancing or dropping based on the labels before `dataTrain` is split into a training set and a validation set. So, how do I split `dataTrain` into a training set and a validation set?
If you set the seed on the splitter, the `dataTest` returned will be the holdout dataset and `dataTrain` will be the training and validation data. Separating out the training and validation data for examination is not really possible, since the result depends on the validation method used (e.g. cross validation or a train/validation split) and the separation is only done internally, inside the model selector call.
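To make the point about validation methods concrete, here is a minimal sketch in plain Scala (no Spark, no TransmogrifAI). The object `HoldoutSketch` and its functions `seededSplit`, `trainValidationSplit` and `kFolds` are hypothetical names invented for illustration; they are not the library's API and only mimic the behaviour described above: a seeded split gives a reproducible holdout set, while the validation rows differ depending on which validation method is applied to `dataTrain`.

```scala
import scala.util.Random

// Hypothetical illustration only: these helpers are NOT TransmogrifAI functions.
object HoldoutSketch {

  // Shuffle with a fixed seed, then reserve `testFraction` of the rows as the holdout set.
  // Returns (dataTrain, dataTest); the same seed always yields the same pair.
  def seededSplit[A](data: Seq[A], testFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    val shuffled = new Random(seed).shuffle(data)
    val nTest = math.round(data.size * testFraction).toInt
    shuffled.splitAt(data.size - nTest)
  }

  // Train/validation split: one (train, validation) pair.
  def trainValidationSplit[A](dataTrain: Seq[A], trainRatio: Double, seed: Long): (Seq[A], Seq[A]) = {
    val shuffled = new Random(seed).shuffle(dataTrain)
    shuffled.splitAt(math.round(dataTrain.size * trainRatio).toInt)
  }

  // k-fold cross validation: k (train, validation) pairs; every row is validated exactly once.
  def kFolds[A](dataTrain: Seq[A], k: Int, seed: Long): Seq[(Seq[A], Seq[A])] = {
    val indexed = new Random(seed).shuffle(dataTrain).zipWithIndex
    (0 until k).map { fold =>
      val (validation, train) = indexed.partition { case (_, i) => i % k == fold }
      (train.map(_._1), validation.map(_._1))
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = (1 to 100).toVector
    val (dataTrain, dataTest) = seededSplit(rows, testFraction = 0.2, seed = 42L)
    val (train, validation) = trainValidationSplit(dataTrain, trainRatio = 0.75, seed = 42L)
    val folds = kFolds(dataTrain, k = 4, seed = 42L)
    println(s"holdout=${dataTest.size} train=${train.size} validation=${validation.size} folds=${folds.size}")
  }
}
```

With a train/validation split there is a single validation set, whereas with k folds there are k of them, so there is no single "validation dataset" to hand back unless the validation method is fixed first.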
I don't believe we expose those datasets. But you can recreate them by applying the `split` and `validationPrepare` methods on `splitter`/`datacutter` instances with the training datasets. https://github.com/salesforce/TransmogrifAI/blob/master/core/src/main/scala/com/salesforce/op/stages/impl/tuning/Splitter.scala
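The order of operations suggested above (first `split`, then `validationPrepare` on the training part) can be sketched with a toy stand-in. Everything in this block is an assumption made for illustration: `ToySplitter`, `Example`, and the parameters `testFraction` and `maxPerLabel` are hypothetical, work on plain Scala collections, and do not reproduce the signatures in `Splitter.scala`; check the linked source for the real ones.

```scala
import scala.util.Random

object RecreateSketch {
  // Hypothetical row type: a label plus one feature.
  final case class Example(label: Int, feature: Double)

  // Hypothetical stand-in for a splitter; NOT the TransmogrifAI class.
  final class ToySplitter(seed: Long, testFraction: Double, maxPerLabel: Int) {

    // Seeded holdout split: returns (dataTrain, dataTest).
    def split(data: Seq[Example]): (Seq[Example], Seq[Example]) = {
      val shuffled = new Random(seed).shuffle(data)
      val nTest = math.round(data.size * testFraction).toInt
      shuffled.splitAt(data.size - nTest)
    }

    // Toy label-based balancing: keep at most `maxPerLabel` rows per label,
    // chosen by a seeded shuffle so the result is reproducible.
    def validationPrepare(dataTrain: Seq[Example]): Seq[Example] =
      dataTrain.groupBy(_.label).toSeq.sortBy(_._1).flatMap { case (_, rows) =>
        new Random(seed).shuffle(rows).take(maxPerLabel)
      }
  }

  def main(args: Array[String]): Unit = {
    // Imbalanced toy data: one row in ten has label 1.
    val data = (0 until 100).map(i => Example(if (i % 10 == 0) 1 else 0, i.toDouble))
    val splitter = new ToySplitter(seed = 42L, testFraction = 0.2, maxPerLabel = 5)

    val (dataTrain, dataTest) = splitter.split(data)      // step 1: holdout split
    val prepared = splitter.validationPrepare(dataTrain)  // step 2: balance the training part
    println(s"dataTrain=${dataTrain.size} dataTest=${dataTest.size} prepared=${prepared.size}")
  }
}
```

Because both steps are seeded, rerunning them with the same seed yields the same rows, which is what makes recreating the data outside the model selector feasible at all. The final division of the prepared rows into training and validation sets still depends on the validation method in use.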