EasternCaveMan closed this issue 8 months ago
Hey @EasternCaveMan, DataSAIL assigns data points to the splits such that the split sizes stay within the epsilon error margin. That is a hard constraint. Within the solution space that these hard constraints allow, DataSAIL optimizes splits based on the weighting of data points and the similarities between them. Depending on the dataset, the test set may end up containing only data points with one label.
For now, you can use the --runs option to create multiple splits and check for one where samples from both classes are present in the test set. From version 1.0.0 on, you can specify stratification to balance classes in each split. If you install DataSAIL from source (branch dev_1.0), you can already use a beta version of it (it is not yet fully tested or documented).
Best, Roman
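The check suggested above (find a run whose test set contains both classes) can be sketched in plain Python. This is only an illustration: it assumes each run has already been loaded into a dict mapping sample name to split name, and that labels live in a second dict. The helper names `label_counts_in_split` and `runs_with_all_classes`, the split name "test", and the toy data are hypothetical and not part of DataSAIL.

```python
from collections import Counter

def label_counts_in_split(assignment, labels, split_name="test"):
    """Count the labels of all samples assigned to `split_name`."""
    return Counter(
        labels[sample]
        for sample, split in assignment.items()
        if split == split_name
    )

def runs_with_all_classes(runs, labels, split_name="test"):
    """Return the indices of runs whose `split_name` split contains every class."""
    all_classes = set(labels.values())
    return [
        i
        for i, assignment in enumerate(runs)
        if set(label_counts_in_split(assignment, labels, split_name)) == all_classes
    ]

# Toy data (hypothetical): four samples, two classes, two runs.
labels = {"a": 0, "b": 1, "c": 0, "d": 1}
runs = [
    {"a": "train", "b": "train", "c": "test", "d": "train"},  # test set: class 0 only
    {"a": "train", "b": "test", "c": "test", "d": "train"},   # test set: both classes
]

print(runs_with_all_classes(runs, labels))  # [1]
```

Any run index returned here has a test set on which per-class metrics are defined; if the list is empty, increasing the number of runs is one option.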
Dear Roman, I split my data with the I1f method, which gave a test set size of 0.1, and I got this error while plotting the ROC AUC curve.
When I set --epsilon 0.0, the test set size was 0.2 and I did not get this error while plotting the ROC AUC curve.
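If the error came from a test set with only one class, as the reply suggests may happen, the reason is that ROC AUC compares the scores of positive and negative samples and is undefined when one group is empty. A minimal pure-Python sketch of that computation follows; the function `roc_auc` is written here for illustration only and is not the plotting code from this issue.

```python
def roc_auc(y_true, scores):
    """Pairwise ROC AUC: the fraction of (positive, negative) pairs in which
    the positive sample scores higher; ties count as half.
    Returns None when one class is missing, since no pairs exist."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined: the test set contains a single class
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc([1, 1, 1], [0.2, 0.6, 0.9]))           # None
```

Checking for a `None` result (or counting the distinct labels in the test set) before plotting makes the single-class case explicit.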