Closed. Jhixx24 closed this issue 3 years ago.
Hi @Jhixx24! I know Numerai data quite well.
This type of cross-validation can be added, but it is hard to say when. Would you like to follow an example from Numerai's forum, or do you have a unique feature-engineering idea that depends on this type of validation?
Going back to the data itself: please run it with 10-fold CV and it will work pretty well (better than the example model provided by Numerai, as far as I remember). You will need to run it on a decent machine for at least 12 hours. Below is example code showing how to use MLJAR AutoML:
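import pandas as pd
from supervised.automl import AutoML

# Load the Numerai training data and pick the feature and target columns
train = pd.read_csv("numerai_training_data.csv")
x_cols = [f for f in train.columns if "feature" in f]
y_col = "target"

# "Compete" mode with a 12-hour time budget (value is in seconds)
automl = AutoML(
    ml_task="regression",
    mode="Compete",
    total_time_limit=12 * 60 * 60,
)
automl.fit(train[x_cols], train[y_col])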
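The snippet above leaves the validation scheme to the chosen mode. If you want to request shuffled 10-fold CV explicitly, a minimal sketch of the settings could look like the following. The keys reflect my understanding of MLJAR's `validation_strategy` option, so please check them against the documentation for your installed version; the `stratify` value is an assumption for a regression target.

```python
# Shuffled k-fold settings (keys as I understand MLJAR's validation_strategy option).
validation_strategy = {
    "validation_type": "kfold",
    "k_folds": 10,       # the thread suggests 5 or 10 folds
    "shuffle": True,     # rows from different eras get mixed
    "stratify": False,   # assumption: no stratification for a regression target
}

# Assumed usage (not executed here, requires mljar-supervised):
# from supervised.automl import AutoML
# automl = AutoML(
#     ml_task="regression",
#     mode="Compete",
#     total_time_limit=12 * 60 * 60,
#     validation_strategy=validation_strategy,
# )
```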
Hi. I am trying to run AutoML, but I cannot get it to split the data according to the different "eras" (periods) in the dataset. I tried 5-fold CV, but it shuffles and splits the rows without respecting the eras. I have a TimeSeriesSplitGroups class from a different script that selects the appropriate era data, but I do not know how to plug it into your code. Any assistance would be appreciated.
So will the current CV implementation work? I suspect it will not cut the data at the right places.
OK, got it. But my point is that you don't need to respect "eras" when doing validation. You can simply shuffle samples from different eras and train AutoML with 5-fold or 10-fold CV. I use this approach myself: MLJAR AutoML is part of my ensemble. My performance metric is below:
I'm not using the validation data for training. One last tip from me: I'm using feature neutralization.
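For readers who have not met feature neutralization, here is a minimal sketch of the formulation commonly shared in the Numerai community: subtract from the predictions their least-squares projection onto the features, then rescale. This is not necessarily what is used in the ensemble mentioned above; the function name `neutralize` and the `proportion` parameter are illustrative, and in practice the step is usually applied era by era.

```python
import numpy as np


def neutralize(predictions, features, proportion=1.0):
    """Remove a share of the linear feature exposure from the predictions."""
    preds = np.asarray(predictions, dtype=float).reshape(-1, 1)
    exposures = np.asarray(features, dtype=float)
    # Least-squares projection of the predictions onto the feature columns.
    projection = exposures @ (np.linalg.pinv(exposures) @ preds)
    neutral = preds - proportion * projection
    # Rescale to unit standard deviation (assumes the residual is not constant).
    return (neutral / neutral.std()).ravel()
```

With `proportion=1.0` the result is orthogonal to every feature column; smaller values remove only part of the exposure.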
Ah, OK, thanks. One last question: do you have any idea how to get rid of this error while installing AutoML? RuntimeError: Building llvmlite requires LLVM 10.0.x or 9.0.x, got '11.0.1'. Be sure to set LLVM_CONFIG to the right executable path.
I need more information:
Python 3.9 is not yet supported. Please try with Python 3.7.
OK, I will try it. Thank you.
It worked! 👍
@Jhixx24 Grouped Time Series validation can be applied with custom validation. Closing the issue. Fixed in #380.
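As an illustration of what era-aware splits for custom validation might look like, here is a small pure-Python sketch that builds expanding-window folds without ever cutting through an era. The helper name `era_time_series_splits` is hypothetical, and it assumes the rows are already in chronological order so that eras appear in time order.

```python
from typing import Hashable, List, Sequence, Tuple


def era_time_series_splits(
    eras: Sequence[Hashable], n_splits: int = 3
) -> List[Tuple[List[int], List[int]]]:
    """Expanding-window (train_idx, val_idx) pairs that keep each era whole."""
    # Unique eras in order of first appearance (assumes chronological rows).
    ordered = list(dict.fromkeys(eras))
    n_blocks = n_splits + 1
    if n_splits < 1 or len(ordered) < n_blocks:
        raise ValueError("need at least n_splits + 1 distinct eras")

    # Cut the ordered eras into n_blocks contiguous, non-empty blocks.
    bounds = [i * len(ordered) // n_blocks for i in range(n_blocks + 1)]
    block_of = {}
    for b in range(n_blocks):
        for era in ordered[bounds[b]:bounds[b + 1]]:
            block_of[era] = b

    # Fold k trains on all blocks before k and validates on block k.
    splits = []
    for k in range(1, n_blocks):
        train_idx = [i for i, e in enumerate(eras) if block_of[e] < k]
        val_idx = [i for i, e in enumerate(eras) if block_of[e] == k]
        splits.append((train_idx, val_idx))
    return splits


eras = ["era1", "era1", "era2", "era2", "era3", "era3", "era4", "era4"]
cv = era_time_series_splits(eras, n_splits=3)
```

As far as I understand the custom validation feature, the folds would then be passed along the lines of `AutoML(validation_strategy={"validation_type": "custom"})` followed by `automl.fit(X, y, cv=cv)`; treat that call shape as an assumption and confirm it in the MLJAR documentation. Converting the index lists to NumPy arrays first may be needed.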