Status: Open. Opened by Kastanek 1 year ago.
We have trained XGBoost models in Tribuo on hundreds of thousands of records, though we used a fairly large machine to do so. Batch loading from the SQL database does not help here, because Tribuo requires all of the data to be in memory before it can train a model.
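Since the whole dataset has to sit in memory, a back-of-the-envelope estimate can help decide whether a given machine is large enough. The sketch below is plain Java and does not call any Tribuo API. The per-feature cost of 32 bytes and the 100 features per row are hypothetical placeholders, not figures measured from Tribuo, so substitute numbers from your own data. The trainer may also need memory beyond the dataset itself, which is why the check leaves headroom.

```java
public final class DatasetMemoryEstimate {

    // Hypothetical per-feature cost in bytes (feature name reference, double value,
    // object overhead). An illustrative guess, not a number taken from Tribuo.
    public static final long ASSUMED_BYTES_PER_FEATURE = 32L;

    /** Rough heap estimate: rows * features per row * bytes per feature. */
    public static long estimateBytes(long rows, long featuresPerRow, long bytesPerFeature) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(rows, featuresPerRow), bytesPerFeature);
    }

    /** True if the estimate fits within the given fraction of the maximum heap. */
    public static boolean fitsInHeap(long estimatedBytes, long maxHeapBytes, double usableFraction) {
        return estimatedBytes <= (long) (maxHeapBytes * usableFraction);
    }

    public static void main(String[] args) {
        long rows = 500_000L;        // "hundreds of thousands of records"
        long featuresPerRow = 100L;  // hypothetical feature count
        long estimate = estimateBytes(rows, featuresPerRow, ASSUMED_BYTES_PER_FEATURE);

        // Maximum heap the JVM will attempt to use (controlled by -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.printf("Estimated dataset size: %d MiB%n", estimate / (1024 * 1024));
        System.out.printf("JVM max heap: %d MiB%n", maxHeap / (1024 * 1024));
        // Reserve half the heap as headroom for the trainer and everything else.
        System.out.println("Fits using half the heap: " + fitsInHeap(estimate, maxHeap, 0.5));
    }
}
```

If the estimate does not fit, the options are a larger heap (`-Xmx`), a larger machine, or fewer rows or features, because batching on the database side does not reduce what has to be held in memory at training time.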
Ask the question

Is training a model using SQLDataSource suitable for large datasets that do not fit in RAM? I expect my dataset to grow to hundreds of thousands of records. I see that batching is performed, but I'm not sure whether a model can be trained this way. I'm particularly interested in training with XGBoostRegressionTrainer.

Is your question about a specific Tribuo class?

SQLDataSource