bigmisspanda opened this issue 2 days ago
Hi @bigmisspanda – thank you for your question!
You are right: all of the preprocessing primitives require the data to fit in memory.
One workaround is to replace these primitives with your own scalable functions and then start the Orion pipeline directly from the modeling primitive. Another is to split your training data into chunks and train the pipeline on each chunk.
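The chunked approach above can be sketched for the scaling step. This is a minimal illustration, not Orion's own implementation: it assumes the data arrives as an iterable of pandas DataFrames (e.g. from `pd.read_csv(..., chunksize=...)`), accumulates global min/max statistics in a first pass, and scales each chunk in a second pass so no chunk ever needs the full dataset in memory.

```python
import numpy as np
import pandas as pd

def fit_minmax_in_chunks(chunks):
    """First pass: accumulate global per-column min/max over an
    iterable of DataFrames without holding them all in memory."""
    lo, hi = None, None
    for chunk in chunks:
        values = chunk.to_numpy(dtype=float)
        cmin, cmax = values.min(axis=0), values.max(axis=0)
        lo = cmin if lo is None else np.minimum(lo, cmin)
        hi = cmax if hi is None else np.maximum(hi, cmax)
    return lo, hi

def scale_chunk(chunk, lo, hi):
    """Second pass: min-max scale one chunk using the precomputed
    statistics; constant columns are left at zero instead of NaN."""
    span = np.where(hi - lo == 0, 1.0, hi - lo)
    return (chunk - lo) / span
```

In a full pipeline you would do the same two-pass trick for each stateful preprocessing step, then feed the scaled chunks (or windows built from them) into the modeling primitive.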
Description
In my case, the training data is very large and cannot be loaded into memory all at once. It seems that `time_segments_aggregate`, `SimpleImputer`, `MinMaxScaler`, and `rolling_window_sequences` in the pipeline all require the data to be stored in memory. Can Orion handle training on a 2-10 TB dataset?