Why

Multiprocessing was used to transform every timeseries dataframe, including those consisting of a single row. This added slight overhead in cases like single-row or small-batch predictions, but that cost was negligible.

However, when the mindsdb_native predict/learn methods are called inside certain types of python processes, such as spawned processes (like those used by mindsdb to run its APIs, due to cross-platform compatibility requirements), the multiprocessing module has a quirk in its behavior: it re-imports the whole of mindsdb_native in each worker process. This caused issues both in terms of memory (each process would occupy 2.5GB+ by virtue of re-importing every single mindsdb_native dependency, rather than just those required to run its respective function) and in terms of time (importing mindsdb_native takes ~3 seconds).
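To illustrate the quirk in general terms (this is standard Python `multiprocessing` behavior, not mindsdb_native code): under the "spawn" start method, each worker is a fresh interpreter that re-imports the parent module, so all module-level imports run again in every child.

```python
import multiprocessing as mp
import os

# Under the "spawn" start method (the default on Windows and macOS, and the
# cross-platform-safe choice), each child process starts a fresh interpreter
# and re-imports this module, re-running any module-level imports.
# Under "fork" (Unix only), the child inherits the parent's memory instead.

def child_pid(_):
    # Runs inside a worker; returns that worker's process id.
    return os.getpid()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        worker_pids = pool.map(child_pid, [0])
    # The worker's pid differs from the parent's, confirming a separate
    # interpreter (which had to re-import this module) handled the call.
    print(worker_pids[0] != os.getpid())
```

This is why a library whose import pulls in several seconds' worth of dependencies pays that cost once per spawned worker.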
How
Timeseries transformations for dataframes with < 500 rows will now be done in the main python process, without the use of multiprocessing.
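A minimal sketch of that routing logic (the function and threshold names here are illustrative, not the actual mindsdb_native implementation): inputs below the 500-row threshold are transformed inline, and only larger inputs pay the multiprocessing startup cost.

```python
import multiprocessing as mp

# Row-count threshold below which process startup (and, under "spawn",
# a full library re-import per worker) outweighs any parallelism gain.
MP_ROW_THRESHOLD = 500

def _transform_row(row):
    # Stand-in for the real per-row timeseries transformation.
    return {**row, "transformed": True}

def transform_timeseries(rows):
    if len(rows) < MP_ROW_THRESHOLD:
        # Small input (e.g. a single-row prediction): run in the main
        # python process, skipping multiprocessing entirely.
        return [_transform_row(r) for r in rows]
    # Large input: fan the work out across worker processes.
    with mp.Pool() as pool:
        return pool.map(_transform_row, rows)
```

The design trade-off is that small inputs lose nothing (they were too small to parallelize profitably anyway), while large dataframes keep the original multiprocessing speedup.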