mldbai / mldb

MLDB is the Machine Learning Database
http://mldb.ai
Apache License 2.0

Parallelize and speed up Tabular dataset loading & feature set construction (tabular) #950

Closed: jeremybarnes closed this pull request 2 years ago

jeremybarnes commented 2 years ago

This PR lays the groundwork for faster dataset loading and ML setup operations in MLDB.

It allows the airlines CSV dataset to be loaded at around 2.3 million rows per second (230 MB/s) on an M1 Mac Mini, and at over 10 million rows per second on a server-class machine. In particular, multicore scaling is significantly better than before.
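The PR text does not say how the loader is parallelized. One common way to parallelize CSV loading is to cut the file into byte ranges that end on newlines and parse each range on its own worker. The minimal Python sketch below shows that approach. It is not MLDB's C++ implementation, and `split_at_newlines`, `parse_chunk`, and `parallel_parse` are hypothetical names. The sketch assumes that no quoted field contains an embedded newline, so every newline marks a row boundary.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def split_at_newlines(data: bytes, num_chunks: int) -> list[tuple[int, int]]:
    """Split data into roughly equal [start, end) byte ranges that end on a newline.

    Assumption: no quoted field contains an embedded newline, so every
    newline byte is a row boundary.
    """
    size = len(data)
    target = max(1, size // max(1, num_chunks))
    ranges = []
    start = 0
    while start < size:
        end = min(start + target, size)
        if end < size:
            # Move the cut forward to just past the next newline so that
            # no row straddles two chunks.
            nl = data.find(b"\n", end - 1)
            end = size if nl == -1 else nl + 1
        ranges.append((start, end))
        start = end
    return ranges


def parse_chunk(data: bytes, start: int, end: int) -> list[list[str]]:
    """Parse one newline-aligned chunk independently of the others.

    Decoding each chunk separately is safe for UTF-8 because the newline
    byte (0x0A) never occurs inside a multi-byte character.
    """
    text = data[start:end].decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


def parallel_parse(data: bytes, workers: int = 4) -> list[list[str]]:
    """Parse all chunks concurrently and concatenate the rows in file order."""
    ranges = split_at_newlines(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so row order is preserved.
        parts = pool.map(lambda r: parse_chunk(data, r[0], r[1]), ranges)
        return [row for part in parts for row in part]
```

For example, `parallel_parse(b"a,b\n1,2\n3,4\n", workers=2)` returns `[["a", "b"], ["1", "2"], ["3", "4"]]`. CPython's GIL prevents the threads here from parsing in parallel, so the sketch shows only the structure of the approach. Real multicore scaling of the kind described above requires a parser that runs outside the GIL, for example native code or separate processes.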