This puts in place the basics to improve MLDB's dataset loading and ML setup
operations:
- ContentDescriptors, so that we can refer to a dataset in a way that allows us to cache and share intermediate results
- Block-aware compression (currently for lz4 only), which allows compressed datasets to be chunked across multiple threads (zstd should be possible too but is not implemented yet)
- The use of large, contiguous, file-backable memory blocks behind the temporary datasets created, allowing larger-than-core operation when backed by a suitable SSD or other secondary storage
- Parallelized feature analysis, bucketing, and packing into optimized data structures, reducing the memory usage and memory bandwidth requirements of the setup phases for classic ML algorithms
- Implementation of better column analysis on Tabular dataset loading, so that less work needs to be done in the setup phase
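The block-aware compression point can be illustrated with a minimal sketch: the input is split into independently compressed blocks, so a reader can hand each block to a different thread. This is not MLDB's implementation; zlib stands in for lz4 here purely to keep the example dependency-free, and all names are hypothetical:

```python
# Illustrative sketch only (not MLDB code). Block-aware compression splits the
# input into independently compressed blocks, so decompression can be spread
# across threads. zlib stands in for lz4 to avoid third-party dependencies.
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical block size

def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Compress each block independently so each can be decoded on its own."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def decompress_parallel(blocks: list[bytes], workers: int = 4) -> bytes:
    """Decompress blocks concurrently; each block is self-contained."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))

if __name__ == "__main__":
    data = b"row,col,value\n" * 200_000
    blocks = compress_blocks(data)
    assert decompress_parallel(blocks) == data
```

A whole-stream codec would force single-threaded decoding; keeping per-block independence (at a small ratio cost) is what lets a CSV loader feed every core.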
It allows the airlines CSV dataset to be loaded at around 2.3 million rows
per second (230MB/second) on an M1 Mac Mini, and at over 10 million rows
per second on a server-class machine (in particular, multicore scaling is
significantly better than before).