We need to drastically change how datasets are projected. The main issue is loading the entire dataframe into memory. How do we get around that?
Enable chunked processing of columns to determine normalization parameters (e.g. min/max, categories, ...). These are captured as closure variables in the featurizers (see the sketch below).
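A minimal sketch of how such a chunked pass could look. The helper name `make_numeric_featurizer`, the CSV-based chunking, and the chunk size are illustrative assumptions, not the actual implementation:

```python
import numpy as np
import pandas as pd

def make_numeric_featurizer(path: str, column: str, chunksize: int = 50_000):
    """Scan the file in chunks to find min/max, then return a normalizer
    that closes over those values (hypothetical helper)."""
    col_min, col_max = np.inf, -np.inf
    for chunk in pd.read_csv(path, usecols=[column], chunksize=chunksize):
        col_min = min(col_min, chunk[column].min())
        col_max = max(col_max, chunk[column].max())

    span = (col_max - col_min) or 1.0  # guard against constant columns

    def featurize(values: pd.Series) -> np.ndarray:
        # col_min/span are closure variables; no chunk stays alive here
        return ((values - col_min) / span).to_numpy()

    return featurize
```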
If we determine that the dataset is too large, we precompute an incremental PCA over chunked parts of the data.
If that already yields only 2 components, we return them directly as our coordinates.
Otherwise, we compute the actual projection on the PCA components/features (see the sketch below).
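A sketch of the two-pass incremental PCA, assuming a purely numeric file; sklearn's `IncrementalPCA` is a real API, while the paths and chunk size are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import IncrementalPCA

def incremental_pca_coordinates(path: str, n_components: int = 2,
                                chunksize: int = 50_000) -> np.ndarray:
    """Two-pass incremental PCA: fit on chunks, then transform chunks.
    Only one chunk is ever resident in memory."""
    ipca = IncrementalPCA(n_components=n_components)
    # Pass 1: fit the PCA incrementally (each chunk must contain
    # at least n_components rows).
    for chunk in pd.read_csv(path, chunksize=chunksize):
        ipca.partial_fit(chunk.to_numpy())
    # Pass 2: project each chunk onto the fitted components.
    parts = [ipca.transform(chunk.to_numpy())
             for chunk in pd.read_csv(path, chunksize=chunksize)]
    return np.vstack(parts)
```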
If Gower's distance is used, we only support datasets with at most 10k rows, as the distance matrix grows quadratically.
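The quadratic blow-up is concrete: a full pairwise matrix for n rows holds n² float64 entries, so the 10k cap already means 10,000² × 8 bytes ≈ 800 MB. A hypothetical guard:

```python
MAX_GOWER_ROWS = 10_000  # n^2 float64 matrix: 10k rows -> ~800 MB

def check_gower_supported(n_rows: int) -> None:
    """Reject datasets whose Gower distance matrix would not fit in memory."""
    if n_rows > MAX_GOWER_ROWS:
        raise ValueError(
            f"Gower's distance needs an {n_rows}x{n_rows} matrix "
            f"(~{n_rows**2 * 8 / 1e9:.1f} GB); "
            f"max supported is {MAX_GOWER_ROWS} rows."
        )
```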
If the dataset fits in memory, we run the actual projection on the full data (see the dispatch sketch below).
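A hypothetical dispatch tying the two paths together, building on `incremental_pca_coordinates` from the sketch above; umap-learn is assumed as the projection backend:

```python
import numpy as np
import pandas as pd
import umap  # umap-learn; assumed projection backend

def compute_coordinates(path: str, n_features: int,
                        fits_in_memory: bool) -> np.ndarray:
    """Dispatch between the in-memory and chunked paths (hypothetical)."""
    if fits_in_memory:
        # Small dataset: load everything and project directly.
        features = pd.read_csv(path).to_numpy()
    else:
        # Large dataset: reduce with incremental PCA first (sketch above).
        n_components = min(10, n_features)
        features = incremental_pca_coordinates(path, n_components)
        if n_components == 2:
            # PCA already yielded 2 components: these are the coordinates.
            return features
    return umap.UMAP(n_components=2).fit_transform(features)
```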
This reduces memory usage by orders of magnitude for large datasets (e.g. from 10GB+ down to 1-2GB for 300k rows).
Screenshots
Result of UMAP on a dataset with 315k rows for the sulfonyl feature (157 features). Took around 15 min.