zestai / zrp

Zest Race Predictor
Apache License 2.0

Prepare Geo Coded DataFrame memory reduction #36

Closed gmwzest closed 6 days ago

gmwzest commented 3 weeks ago

This modifies the outputs of the Geo Coding prepare step to reduce their memory footprint. Columns were being stored as string objects, which take far more memory than the information they actually carry. Converting those columns to floating point and category types reduces the memory footprint considerably. This also adds early stopping and custom modeling parameters for the XGBoost model builds.
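The dtype conversion described above can be illustrated with a minimal pandas sketch. It is not the code from this PR: the helper `reduce_memory` and the column names `latitude` and `state` are hypothetical, chosen only to show how string-typed columns compare with float and category columns in memory.

```python
import pandas as pd


def reduce_memory(df, numeric_cols, category_cols):
    """Return a copy of df with string columns converted to compact dtypes.

    Hypothetical helper for illustration; not part of the zrp package.
    """
    out = df.copy()
    for col in numeric_cols:
        # Parse numeric strings into float64; unparseable values become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    for col in category_cols:
        # Store each distinct label once and keep small integer codes per row.
        out[col] = out[col].astype("category")
    return out


# Synthetic data standing in for a geocoded frame (assumed column names).
n = 10_000
raw = pd.DataFrame({
    "latitude": [str(40.0 + i / n) for i in range(n)],
    "state": ["CA", "NY", "TX", "FL"] * (n // 4),
})

slim = reduce_memory(raw, numeric_cols=["latitude"], category_cols=["state"])

# deep=True counts the bytes held by the string values themselves.
before = raw.memory_usage(deep=True).sum()
after = slim.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

The category conversion pays off most when a column has few distinct values relative to its length; a column of mostly unique strings gains little from it.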