salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Compact and compressed JSON serialization for models #374

Closed. gerashegalov closed this issue 5 years ago

gerashegalov commented 5 years ago

Problem: OpWorkflowModelWriter writes large, pretty-printed JSON files. Pretty-printing on disk is unnecessary because plenty of tools can pretty-print on demand when a human inspects the file.

Illustration:

| Format | Size (bytes) |
| --- | ---: |
| pretty (original) | 1698606 |
| compact json | 963400 |
| gzip pretty | 203380 |
| gzip compact | 185558 |
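To put the byte counts in proportion, here is a small sketch that expresses each format as a fraction of the pretty-printed original. The object and method names (`SizeRatios`, `relativeToOriginal`) are made up for illustration; only the four byte counts come from the table above.

```scala
object SizeRatios {
  // Byte counts copied from the table in this issue.
  val sizes: Seq[(String, Long)] = Seq(
    "pretty (original)" -> 1698606L,
    "compact json"      -> 963400L,
    "gzip pretty"       -> 203380L,
    "gzip compact"      -> 185558L
  )

  // Size of each format as a fraction of the pretty-printed original.
  def relativeToOriginal: Seq[(String, Double)] = {
    val baseline = sizes.head._2.toDouble
    sizes.map { case (name, bytes) => name -> bytes / baseline }
  }

  def main(args: Array[String]): Unit =
    relativeToOriginal.foreach { case (name, ratio) =>
      println(f"$name%-18s ${ratio * 100}%.1f%% of original")
    }
}
```

By these numbers, compaction alone leaves roughly 57% of the original size, while gzip accounts for most of the saving; compacting before gzip trims the compressed file by a further 9% or so.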

Solution: Output compact JSON and gzip-compress it. Longer term, allow for efficient binary formats.
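A minimal sketch of the gzip step, assuming the model has already been rendered to a compact JSON string by whatever JSON library the writer uses. This is not the actual OpWorkflowModelWriter code: `CompactGzipSketch`, `gzip`, `gunzip`, and the sample JSON are hypothetical, and only `java.util.zip` and the Scala standard library are used.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object CompactGzipSketch {
  // Gzip-compress a UTF-8 string into a byte array.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    // Closing the stream flushes the gzip trailer into the buffer.
    try out.write(text.getBytes(UTF_8)) finally out.close()
    buffer.toByteArray
  }

  // Decompress gzip bytes back into the original UTF-8 string.
  def gunzip(bytes: Array[Byte]): String = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try scala.io.Source.fromInputStream(in, "UTF-8").mkString finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // Made-up stand-in for a compact (no whitespace) model JSON.
    val compactJson = """{"uid":"model_1","stages":[{"class":"A"},{"class":"B"}]}"""
    val compressed = gzip(compactJson)
    // On inputs this small, gzip overhead can exceed the savings;
    // the benefit shows up on large files like the one measured above.
    println(s"compact: ${compactJson.getBytes(UTF_8).length} bytes, gzipped: ${compressed.length} bytes")
    assert(gunzip(compressed) == compactJson)
  }
}
```

Readers would need the matching decompression step, so a real change would likely have to keep loading older, uncompressed model files working as well.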

Alternatives: Make the output format configurable.