TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Problem
OpWorkflowModelWriter writes large, pretty-printed JSON files. Pretty-printing on disk is unnecessary, because many tools can pretty-print a file on demand when a person inspects it.
Illustration

| format            | size in bytes |
|-------------------|---------------|
| pretty (original) | 1698606       |
| compact json      | 963400        |
| gzip pretty       | 203380        |
| gzip compact      | 185558        |
Solution
Write compact JSON and gzip-compress the result. Longer term, allow for efficient binary formats.
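The effect of the two steps can be reproduced with a small, language-neutral sketch. It uses only the Python standard library and is not the OpWorkflowModelWriter implementation; the `model` dictionary is a hypothetical stand-in for a serialized workflow, not the real schema, so the byte counts it prints will differ from the figures reported in this description.

```python
import gzip
import json

# Hypothetical stand-in for a serialized workflow model (not the real schema).
model = {
    "uid": "OpWorkflow_000000000001",
    "stages": [
        {"name": f"stage_{i}", "params": {"inputCol": f"f{i}", "threshold": 0.5}}
        for i in range(200)
    ],
}

# Pretty output: indentation and newlines are part of the bytes written.
pretty = json.dumps(model, indent=2).encode("utf-8")

# Compact output: no indentation, no spaces after separators.
compact = json.dumps(model, separators=(",", ":")).encode("utf-8")

# Gzip on top of either representation.
gz_pretty = gzip.compress(pretty)
gz_compact = gzip.compress(compact)

for label, payload in [
    ("pretty", pretty),
    ("compact json", compact),
    ("gzip pretty", gz_pretty),
    ("gzip compact", gz_compact),
]:
    print(f"{label:<14}{len(payload):>10}")

# A reader can still get a pretty view on demand from the compact, gzipped bytes.
restored = json.loads(gzip.decompress(gz_compact))
preview = json.dumps(restored, indent=2)
```

The last two lines show why nothing is lost for human readers: the compressed, compact bytes decode to the same structure, which can be re-indented at inspection time.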
Alternatives
Make the output format configurable.