techascent / tech.ml.dataset

A Clojure high performance data processing system
Eclipse Public License 1.0
680 stars 35 forks source link

Default to eliding missing values in JSON representation #363

Closed cnuernber closed 1 year ago

cnuernber commented 1 year ago

Currently they can end up as null or NAN or a few different things so it would be ideal to just elide the key. I currently believe that this behavior should be optional but defaulted on.

Concurrently ds/rows should have an option to elide missing keys - this should not be the default as there is a per-record perf cost inherent in this pathway but the option should be there and be clearly documented.

genmeblog commented 1 year ago

I want to share, that eliding missing values in mapseq-reader by default was really breaking change. It wasn't visible at the beginning. Since then several users of tablecloth report strange bugs which are caused only by this change in different places like pivoting, aggregation, grouping etc...

It's just a lesson that such kind of minor update can cause major breakage.

cnuernber commented 1 year ago

Thanks for the clear communication - I definitely did not intend such breakage. Glad things are now resolved.

genmeblog commented 1 year ago

I definitely should work more on tests to find such things earlier. Anyway, it looks like the transition to TMD7 was quite smooth which was really great result after such big refactoring.