shuttle-hq / synth

The Declarative Data Generator
https://www.getsynth.com/
Apache License 2.0

Lossless Sampling #16

Open christos-h opened 3 years ago

christos-h commented 3 years ago

Required Functionality

Currently the XExportStrategy and Sampler::sample functions work with vectors of JSON values.

This is handy, but it loses information.

In fact, for any data sink whose type system is a superset of the JSON data model (both Postgres and Mongo), you lose information, because most types get serialized to a string (timestamps, for example).

This can be a problem: at insertion time you no longer know which type to pass to the client library.
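
To illustrate the loss, here is a minimal sketch. The `Json` and `Typed` enums are hypothetical stand-ins (not synth's actual types), and the timestamp is simplified to seconds since the Unix epoch. It shows two distinct typed values collapsing into the same JSON string, so the sink can no longer tell them apart:

```rust
// Hypothetical stand-in for a JSON value; not synth's or serde_json's type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Number(f64),
    String(String),
}

// Hypothetical typed value, richer than JSON.
#[derive(Debug, Clone, PartialEq)]
enum Typed {
    Text(String),
    // Assumption: seconds since the Unix epoch, to keep the sketch std-only.
    Timestamp(i64),
}

// Lossy conversion: both variants end up as a JSON string.
fn to_json(value: &Typed) -> Json {
    match value {
        Typed::Text(s) => Json::String(s.clone()),
        Typed::Timestamp(secs) => Json::String(secs.to_string()),
    }
}

fn main() {
    let text = Typed::Text("1600000000".to_string());
    let ts = Typed::Timestamp(1_600_000_000);

    // The typed values differ...
    assert_ne!(text, ts);
    // ...but after serialization they are indistinguishable.
    assert_eq!(to_json(&text), to_json(&ts));
    println!("{:?}", to_json(&ts));
}
```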

Proposed Solution

  1. It feels like the export strategy is doing too much and is too tightly coupled to the sampler.
  2. The sampler returns JSON values, which is not ideal. We want the sampler to return a vector of Value types in core::graph.
  3. The type mapping has to be redone. We currently have JSON -> PG types and JSON -> Mongo types. This needs to be re-implemented for core::graph::Value -> X.
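
A rough sketch of how points 2 and 3 could fit together. Everything here is an assumption for illustration: the `Value` variants, the `sample` function and `pg_type` are hypothetical and do not reflect the real core::graph::Value or any existing synth API. The idea is that the sampler hands back typed values and each sink owns its own mapping from them:

```rust
// Hypothetical typed value; the real core::graph::Value may look different.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Text(String),
    // Assumption: seconds since the Unix epoch.
    Timestamp(i64),
}

// Hypothetical sampler output: typed values instead of JSON.
fn sample() -> Vec<Value> {
    vec![
        Value::Int(42),
        Value::Text("alice".to_string()),
        Value::Timestamp(1_600_000_000),
        Value::Null,
    ]
}

// One possible Value -> Postgres type-name mapping, owned by the Postgres sink.
// Null carries no type of its own, so it maps to None.
fn pg_type(value: &Value) -> Option<&'static str> {
    match value {
        Value::Null => None,
        Value::Bool(_) => Some("bool"),
        Value::Int(_) => Some("int8"),
        Value::Float(_) => Some("float8"),
        Value::Text(_) => Some("text"),
        Value::Timestamp(_) => Some("timestamptz"),
    }
}

fn main() {
    for value in sample() {
        println!("{:?} -> {:?}", value, pg_type(&value));
    }
}
```

A Mongo sink would then provide its own mapping from the same `Value` type, without the sampler needing to know about either sink.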

llogiq commented 3 years ago

While we're at it, we should probably look into #28 regarding escaping (at least for Postgres).