stripe-archive / mosql

MongoDB → PostgreSQL streaming replication
MIT License

How to deal with embedded documents? #4

gferon closed this issue 11 years ago

gferon commented 11 years ago

I tried using "dot notation" in the YAML definition file, and it doesn't seem to work. Is it possible right now?

nelhage commented 11 years ago

Unfortunately, we don't yet support pulling individual fields out of embedded documents, although it is on the TODO list.

andrewjshults commented 11 years ago

Based on some offline discussions with @nelhage, I'm working on dot syntax support over here: https://github.com/andrewjshults/mosql/tree/dot-syntax

Note that I did change the collections YAML format to allow for a cleaner mapping once the dot syntax component is in place (e.g., name columns baz instead of foo.baz), so your current collections.yml isn't going to be compatible. The changes are fairly straightforward to make though (the README has an updated example).

nelhage commented 11 years ago

@andrewjshults My preference would be to support the existing format, and just upgrade it internally on load -- I think it should be unambiguous which format we're looking at. Happy to review further once you have something a little further along.

andrewjshults commented 11 years ago

@nelhage makes sense - I was actually thinking last night that the two formats are sufficiently distinct that it would be nice to support the existing format as a compressed version of the syntax. I think I'm going to switch around the source/destination in the YAML format of the new format a little bit so that it's more in line with the current format.

Existing

- (source & destination): (type)

Current "new"

- (destination):
  :source: (source)
  :type: (type)

New new

- (source):
  :destination: (destination)
  :type: (type)

To me, this seems like it'd be a bit more unified (the new format is basically just adding additional parameters), but let me know your thoughts. I got a really basic version working last night, but it definitely needs some additional handling support for the different mongo types + unit tests.
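As a rough illustration of the "upgrade it internally on load" idea, here is a minimal Python sketch, not MoSQL's actual code (MoSQL itself is written in Ruby). It assumes each column entry has already been parsed into a single-key dict, that the compact form has a string value, and that the expanded form is the destination-keyed one with `:source` and `:type` keys. The function name `normalize_column` and the output shape are hypothetical.

```python
def normalize_column(entry):
    """Turn one single-key column entry into a uniform dict.

    Hypothetical helper: the input shapes mirror the YAML formats above,
    represented here as plain Python dicts.
    """
    if len(entry) != 1:
        raise ValueError("expected exactly one key per column entry")
    key, value = next(iter(entry.items()))

    if isinstance(value, str):
        # Compact (existing) format: the key is both source and destination.
        return {"source": key, "destination": key, "type": value}

    if isinstance(value, dict):
        # Expanded, destination-keyed format; fall back to the key itself
        # when no explicit source is given (an assumption of this sketch).
        return {
            "source": value.get(":source", key),
            "destination": key,
            "type": value[":type"],
        }

    raise TypeError(f"unrecognized column entry for {key!r}")


compact = normalize_column({"name": "TEXT"})
expanded = normalize_column({"baz": {":source": "foo.baz", ":type": "TEXT"}})
```

Because the compact form always has a scalar value and the expanded form always has a mapping, the two can be told apart without any extra marker. A source-keyed variant would only swap which field the key fills.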

nelhage commented 11 years ago

(sorry for the delay, been busy)

Hm. I think the one labelled as "current new" makes more sense to me. As your example shows, either one can be viewed as an extension of the current format, since the current format overloads the key to mean both source and destination. I think I like using the destination as the key, since it feels like you're more clearly describing how to build up a SQL table from the Mongo collection, rather than vice versa.

It also allows for more extensibility going forward -- you could support mapping the same mongo key to multiple places in SQL, or you could support :source: [some expression syntax], to support doing more-complex transformations at import time. Both feel more natural in the "current new" syntax.
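To make the extensibility point concrete, here is a small Python sketch (an illustration only, not MoSQL code) in which two SQL columns draw on the same Mongo key. The column names `login` and `contact_email`, the helper `build_row`, and the dict representation of the YAML are all assumptions made for the example.

```python
def build_row(document, columns):
    """Build a {sql_column: value} row from a destination-keyed column list."""
    row = {}
    for column in columns:
        for destination, spec in column.items():
            # Each destination names its own source, so sources may repeat.
            row[destination] = document.get(spec[":source"])
    return row


# Two destinations share one source key. A source-keyed mapping would need
# the key "email" twice, which a single mapping cannot hold.
columns = [
    {"login": {":source": "email", ":type": "TEXT"}},
    {"contact_email": {":source": "email", ":type": "TEXT"}},
]

row = build_row({"email": "a@example.com"}, columns)
```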

nelhage commented 11 years ago

Hi,

I just pushed MoSQL 0.2.0, which supports accessing elements inside sub-documents using dot notation. Please give it a try and let me know if it works for you.
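For readers unfamiliar with the idea, dot-notation access can be sketched in a few lines of Python. This is only an illustration of the concept, not how MoSQL 0.2.0 implements it; the helper name `fetch_path` and the choice to return `None` for a missing path are assumptions.

```python
def fetch_path(document, path):
    """Follow a dotted path such as 'foo.baz' through nested dicts."""
    current = document
    for part in path.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


doc = {"_id": 1, "foo": {"baz": "hello", "deep": {"n": 3}}}
value = fetch_path(doc, "foo.baz")
nested = fetch_path(doc, "foo.deep.n")
missing = fetch_path(doc, "foo.nope.n")
```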