shyiko / mysql-binlog-connector-java

MySQL Binary Log connector
2.22k stars 814 forks source link

Support `JSON` type added in MySQL 5.7 #115

Closed rhauch closed 8 years ago

rhauch commented 8 years ago

Modify the binlog connector to know about the JSON type in MySQL 5.7. This blog post provides a basic introduction and includes links to the relevant worklog #8132 with more details about the high-level and low-level design.

Essentially, JSON data type is a specialization of the BLOB type, where each JSON value is written in a binary format that can be easily and efficiently read and manipulated by MySQL functions.

The minimum desired set of functionality is to add JSON(245) to the ColumnType enumeration, and then to enhance the AbstractRowsEventDataDeserializer class to be aware of this new type and to call a new deserializeJson method that does exactly the same thing as a the existing deserializeBlob method. This means that clients can get the binary form of JSON values, but have to decode the binary representation themselves.

Alternatively, the deserializeJson could be written to decode the binary representation and return an in-memory object representation of the JSON values. However, the immediate followup question is: what form this in-memory object representation should take? Some possibilities:

  1. Return a string representation of the JSON value, but this could be pretty inefficient if the client wanted a different in-memory representation (e.g., a Jackson JsonNode).
  2. Return a JSON object representation, perhaps using the ubiquitous Jackson, although this would add a dependency that might conflict with other dependencies of clients.
  3. Create a utility class for decoding the binary representation into any custom format via a simple builder mechanism, enabling the client to use this when necessary.

One wrinkle is that MySQL allows the JSON values to be JSON objects, JSON arrays, JSON literal values (e.g., null, true, or false), and int16, uint16, int32, uint32, int64, uint64, double, a UTF-8 encoded string, and other custom types (any MySQL data type).

I plan to submit a PR for the minimal support, but I'm interested in thoughts about whether the "enhanced" support can/should be provided by this project.

rhauch commented 8 years ago

Consider a table defined as follows:

CREATE TABLE json_example_table (
  col JSON
);

When the binlog connector is used to read the binlog of this database, an NPE exists at this line since ColumnType.byCode(int) returns null for an input value of 245, which is MySQL's data type constant for the JSON data type.

Also, from what I can tell from MySQL's row_event.h file, neither BLOB nor JSON values are listed as having field metadata associated with the columns in TABLE_MAP_EVENT. Therefore, the TableMapEventDataDeserializer.java class doesn't need to be changed.

shyiko commented 8 years ago

Hi Randall.

I feel like byte[] is the way to go here. Once https://github.com/shyiko/mysql-binlog-connector-java/issues/107 is resolved we'll have your "3rd possibility" achieved for any data type. Choice between JSON deserialization libraries varies greatly on the context (e.g. user might wanna use GSON instead of Jackson if size of the objects is expected to be small (last time I checked GSON was outperforming Jackson on small inputs) or even try something like https://github.com/doubledutch/LazyJSON when only part of the object is needed) and so (at this point) I personally think we should keep this decision out of mysql-binlog-connector-java (in addition to all the cons that you have already mentioned).

an NPE exists at this line since ColumnType.byCode(int)

That's gonna have to be rectified. Thank you pointing it out.

Looking forward to the PR :)

rhauch commented 8 years ago

@shyiko, actually, why can't we return the string representation of the JSON document? Yes, clients would have to parse that string if/when they need another form, but it's far better than expecting them to parse the MySQL-specific internal binary representation.

Then as part of #107 this could be changed/enhanced to allow clients to get the string form or use custom logic as this library parses the binary JSON value form. (Of course, what this API might look like depends completely on the approach taken to resolve #107.)

shyiko commented 8 years ago

@rhauch You're right. I was under the impression that byte[] represented the string but after reading worklog #8132 I see that it's not the case. String it is then.