toddfarmer / arrow-migration

0 stars 1 forks source link

[Java] Implement/test dictionary-encoded subfields #1158

Closed toddfarmer closed 5 years ago

toddfarmer commented 7 years ago

Note: This issue was originally created as ARROW-1175. Please see the migration documentation for further details.

Original Issue Description:

We do not have any tests about types like:

List<Dictionary-encoded String>

cc [~julienledem] [~elahrvivaz]

toddfarmer commented 7 years ago

Note: Comment by Emilio Lahr-Vivaz (elahrvivaz): Note: I don't think it's directly supported by the Writer API, so you have to do it slightly manually. Here is our code for writing dictionary values encoded children in a map vector (FYI there is some non-relevant bits that involve our integration with geotools, and it's in scala):

https://github.com/locationtech/geomesa/blob/master/geomesa-arrow/geomesa-arrow-gt/src/main/scala/org/locationtech/geomesa/arrow/vector/ArrowAttributeWriter.scala#L189-L191 https://github.com/locationtech/geomesa/blob/master/geomesa-arrow/geomesa-arrow-gt/src/main/scala/org/locationtech/geomesa/arrow/vector/ArrowAttributeWriter.scala#L224-L225

https://github.com/locationtech/geomesa/blob/master/geomesa-arrow/geomesa-arrow-gt/src/main/scala/org/locationtech/geomesa/arrow/vector/ArrowAttributeReader.scala#L148 https://github.com/locationtech/geomesa/blob/master/geomesa-arrow/geomesa-arrow-gt/src/main/scala/org/locationtech/geomesa/arrow/vector/ArrowAttributeReader.scala#L164-L167

toddfarmer commented 7 years ago

Note: Comment by Emilio Lahr-Vivaz (elahrvivaz): Some more details - the important bit is calling 'vector.addOrGet' with the correct dictionary encoded field. This sets up the metadata correctly. The child vector is of the dictionary encoded type (e.g. Int), and you have to manually encode dictionary values before writing them. On read, you have to examine the schema so that you know to manually decode the Int values appropriately.

toddfarmer commented 7 years ago

Note: Comment by Wes McKinney (wesm): Cool, after we get the integration tests working for flat data I will add integration tests for encoded subfields and we can address any issues that come up

toddfarmer commented 7 years ago

Note: Comment by Wes McKinney (wesm): Moving this off 0.6.0 as not urgent. It would be good to have unit tests for this, though. cc [~icexelloss]

toddfarmer commented 5 years ago

Note: Comment by Wes McKinney (wesm): [~emkornfield@gmail.com] I might put this on your radar for some point in the future, moving out of 0.14.0 for now

toddfarmer commented 5 years ago

Note: Comment by Ji Liu (tianchen92): If [~emkornfield@gmail.com] don't mind, I would like to take this issue:).

toddfarmer commented 5 years ago

Note: Comment by Micah Kornfield (emkornfield@gmail.com): SGTM, in general, I think if issues are unassigned they are up for anyone to take on.