toddfarmer / arrow-migration

[C++] Build JSON "scanner" for reading record batches from line-delimited JSON files #2643

Closed: toddfarmer closed this issue 5 years ago

toddfarmer commented 7 years ago

Note: This issue was originally created as ARROW-694. Please see the migration documentation for further details.

Original Issue Description:

Umbrella issue for using RapidJSON to parse JSON files into Arrow record batches.

toddfarmer commented 5 years ago

Note: Comment by Wes McKinney (wesm): It seems we could use the RapidJSON SAX API to parse files with one JSON object per line: http://rapidjson.org/md_doc_sax.html

toddfarmer commented 5 years ago

Note: Comment by Wes McKinney (wesm): I just wrote up a Google doc about this, with some links to JSON support in other analytics tools.

toddfarmer commented 5 years ago

Note: Comment by Neville Dipale (nevi_me): Maybe this isn't related to this JIRA, but JSON Schema (https://json-schema.org/) seems to be growing in adoption. In the future, we might consider supporting inferring or creating record schemas from a supplied JSON Schema.

toddfarmer commented 5 years ago

Note: Comment by Wes McKinney (wesm): Issue resolved by pull request 3592 https://github.com/apache/arrow/pull/3592