nkhorman / json_fdw

PostgreSQL extension which implements a Foreign Data Wrapper (FDW) for JSON files.
109 stars 12 forks source link

Support for valid JSON data #3

Open brandondrew opened 10 years ago

brandondrew commented 10 years ago

Let me start by saying thanks! for your work on this foreign data wrapper. It's one of the things that contributes to making Postgres my first choice database. I hope the drawbacks I'm going to bring up do not seem to imply a lack of appreciation, as in fact I'm very appreciative for all you've done.

But the unfortunate shortcoming is that json_fdw doesn't actually support JSON _files_, but instead supports files which contain rows of JSON--such that each row would be valid JSON on its own, but collected together in a single file without commas between them and without [ and ] wrapping the whole thing make up an invalid JSON file.

While it would be even more awesome to support JSON in any valid format (not requiring each row of the file to correspond to a document) it seems that first just allowing the JSON to be valid is a step in the right direction.

There are several reasons this would be better. First, assuming you're using a library or framework, it is easier to produce valid JSON than invalid JSON. It is actually potentially much more work to produce the format that json_fdw requires since you can't simply use some_array_of_objects.to_json (or whatever method your language/library/framework provides) and be done with it.

Secondly, the file produced to be consumed by json_fdw is not reusable by other applications that require (valid) JSON.

Thirdly (and this is sort of the converse side of the second point) if JSON has been created for another application, there's no way that json_fdw will be able to reuse it.

For the purposes of reusing existing JSON it would of course be best to accept any valid JSON, but at least allowing JSON to be valid would enable reusing things created specifically for json_fdw.

brandondrew commented 10 years ago

As an example, here's a small sample from the customer_reviews_nested_1998.json file. Every validator I tried (such as http://jsonlint.com/) confirmed my understanding that this is invalid JSON.

{ "customer_id": "AE22YDHSBFYIP", "review": { "date": "1970-12-30", "rating": 5, "votes": 10, "helpful_votes": 0 }, "product": { "id": "1551803542", "title": "Start and Run a Coffee Bar (Start & Run a)", "sales_rank": 11611, "group": "Book", "category": "Business & Investing", "subcategory": "General", "similar_ids": ["0471136174","0910627312","047112138X","0786883561","0201570483"] } }
{ "customer_id": "AE22YDHSBFYIP", "review": { "date": "1970-12-30", "rating": 5, "votes": 9, "helpful_votes": 0 }, "product": { "id": "1551802538", "title": "Start and Run a Profitable Coffee Bar", "sales_rank": 689262, "group": "Book", "category": "Business & Investing", "subcategory": "General", "similar_ids": ["0471136174","0910627312","047112138X","0786883561","0201570483"] } }
{ "customer_id": "ATVPDKIKX0DER", "review": { "date": "1995-06-19", "rating": 4, "votes": 19, "helpful_votes": 18 }, "product": { "id": "0898624932", "title": "The Power of Maps", "sales_rank": 407473, "group": "Book", "category": "Nonfiction", "subcategory": "Politics", "similar_ids": ["0226534219","0226534170","1931057001","0801870909","157230958X"] } }
{ "customer_id": "AH7OKBE1Z35YA", "review": { "date": "1995-06-23", "rating": 5, "votes": 4, "helpful_votes": 4 }, "product": { "id": "0521469112", "title": "Invention and Evolution", "sales_rank": 755661, "group": "Book", "category": "Science", "subcategory": "General", "similar_ids": ["1591391857"] } }
{ "customer_id": "ATVPDKIKX0DER", "review": { "date": "1995-07-14", "rating": 5, "votes": 0, "helpful_votes": 0 }, "product": { "id": "0679722955", "title": "The Names (Vintage Contemporaries (Paperback))", "sales_rank": 264928, "group": "Book", "category": "Literature & Fiction", "subcategory": "General", "similar_ids": ["0140152741","0679722947","0140156046","0679722939","0679722920"] } }
{ "customer_id": "A102UKC71I5DU8", "review": { "date": "1995-07-18", "rating": 4, "votes": 2, "helpful_votes": 2 }, "product": { "id": "0471114251", "title": "Bitter Winds ", "sales_rank": 154570, "group": "Book", "category": "Biographies & Memoirs", "subcategory": "General", "similar_ids": ["0812963741","081331769X","014010870X","0879611316","0060007761"] } }
{ "customer_id": "A1HPIDTM9SRBLP", "review": { "date": "1995-07-18", "rating": 5, "votes": 0, "helpful_votes": 0 }, "product": { "id": "0517887290", "title": "Fingerprints of the Gods ", "sales_rank": 13481, "group": "Book", "category": "Science", "subcategory": "Astronomy", "similar_ids": ["0517888521","0609804774","0671865412","1400049512","0517884542"] } }
{ "customer_id": "A1HPIDTM9SRBLP", "review": { "date": "1995-07-18", "rating": 5, "votes": 0, "helpful_votes": 0 }, "product": { "id": "1574531093", "title": "Fingerprints of the Gods (Alternative History)", "sales_rank": 410246, "group": "Book", "category": "Books on Tape", "subcategory": "Nonfiction", "similar_ids": ["0517888521","0609804774","0671865412","1400049512","0517884542"] } }
{ "customer_id": "ATVPDKIKX0DER", "review": { "date": "1995-07-18", "rating": 5, "votes": 1, "helpful_votes": 0 }, "product": { "id": "0962344788", "title": "Heavy Light", "sales_rank": 663630, "group": "Book", "category": "Arts & Photography", "subcategory": "Art", "similar_ids": [] } }

If, however, I add brackets and commas, it passes validation:

[
  { "customer_id": "AE22YDHSBFYIP", "review": { "date": "1970-12-30", "rating": 5, "votes": 10, "helpful_votes": 0 }, "product": { "id": "1551803542", "title": "Start and Run a Coffee Bar (Start & Run a)", "sales_rank": 11611, "group": "Book", "category": "Business & Investing", "subcategory": "General", "similar_ids": ["0471136174","0910627312","047112138X","0786883561","0201570483"] } },
  { "customer_id": "AE22YDHSBFYIP", "review": { "date": "1970-12-30", "rating": 5, "votes": 9, "helpful_votes": 0 }, "product": { "id": "1551802538", "title": "Start and Run a Profitable Coffee Bar", "sales_rank": 689262, "group": "Book", "category": "Business & Investing", "subcategory": "General", "similar_ids": ["0471136174","0910627312","047112138X","0786883561","0201570483"] } },
  { "customer_id": "ATVPDKIKX0DER", "review": { "date": "1995-06-19", "rating": 4, "votes": 19, "helpful_votes": 18 }, "product": { "id": "0898624932", "title": "The Power of Maps", "sales_rank": 407473, "group": "Book", "category": "Nonfiction", "subcategory": "Politics", "similar_ids": ["0226534219","0226534170","1931057001","0801870909","157230958X"] } },
  { "customer_id": "AH7OKBE1Z35YA", "review": { "date": "1995-06-23", "rating": 5, "votes": 4, "helpful_votes": 4 }, "product": { "id": "0521469112", "title": "Invention and Evolution", "sales_rank": 755661, "group": "Book", "category": "Science", "subcategory": "General", "similar_ids": ["1591391857"] } },
  { "customer_id": "ATVPDKIKX0DER", "review": { "date": "1995-07-14", "rating": 5, "votes": 0, "helpful_votes": 0 }, "product": { "id": "0679722955", "title": "The Names (Vintage Contemporaries (Paperback))", "sales_rank": 264928, "group": "Book", "category": "Literature & Fiction", "subcategory": "General", "similar_ids": ["0140152741","0679722947","0140156046","0679722939","0679722920"] } },
  { "customer_id": "A102UKC71I5DU8", "review": { "date": "1995-07-18", "rating": 4, "votes": 2, "helpful_votes": 2 }, "product": { "id": "0471114251", "title": "Bitter Winds ", "sales_rank": 154570, "group": "Book", "category": "Biographies & Memoirs", "subcategory": "General", "similar_ids": ["0812963741","081331769X","014010870X","0879611316","0060007761"] } },
  { "customer_id": "A1HPIDTM9SRBLP", "review": { "date": "1995-07-18", "rating": 5, "votes": 0, "helpful_votes": 0 }, "product": { "id": "0517887290", "title": "Fingerprints of the Gods ", "sales_rank": 13481, "group": "Book", "category": "Science", "subcategory": "Astronomy", "similar_ids": ["0517888521","0609804774","0671865412","1400049512","0517884542"] } },
  { "customer_id": "A1HPIDTM9SRBLP", "review": { "date": "1995-07-18", "rating": 5, "votes": 0, "helpful_votes": 0 }, "product": { "id": "1574531093", "title": "Fingerprints of the Gods (Alternative History)", "sales_rank": 410246, "group": "Book", "category": "Books on Tape", "subcategory": "Nonfiction", "similar_ids": ["0517888521","0609804774","0671865412","1400049512","0517884542"] } },
  { "customer_id": "ATVPDKIKX0DER", "review": { "date": "1995-07-18", "rating": 5, "votes": 1, "helpful_votes": 0 }, "product": { "id": "0962344788", "title": "Heavy Light", "sales_rank": 663630, "group": "Book", "category": "Arts & Photography", "subcategory": "Art", "similar_ids": [] } }
]
pykello commented 10 years ago

Thanks for bringing this up! This is a very interesting discussion.

Our decisions was based on the following points:

I am really interested in your use cases. Can you tell us about what are you planning to use json_fdw for?

metdos commented 8 years ago

Hey @brandondrew,

This project is retired and currently not recommended as a base for new consumers. If you're looking to query semi-structured data, please check the new JSON and JSONB data types built into PostgreSQL.