Closed sbeaupre closed 4 years ago
I want more information.
Does trdsql exit without outputting anything (errors, etc.) in JSON? Can you output CSV and JSON with SQL LIMIT?
CSV output works fine, with or without LIMIT.
JSON output works with a LIMIT (eg. 10), but not without LIMIT
JSON without LIMIT gives no output at all, not even error messages. Tried also with debug flag, but no extra info is returned.
Hmm ... I actually downloaded and tried it. The result seems to have been output successfully.
docker run --rm -it -v $(pwd):/tmp noborus/trdsql -ojson -ih " \
SELECT * FROM /tmp/enterprise.csv as e \
LEFT JOIN /tmp/address.csv as a ON (e.EnterpriseNumber=a.EntityNumber) \
LEFT JOIN /tmp/contact.csv as c ON (e.EnterpriseNumber=c.EntityNumber) \
LEFT JOIN /tmp/denomination.csv as d ON (e.EnterpriseNumber=d.EntityNumber) \
" >tmp.json
ls -alFh tmp.json
-rw-r--r-- 1 noborus noborus 1.4G Nov 25 13:48 tmp.json
Saw that you are running on linux and realized that it is a memory issue.
Pure from an install on Mac, it works and more than 12GB is consumed to produce the json. Running inside docker with more resources works also fine. Strange that docker gave no error message.
Looking at the code (not a go dev myself), I suppose you are building everything in memory first and then write it to disk, and not streaming the data?
Saw that you are running on linux and realized that it is a memory issue.
Pure from an install on Mac, it works and more than 12GB is consumed to produce the json. Running inside docker with more resources works also fine. Strange that docker gave no error message.
I see. understood.
Looking at the code (not a go dev myself), I suppose you are building everything in memory first and then write it to disk, and not streaming the data?
Yes. Since it is currently an array, it is temporarily stored in memory.
For example
[{"id":"1","name":"Orange"},
{"id":"2","name":"Melon"},
{"id":"3","name":"Apple"}]
For large outputs, you will need an output that is not an array (I don't know what to call this).
{"id":"1","name":"Orange"}
{"id":"2","name":"Melon"}
{"id":"3","name":"Apple"}
Not yet. I will consider it from now on.
I added a new JSONL output option ( #103 ). This format can reduce memory consumption.
I am working on a medium-large open data set of Belgian companies, which is freely downloadable at https://kbopub.economie.fgov.be/kbo-open-data/login
I am trying to execute following query to denormalize data into 1 file and it works fine when the output target is CSV, but nothing happens when outputting JSON:
Any ideas?