noborus / trdsql

CLI tool that can execute SQL queries on CSV, LTSV, JSON, YAML and TBLN. Can output to various formats.
https://noborus.github.io/trdsql/
MIT License
1.94k stars 73 forks source link

Changed handling of NULL #189

Closed noborus closed 2 years ago

noborus commented 2 years ago

Changed JSON null to SQL null. Enabled to specify a character string that is NULL in the input format (-inull). Enabled to specify SQL NULL in the output format string (-onull). If -inull is specified in JSON, the specified character string will be NULL in addition to null. Formats that do not support null will default to an empty string ("").

Resolve #182.

noborus commented 2 years ago

Now that you can handle NULL, https://www.brimdata.io/blog/introducing-zq/ The above issues can also be aggregated using the SQL of trdsql.

zq example:

zq -j openlibrary.json | jq '[.docs[] |
 {title,author_name: .author_name[0], publish_year: .publish_year[0]} | 
select(.author_name!=null and .publish_year!=null)] | group_by(.author_name)| 
[.[] | {author_name: .[0].author_name, count: . |
 length}] | sort_by(.count) | reverse | limit(3;.[])'

trdsql example:

trdsql -ojsonl 
"SELECT json(author_name)->>0 AS author_name, count(*) AS count 
FROM openlibrary.json::.docs
WHERE author_name IS NOT NULL AND publish_year IS NOT NULL 
GROUP BY json(author_name)->>0 
ORDER BY count DESC LIMIT 3"

Unidode normalize is not yet in sqlite3. But PostgreSQL has it.

trdsql  -ojsonl -driver postges -dsn "dbname=trdsql_db"
"SELECT json(normalize(author_name))->>0 AS author_name, count(*) AS count 
FROM openlibrary.json::.docs 
WHERE author_name IS NOT NULL AND publish_year IS NOT NULL 
GROUP BY json(normalize(author_name))->>0
ORDER BY count DESC LIMIT 3"
{"author_name":"S. Stepniak","count":38}
{"author_name":"Władysław Stępniak","count":11}
{"author_name":"Andrzej Stępniak","count":4}