Closed tacman closed 11 months ago
DOH! This is what I was looking for:
csvstat data.csv --csv | csvjson | jq
[
{
"column_id": 1,
"column_name": "state",
"type": "Text",
"nulls": false,
"unique": 1,
"min": null,
"max": null,
"sum": null,
"mean": null,
"median": null,
"stdev": null,
"len": 2,
"freq": "NE"
},
{
"column_id": 2,
"column_name": "county",
"type": "Text",
Actually, I'm going to reopen this, and ask that you consider adding a --json option that includes the frequency stats.
csvstat data/subtitles_day.tsv
15. "movieKind"
Type of data: Text
Contains null values: False
Unique values: 2
Longest value: 5 characters
Most common values: tv (1443x)
movie (413x)
csvstat data/subtitles_day.tsv | csvjson | jq
{
"column_id": 15,
"column_name": "movieKind",
"type": "Text",
"nulls": false,
"unique": 2,
"min": null,
"max": null,
"sum": null,
"mean": null,
"median": null,
"stdev": null,
"len": 5,
"freq": "tv, movie"
},
{
So we lost the frequency data that exists in the text report. Yeah, it's a pain to add that in CSV, but it'd be natural in JSON, e.g.
{ "freq": { "tv": 1443, "movie": 413 }}
Added
csvstat --json examples/realdata/ks_1033_data.csv -i 2
...
{
"column_id": 14,
"column_name": "federal_supply_class_name",
"type": "Text",
"nulls": false,
"nonnulls": 1575,
"unique": 30,
"len": 63,
"freq": [
{
"value": "Guns, through 30 mm",
"count": 1427
},
{
"value": "Trucks and Truck Tractors, Wheeled",
"count": 33
},
{
"value": "Vehicular Cab, Body, and Frame Structural Components",
"count": 28
},
{
"value": "Surface Use Explosive Ordnance Disposal Tools and Equipment",
"count": 14
},
{
"value": "Medical and Surgical Instruments, Equipment, and Supplies",
"count": 12
}
]
}
]
First, a big thanks for publishing csvkit! I only wish I had discovered it earlier.
I like the --csv option for csvstats, and would love to see a --json (or --format=json|csv|text) option. That would allow piping the results to another script, since importing JSON is much easier and faster than importing csv.
I know there are various scripts out there for csvtojson, but as I'm integrating this toolkit into our workflow I'd like to avoid having another tool to install.
Thanks for your consideration.