rickhg12hs opened 5 months ago

This is a strange CSV file, but it does seem to have a header line and three columns. csvjson seems to be having trouble with the data columns. credits.csv:

cast,crew,id
"[{'cast_id': 14, 'character': 'Woody (voice)', 'credit_id': '52fe4284c3a36847f8024f95', 'gender': 2, 'id': 31, 'name': 'Tom Hanks', 'order': 0, 'profile_path': '/pQFoyx7rp09CJTAb932F2g8Nlho.jpg'}, {'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'credit_id': '52fe4284c3a36847f8024f99', 'gender': 2, 'id': 12898, 'name': 'Tim Allen', 'order': 1, 'profile_path': '/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg'}]", "[{'credit_id': '52fe4284c3a36847f8024f49', 'department': 'Directing', 'gender': 2, 'id': 7879, 'job': 'Director', 'name': 'John Lasseter', 'profile_path': '/7EdqiNbr4FRjIhKHyPPdFfEEEFG.jpg'}, {'credit_id': '52fe4284c3a36847f8024f4f', 'department': 'Writing', 'gender': 2, 'id': 12891, 'job': 'Screenplay', 'name': 'Joss Whedon', 'profile_path': '/dTiVsuaTVTeGmvkhcyJvKp2A5kr.jpg'}, {'credit_id': '52fe4284c3a36847f8024f55', 'department': 'Writing', 'gender': 2, 'id': 7, 'job': 'Screenplay', 'name': 'Andrew Stanton', 'profile_path': '/pvQWsu0qc8JFQhMVJkTHuexUAa1.jpg'}, {'credit_id': '52fe4284c3a36847f8024f5b', 'department': 'Writing', 'gender': 2, 'id': 12892, 'job': 'Screenplay', 'name': 'Joel Cohen', 'profile_path': '/dAubAiZcvKFbboWlj7oXOkZnTSu.jpg'}]", 862

I tried various csvjson command-line options, but couldn't find the right recipe.

Can you post a snippet of the CSV file? It would help us troubleshoot.

Please see code block for credits.csv above.
You have initial spaces after the delimiters (commas), so you need to use -S (--skipinitialspace). Your data is also unusual (what look like Python dictionaries containing commas), so Python's CSV sniffer is getting confused. Disable it with -y0 (--snifflimit 0). You then get csvjson -S -y0, for example:
$ printf "cast,crew,id\n\"[{'cast_id': 14, 'character': 'Woody (voice)', 'credit_id': '52fe4284c3a36847f8024f95', 'gender': 2, 'id': 31, 'name': 'Tom Hanks', 'order': 0, 'profile_path': '/pQFoyx7rp09CJTAb932F2g8Nlho.jpg'}, {'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'credit_id': '52fe4284c3a36847f8024f99', 'gender': 2, 'id': 12898, 'name': 'Tim Allen', 'order': 1, 'profile_path': '/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg'}]\", \"[{'credit_id': '52fe4284c3a36847f8024f49', 'department': 'Directing', 'gender': 2, 'id': 7879, 'job': 'Director', 'name': 'John Lasseter', 'profile_path': '/7EdqiNbr4FRjIhKHyPPdFfEEEFG.jpg'}, {'credit_id': '52fe4284c3a36847f8024f4f', 'department': 'Writing', 'gender': 2, 'id': 12891, 'job': 'Screenplay', 'name': 'Joss Whedon', 'profile_path': '/dTiVsuaTVTeGmvkhcyJvKp2A5kr.jpg'}, {'credit_id': '52fe4284c3a36847f8024f55', 'department': 'Writing', 'gender': 2, 'id': 7, 'job': 'Screenplay', 'name': 'Andrew Stanton', 'profile_path': '/pvQWsu0qc8JFQhMVJkTHuexUAa1.jpg'}, {'credit_id': '52fe4284c3a36847f8024f5b', 'department': 'Writing', 'gender': 2, 'id': 12892, 'job': 'Screenplay', 'name': 'Joel Cohen', 'profile_path': '/dAubAiZcvKFbboWlj7oXOkZnTSu.jpg'}]\", 862" | csvjson -S -y0
[{"cast": "[{'cast_id': 14, 'character': 'Woody (voice)', 'credit_id': '52fe4284c3a36847f8024f95', 'gender': 2, 'id': 31, 'name': 'Tom Hanks', 'order': 0, 'profile_path': '/pQFoyx7rp09CJTAb932F2g8Nlho.jpg'}, {'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'credit_id': '52fe4284c3a36847f8024f99', 'gender': 2, 'id': 12898, 'name': 'Tim Allen', 'order': 1, 'profile_path': '/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg'}]", "crew": "[{'credit_id': '52fe4284c3a36847f8024f49', 'department': 'Directing', 'gender': 2, 'id': 7879, 'job': 'Director', 'name': 'John Lasseter', 'profile_path': '/7EdqiNbr4FRjIhKHyPPdFfEEEFG.jpg'}, {'credit_id': '52fe4284c3a36847f8024f4f', 'department': 'Writing', 'gender': 2, 'id': 12891, 'job': 'Screenplay', 'name': 'Joss Whedon', 'profile_path': '/dTiVsuaTVTeGmvkhcyJvKp2A5kr.jpg'}, {'credit_id': '52fe4284c3a36847f8024f55', 'department': 'Writing', 'gender': 2, 'id': 7, 'job': 'Screenplay', 'name': 'Andrew Stanton', 'profile_path': '/pvQWsu0qc8JFQhMVJkTHuexUAa1.jpg'}, {'credit_id': '52fe4284c3a36847f8024f5b', 'department': 'Writing', 'gender': 2, 'id': 12892, 'job': 'Screenplay', 'name': 'Joel Cohen', 'profile_path': '/dAubAiZcvKFbboWlj7oXOkZnTSu.jpg'}]", "id": 862.0}]
@jpmckinney Super! Is there also a way to maintain integers as integers? 862 gets converted to 862.0.
Sure, use -I (--no-inference).
But then 862 becomes "862". :-(
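E.g., with a toy one-column file (expected output sketched, not copied from a run):

$ printf "id\n862\n" | csvjson -I
[{"id": "862"}]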
Ah, yes. Why do you need it to be an int instead of a float with .0?

Edit: Agate (on which csvkit relies) stores numbers as decimals. When serializing to JSON, it converts the decimal to a float in the jsonify method, which Python represents with .0. The jsonify method takes no arguments. It may be possible to add an argument to the Number instantiation to force the use of int instead of float, and then expose that argument via a csvkit command-line argument.
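Roughly, the mechanics look like this (a minimal sketch of the Decimal-to-float round trip, not agate's actual jsonify code; decimal_default is a hypothetical helper, not part of csvkit):

import json
from decimal import Decimal

# agate stores each number as a Decimal; casting to float before
# serialization is what introduces the trailing .0 in the JSON.
row = {"id": Decimal("862")}
print(json.dumps({k: float(v) for k, v in row.items()}))  # {"id": 862.0}

# Hypothetical workaround: serialize Decimals directly, emitting an
# int whenever the value has no fractional part.
def decimal_default(obj):
    if isinstance(obj, Decimal):
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

print(json.dumps(row, default=decimal_default))  # {"id": 862}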
For my specific use case: converting (and possibly editing, with e.g. jq) CSV files for inclusion in MongoDB collections. Maintaining ints as ints saves space and unifies data types throughout the collection.
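As a stopgap I can post-process with jq (a sketch that assumes every affected value is a whole number, so floor merely strips the .0):

$ csvjson -S -y0 credits.csv | jq 'map(.id |= floor)'

floor (or round) turns 862.0 into 862 in the emitted JSON.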