koaning / bulk

A Simple Bulk Labelling Tool
MIT License

Load json and jsonl files #32

Closed rsbohn closed 1 year ago

rsbohn commented 2 years ago

I'm finding .csv files a bit limiting. Something like this would read the other formats too:

import pandas as pd
from pathlib import Path

def read_any(file: Path) -> pd.DataFrame:
    # Dispatch on the file extension, ignoring case.
    suffix = file.suffix.lower()
    if suffix == ".jsonl":
        return pd.read_json(file, lines=True)
    if suffix == ".json":
        return pd.read_json(file)
    if suffix == ".csv":
        return pd.read_csv(file)
    raise ValueError(f"Can't read {file}.")

koaning commented 2 years ago

I wouldn't mind adding .jsonl but is .json really a format people use for something that is columnar?

rsbohn commented 2 years ago

I've been using datasette and sqlite-utils where the default format is .json. You can get .jsonl by adding '--nl'.

https://sqlite-utils.datasette.io/en/stable/cli.html#returning-json
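
For illustration, here is roughly how the two layouts load with pandas. This is a minimal sketch: the sample records and variable names are made up, and it assumes the default output is a JSON array of objects while '--nl' gives one object per line.

```python
import io

import pandas as pd

# Hypothetical sample data: the same two records in both layouts.
json_text = '[{"text": "hello", "label": "a"}, {"text": "world", "label": "b"}]'
jsonl_text = '{"text": "hello", "label": "a"}\n{"text": "world", "label": "b"}\n'

# A JSON array of objects loads with the default settings.
df_json = pd.read_json(io.StringIO(json_text))

# Newline-delimited JSON needs lines=True.
df_jsonl = pd.read_json(io.StringIO(jsonl_text), lines=True)

print(df_json)
print(df_jsonl)
```

Both calls should give a frame with one row per record, so a list of flat objects is columnar enough either way.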

koaning commented 1 year ago

I'm adding support for jsonl in https://github.com/koaning/bulk/pull/45.

For now I'll consider json out of scope for this project. Partially because .jsonl feels superior, but also because it simplifies things.
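
One way to read the "simplifies things" point: every line of a .jsonl file is a complete record, so a file can be handled one line at a time without parsing the whole document first. A rough sketch of that idea, not the code from the linked pull request; `iter_jsonl` and the sample file are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line (hypothetical helper)."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write a tiny sample file and read it back record by record.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "examples.jsonl"
    sample.write_text('{"text": "hello"}\n{"text": "world"}\n', encoding="utf-8")
    records = list(iter_jsonl(sample))

print(records)
```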