simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0

rows_from_file() raises confusing error if file-like object is not in binary mode #520

Closed: simonw closed this issue 1 year ago

simonw commented 1 year ago

I got this error:

  File "/Users/simon/Dropbox/Development/openai-to-sqlite/openai_to_sqlite/cli.py", line 27, in embeddings
    rows, _ = rows_from_file(input)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/.local/share/virtualenvs/openai-to-sqlite-jt4obeb2/lib/python3.11/site-packages/sqlite_utils/utils.py", line 305, in rows_from_file
    first_bytes = buffered.peek(2048).strip()
                  ^^^^^^^^^^^^^^^^^^^

From this code:


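import click
import sqlite_utils
from sqlite_utils.utils import rows_from_file


@click.group()
def cli():
    # minimal stand-in for the project's command group
    pass


@cli.command()
@click.argument(
    "db_path",
    type=click.Path(file_okay=True, dir_okay=False, allow_dash=False),
)
@click.option(
    "-i",
    "--input",
    type=click.File("r"),  # text mode: this is what triggers the error
    default="-",
)
def embeddings(db_path, input):
    "Store embeddings for one or more text documents"
    click.echo("Here is some output")
    db = sqlite_utils.Database(db_path)
    rows, _ = rows_from_file(input)
    print(list(rows))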

The error went away when I changed it to type=click.File("rb").
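Why the mode matters can be sketched with the standard library alone. This assumes, based on the `buffered.peek(2048)` call in the traceback, that `rows_from_file()` wraps `fp` in `io.BufferedReader` before sniffing the format. `BufferedReader` fills its buffer by calling the wrapped object's `readinto()` method, which binary streams provide and text streams do not. The function name `peek_like_rows_from_file` is hypothetical.

```python
import io


def peek_like_rows_from_file(fp):
    # Hypothetical stand-in for the first step of rows_from_file(), assuming
    # it wraps fp in io.BufferedReader before sniffing the format.
    buffered = io.BufferedReader(fp, buffer_size=4096)
    # BufferedReader fills its buffer by calling fp.readinto(...).
    return buffered.peek(2048).strip()


# Binary stream, like the one click.File("rb") returns: works.
print(peek_like_rows_from_file(io.BytesIO(b'[{"id": 1}]\n')))  # b'[{"id": 1}]'

# Text stream, like the one click.File("r") returns: there is no
# readinto() method, so peek() fails with an AttributeError.
try:
    peek_like_rows_from_file(io.StringIO('[{"id": 1}]\n'))
except AttributeError as exc:
    print("AttributeError:", exc)
```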

This should either be called out in the documentation or rows_from_file() should be fixed to handle text-mode files in addition to binary files.
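One way to accept text-mode files as well would be to normalize the input before sniffing it. This is a minimal sketch using a hypothetical helper, `ensure_binary()`, which is not part of sqlite-utils. It assumes UTF-8 unless told otherwise and reads a text stream fully into memory, trading streaming for simplicity.

```python
import io
from typing import IO, Union


def ensure_binary(fp: Union[IO[str], IO[bytes]], encoding: str = "utf-8") -> IO[bytes]:
    # Hypothetical helper (not part of sqlite-utils): return a binary
    # file-like object whether fp was opened in text or binary mode.
    if isinstance(fp, io.TextIOBase):
        # Text stream: read the decoded text and re-encode it into an
        # in-memory bytes buffer. This reads the whole stream into memory.
        return io.BytesIO(fp.read().encode(encoding))
    # Already binary: pass it through unchanged.
    return fp


text_fp = io.StringIO("id,name\n1,Cleo\n")
binary_fp = ensure_binary(text_fp)
print(io.BufferedReader(binary_fp).peek(2048))  # b'id,name\n1,Cleo\n'
```

With `click.File("r")`, a caller could then pass `ensure_binary(input)` to `rows_from_file()`. Using a text stream's `.buffer` attribute instead would avoid the copy, but that is only safe if nothing has been read through the text layer yet.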

simonw commented 1 year ago

The documentation (https://sqlite-utils.datasette.io/en/3.30/python-api.html#reading-rows-from-a-file) does at least say the following:

  • fp (BinaryIO) -- a file-like object containing binary data
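Alternatively, the confusing failure could be replaced by an explicit check that enforces the documented `BinaryIO` type. This is a minimal sketch. `require_binary()` and its error message are hypothetical, not the sqlite-utils implementation.

```python
import io


def require_binary(fp) -> None:
    # Hypothetical guard (not sqlite-utils code): fail early with a clear
    # message instead of an error from deep inside io.BufferedReader.
    if isinstance(fp, io.TextIOBase):
        raise TypeError(
            "rows_from_file() expects a file-like object opened in binary "
            "mode, e.g. open(path, 'rb') or click.File('rb'); got a "
            f"text-mode {type(fp).__name__}"
        )


require_binary(io.BytesIO(b"[]"))  # binary: passes silently
try:
    require_binary(io.StringIO("[]"))
except TypeError as exc:
    print(exc)
```

The `isinstance` check only catches streams derived from `io.TextIOBase`, which includes `io.StringIO` and files opened with `open(path, "r")`. Other duck-typed text objects would slip through.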

mcarpenter commented 1 year ago

Hey, isn't this essentially the same issue as #448?

simonw commented 1 year ago

> Hey, isn't this essentially the same issue as #448?

Yes it is, good catch!