simonw / llm

Access large language models from the command-line
https://llm.datasette.io
Apache License 2.0
4.02k stars, 222 forks

UTF8 Surrogates Not Allowed #546

Open: rsbohn opened this issue 1 month ago

rsbohn commented 1 month ago

Something in the text returned from GPT-4o can't be logged to the database. The insert fails with a UnicodeEncodeError:

  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 3310, in insert_all
    self.insert_chunk(
  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 3068, in insert_chunk
    result = self.db.execute(query, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 524, in execute
    return self.conn.execute(sql, parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc81' in position 14511: surrogates not allowed
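
For anyone trying to narrow this down, the same exception can be triggered with the standard library's sqlite3 module alone, without llm or sqlite_utils. This is a minimal sketch, not the code path llm uses: the table name `log` and the sample string are made up for illustration. It only shows that binding a string containing a lone surrogate such as '\udc81' fails at the UTF-8 encoding step.

```python
import sqlite3

# Hypothetical sample text containing a lone low surrogate,
# the same code point named in the traceback above.
bad_text = "place name \udc81"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (response TEXT)")  # illustrative table name

try:
    # sqlite3 must encode the str parameter as UTF-8 before binding it,
    # and lone surrogates cannot be encoded as UTF-8.
    conn.execute("INSERT INTO log (response) VALUES (?)", (bad_text,))
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(type(error).__name__, error)
```

If this matches what is happening here, the failure does not depend on the rest of the response; a single surrogate code point anywhere in the logged text is enough.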

Workaround: disable logging and run the prompt again.
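
Logging can be turned off with `llm logs off`, or for a single prompt with the `-n/--no-log` option. If the log entry is still wanted, another possible approach is to scrub lone surrogates from the text before it is stored. The helper below is a hypothetical sketch (`scrub_surrogates` is not part of llm or sqlite_utils) and it assumes that replacing the offending code points with U+FFFD is acceptable for the logged copy.

```python
def scrub_surrogates(text: str, replacement: str = "\ufffd") -> str:
    """Replace surrogate code points (U+D800..U+DFFF) with `replacement`.

    A Python str never stores valid surrogate pairs as two code points,
    so any code point in this range is a lone surrogate that UTF-8
    cannot encode.
    """
    return "".join(
        replacement if 0xD800 <= ord(ch) <= 0xDFFF else ch
        for ch in text
    )


# Hypothetical sample; the real response text is not shown in this issue.
cleaned = scrub_surrogates("Lisbon\udc81, Porto")

# Encoding now succeeds, so the text could be bound as a SQLite parameter.
encoded = cleaned.encode("utf-8")
```

This loses whatever byte the surrogate stood for, so it suits the log copy rather than the text shown to the user.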

PS> cat .\transcript.csv | llm -m 4o -s "Extract each place name."

AlexanderYastrebov commented 2 weeks ago

Would be nice to have a small reproducer file.
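
One way to build a candidate reproducer, offered as an unverified guess: '\udc81' is exactly what Python's `surrogateescape` error handler produces for the raw byte 0x81, so a stray 0x81 byte somewhere in the pipeline is one possible origin. Whether that byte came from the piped file or from the model response is not established in this thread. The sketch below writes a tiny file with that byte and shows the resulting code point; the file name and contents are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name and rows; 0x81 on its own is not valid UTF-8.
    path = Path(tmp) / "reproducer.csv"
    path.write_bytes(b"id,place\n1,Lisbon\x81\n")

    # surrogateescape maps an undecodable byte 0xNN to the code point
    # U+DC00 + 0xNN, so 0x81 becomes '\udc81'.
    text = path.read_bytes().decode("utf-8", errors="surrogateescape")

# ascii() escapes the surrogate so it can be printed on any console.
print(ascii(text))
```

Piping a file like this through the original command would show whether the input, rather than the response, is the source of the surrogate.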