nitefood / ai-bash-gpt

AI - a commandline ChatGPT (3.5/4) client featuring multiple conversations support, automatic topic identification, image generation and stdin piping (sending files to ChatGPT for inspection)
44 stars 6 forks source link

parse json error #4

Open cdhigh opened 1 year ago

cdhigh commented 1 year ago

Is there a way to perform pre-escaping of characters when using non-english (portugues etc), which contains some non-ASCII characters like ç, ã, õ, etc.? Sometimes, JSON parsing errors occur that prevent the display of ChatGPT's responses, for example: jq: parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 13, column 93.

Is there any method for pre-escaping?

edited: If there are special characters in the text that needs to be sent, this error can also occur.

cdhigh commented 1 year ago

I found some unicode strings in conversations.json. conversations.json.zip

Is it because the file was saved using Unicode encoding, and the decoding failed due to the lack of the original encoding during reading?

nitefood commented 1 year ago

I suspect the unicode encoding in the conversations.json file you attached are more likely due to keystrokes you (maybe inadvertently) sent while interacting with the script, rather than non-printable characters being part of the conversation. I used your json and could playback the chats just fine, can't reproduce the jq error you got in the first post. I also tried starting a new conversation using the characters you mentioned specifically, without any issues. Can you please tell me the steps to reproduce this jq error?

nitefood commented 1 year ago

Here's the test I did:

image

cdhigh commented 1 year ago

Enter "como dancar lindo" to reproduce the issue.

jq

Or can you remove all characters less than 0x1f from the user-entered string?

nitefood commented 1 year ago

Sorry, I can't reproduce. Just copy/pasted the sentece you suggested several times and got no errors. May this have to do with the Kindle underlying OS? Maybe a localization issue? Have you tried defaulting to a UTF-8 locale? I have my default locale set on en_US.UTF-8 if that may help.

image

image

image

nitefood commented 1 year ago

Or can you remove all characters less than 0x1f from the user-entered string?

that should not be necessary since the script already "stringifies" the input using jq -Rs here - that include handling special characters whose value is less than 0x1f, e.g. 0xA for newline (LF)

cdhigh commented 1 year ago

I tried some tricks by searching on google, still no luck. Are we able to determine which one is causing the error among multiple jq statements in the code?

nitefood commented 1 year ago

you may try to run the script using bash -x ai "como dancar lindo" to see exactly what gets called, which variables get set and all the output.

cdhigh commented 1 year ago

caught this line! https://github.com/nitefood/ai-bash-gpt/blob/22e77b1a0c8c1aab6a2ea7fdcfa0a018b42e62a2/ai#L735C5-L735C61 735 response_text=$(jq -r '.content' <<<"$response_message")

jq2

nitefood commented 1 year ago

you may try placing a hexdump -C <<<"$response_message" before that line to see exactly what control character jq is complaining about, then run the script normally

cdhigh commented 1 year ago

a lot of pages, can hexdump to a file?

I don't know the first thing about bash and shell~~~ I have some experiences in field of C/C++ and python

cdhigh commented 1 year ago

complaining in line 17 col 178.

rsphexdump.txt

cdhigh commented 1 year ago

This is another dump, line 11 col 151.

rsphexdump (1).txt

cdhigh commented 1 year ago

After continuous searching and experimentation, it was discovered that the error was caused by a restriction in the JSON data where line breaks were not allowed. If a line break was present, it needed to be escaped. By modifying the following line of code: https://github.com/nitefood/ai-bash-gpt/blob/22e77b1a0c8c1aab6a2ea7fdcfa0a018b42e62a2/ai#L735C5-L735C61 735 response_text=$(jq -r '.content' <<<"$response_message") to response_text=$(jq -Rnr '[inputs] | join("\\n") | fromjson | .content' <<<"$response_message") The issue was totally resolved!

The code line is come from stackoverflow.

However, I am still don't know of the exact meaning of the modified line of code~~~

PS: I don't have glow installed in Kindle, maybe this is the reason you cannot reproduce the issue.