zhangnn520 / chinese_llama_alpaca_lora

LLaMA information extraction in practice
Apache License 2.0
97 stars, 10 forks

Decoding error #4

Open kse-ElEvEn opened 1 year ago

kse-ElEvEn commented 1 year ago

Traceback (most recent call last):
  File "/seu_share/home/qiguilin/230229361/.conda/envs/llama-chat/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 113, in _generate_tables
    pa_table = paj.read_json(
  File "pyarrow/_json.pyx", line 258, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Column(/input) was specified twice in row 397

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/seu_share/home/qiguilin/230229361/.conda/envs/llama-chat/lib/python3.9/site-packages/datasets/builder.py", line 1858, in _prepare_split_single
    for _, table in generator:
  File "/seu_share/home/qiguilin/230229361/.conda/envs/llama-chat/lib/python3.9/site-packages/datasets/packaged_modules/json/json.py", line 134, in _generate_tables
    dataset = json.load(f)
  File "/seu_share/home/qiguilin/230229361/.conda/envs/llama-chat/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/seu_share/home/qiguilin/230229361/.conda/envs/llama-chat/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 31457279-31457280: invalid continuation byte

All of my files are encoded and decoded as UTF-8, so why does this problem keep occurring?

zhangnn520 commented 1 year ago

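There is a problem with your file: either its format is wrong or it contains bytes that cannot be decoded as UTF-8. The offending bytes are at positions 31457279-31457280.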
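
One way to check both complaints in the traceback is sketched below. It is only an illustration: the helper names `find_invalid_utf8` and `find_duplicate_keys` are made up for this example and are not part of `datasets` or `pyarrow`. The first helper scans raw bytes and reports the offset of every sequence that is not valid UTF-8 (the `UnicodeDecodeError` case). The second uses the standard `object_pairs_hook` argument of `json.loads` to collect keys that appear more than once in the same object, which is what the `Column(/input) was specified twice` message appears to point at.

```python
import json


def find_invalid_utf8(data: bytes, context: int = 20):
    """Return (offset, reason, nearby_bytes) for each undecodable byte sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decoding succeeds only if everything from pos onward is valid UTF-8.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            start = pos + err.start  # absolute offset of the bad sequence
            window = data[max(0, start - context): start + context]
            problems.append((start, err.reason, window))
            pos += err.end  # resume scanning just past the bad bytes
    return problems


def find_duplicate_keys(text: str):
    """Return keys that occur more than once inside a single JSON object."""
    duplicates = []

    def check_pairs(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_pairs)
    return duplicates


# Small in-memory demonstration: "中文" is 6 valid bytes, followed by a
# truncated 3-byte sequence and an ASCII letter.
sample = "中文".encode("utf-8") + b"\xe4\xb8" + b"A"
for offset, reason, window in find_invalid_utf8(sample):
    print(offset, reason, window)

print(find_duplicate_keys('[{"input": "a", "input": "b", "output": "c"}]'))
```

To run this against a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is your own training file) and pass the bytes to `find_invalid_utf8`. The reported offsets can then be compared with the positions in the error message. The slicing approach re-decodes the remainder of the file after each error, so it can be slow on a large file with many bad sequences.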

