hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[BUG]: prompt jsonl file read ERROR #3658

Closed ifromeast closed 1 year ago

ifromeast commented 1 year ago

🐛 Describe the bug

I just tested the following:

from coati.dataset.utils import jload
jdict = jload("./seed_prompts_en.jsonl")

and it raises this error:

JSONDecodeError                           Traceback (most recent call last)
Cell In[5], line 2
      1 from coati.dataset.utils import jload
----> 2 jdict = jload("[./seed_prompts_en.jsonl](https://vscode-remote+ssh-002dremote-002b10-002e0-002e79-002e70.vscode-resource.vscode-cdn.net/root/alpaca_test/TeachBot/rlhf/dataset/seed_prompts_en.jsonl)")

File /usr/local/lib/python3.8/dist-packages/coati/dataset/utils.py:20, in jload(f, mode)
     18 """Load a .json file into a dictionary."""
     19 f = _make_r_io_base(f, mode)
---> 20 jdict = json.load(f)
     21 f.close()
     22 return jdict

File /usr/lib/python3.8/json/__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    274 def load(fp, *, cls=None, object_hook=None, parse_float=None,
    275         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    276     """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    277     a JSON document) to a Python object.
    278 
   (...)
    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
...
    339 if end != len(s):
--> 340     raise JSONDecodeError("Extra data", s, end)
    341 return obj

JSONDecodeError: Extra data: line 2 column 1 (char 462)

Is there something wrong with the file or the utils.py?

Environment

No response

ifromeast commented 1 year ago

@JThh Can you help to have a look at this issue?

JThh commented 1 year ago

This is likely caused by VS Code. Can you try reading the file from the command line?

ifromeast commented 1 year ago

jdict = jload("./seed_prompts_en.jsonl")

@JThh This has nothing to do with VS Code; it is a bug. The same error occurs in a plain Python shell:

>>> from coati.dataset.utils import jload
>>> jdict = jload("./seed_prompts_en.jsonl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/coati/dataset/utils.py", line 20, in jload
    jdict = json.load(f)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 462)

JThh commented 1 year ago

Alright, I will try it myself then.

zhixingheyier commented 1 year ago

You can follow the guide below to resolve this problem: (image not available)
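
For context, "Extra data: line 2 column 1" is the error `json.load` raises when a file contains more than one top-level JSON value. That is what a JSON Lines file looks like to it, since `jload` in the traceback passes the whole file to `json.load`. A minimal sketch of a workaround, assuming `seed_prompts_en.jsonl` holds one JSON object per line (the file contents are not shown in this thread); `load_jsonl` is a hypothetical helper and not part of coati:

```python
import json


def load_jsonl(path, encoding="utf-8"):
    """Hypothetical helper: parse a JSON Lines file into a list of objects.

    Assumes one JSON value per line; blank lines are skipped.
    """
    records = []
    with open(path, "r", encoding=encoding) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report which line failed instead of a bare decode error
                raise ValueError(
                    f"{path}: invalid JSON on line {line_no}: {err}"
                ) from err
    return records
```

It could then be called as `prompts = load_jsonl("./seed_prompts_en.jsonl")`, which returns a list with one entry per line rather than a single dictionary.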