yoeo / guesslangtools

Tool to build a training dataset for Guesslang, the programming language guesser
MIT License

Error Tokenizing Data #9

Open · zzj0402 opened this issue 1 year ago

zzj0402 commented 1 year ago
07_downloaded_repositories.csv
10:27:29 INFO: List source files from repositories
10:27:29 INFO: This operation might take several minutes...
Traceback (most recent call last):
  File "/home/zing/anaconda3/envs/guesslang/bin/gltool", line 8, in <module>
    sys.exit(main())
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/guesslangtools/__main__.py", line 153, in main
    run_workflow(config)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/guesslangtools/app.py", line 17, in run_workflow
    source_files.list_all(config)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/guesslangtools/common.py", line 175, in wrapped
    result = func(config, *args, **kw)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/guesslangtools/workflow/source_files.py", line 84, in list_all
    repo = config.load_csv(File.DOWNLOADED_REPOSITORIES)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/guesslangtools/common.py", line 108, in load_csv
    return pd.read_csv(fullname)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 678, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 581, in _read
    return parser.read(nrows)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1253, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/zing/anaconda3/envs/guesslang/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 225, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 880, saw 4
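The traceback shows guesslangtools' `load_csv` passing 07_downloaded_repositories.csv straight to `pd.read_csv(fullname)`. The C parser stops at line 880 because that line has 4 comma-separated fields where it expected 2. Before working around the error, it helps to see every offending row. Below is a minimal sketch that does this with Python's standard `csv` module. It assumes the first line of the file is the header and that no quoted field spans several lines; if one does, the reported line numbers drift. The function name `find_malformed_rows` is hypothetical and not part of guesslangtools.

```python
import csv


def find_malformed_rows(path):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    expected_field_count is taken from the header row. line_number is the
    1-based physical line read so far (csv.reader.line_num), which matches
    the file's line numbering as long as no quoted field contains a newline.
    """
    bad_rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # assumes the first line is the header
        expected = len(header)
        for row in reader:
            if not row:
                continue  # ignore blank lines, as pandas does by default
            if len(row) != expected:
                bad_rows.append((reader.line_num, len(row)))
    return expected, bad_rows
```

For example, `expected, bad = find_malformed_rows("07_downloaded_repositories.csv")` followed by `print(expected, bad)` should list line 880 among the bad rows, along with any others. pandas 1.3 and later also accept `on_bad_lines="warn"` or `on_bad_lines="skip"` in `read_csv`. Skipping silently drops those repositories, though, so inspecting the rows first seems the safer choice.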