Closed EgorBu closed 5 years ago
@vmarkovtsev can you describe how this dataset was generated? I think it is important to add a note here. Also, do we pin the commit hash to the one where the typo was introduced?
It's definitely not in this dataset. It has only one commit hash.
@zurk Generating this dataset was a bloody hell which I am ashamed to even mention here :D I will upload some scripts and docs to research once I have time.
Somebody should add the commit hashes; we have no resources for it currently.
@EgorBu can I ask you to add the commit hashes where each typo was introduced? It should be possible with git blame.
There is a problem though: the line saved in the dataset is a line in the new commit.
However, due to a lucky bug, I think it should match the line in the old commit. We should check it.
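A rough sketch of the git blame idea, in case it helps. The function names `parse_blame_header` and `introducing_commit` are made up for illustration, and it assumes the stored line number is valid in the parent of the fixing commit, which is exactly the thing that still needs checking. It relies on the first line of `git blame --porcelain` output being `<sha> <original line> <final line> <count>`.

```python
import subprocess
from typing import Tuple


def parse_blame_header(porcelain: str) -> Tuple[str, int]:
    """Return (commit_sha, original_line_number) from `git blame --porcelain` output."""
    # The first porcelain line looks like: "<sha> <orig_line> <final_line> <num_lines>".
    header = porcelain.splitlines()[0].split()
    return header[0], int(header[1])


def introducing_commit(repo: str, fix_commit: str, path: str, line: int) -> Tuple[str, int]:
    """Blame one line in the parent of `fix_commit` (hypothetical helper).

    Assumption: `line` is the line number in the OLD version of the file,
    i.e. the version that still contains the typo.
    """
    result = subprocess.run(
        ["git", "-C", repo, "blame", "--porcelain",
         "-L", f"{line},{line}", f"{fix_commit}^", "--", path],
        check=True, capture_output=True, text=True,
    )
    return parse_blame_header(result.stdout)
```

The parsing is split out so it can be checked without a repository; `introducing_commit` needs a real clone and a Python version with `capture_output` (3.7+).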
@EgorBu CI must pass; most likely we need another exclusion for research.
======================================================================
ERROR: test_train_from_scratch (lookout.style.typos.tests.test_preparation.TrainingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/style-analyzer/lookout/style/typos/tests/test_preparation.py", line 132, in test_train_from_scratch
    model = train_from_scratch(config)
  File "/style-analyzer/lookout/style/typos/preparation.py", line 268, in train_from_scratch
    prepared_data = prepare_data(config["preparation"])
  File "/style-analyzer/lookout/style/typos/preparation.py", line 82, in prepare_data
    data = pandas.read_csv(raw_data_path, index_col=0, keep_default_na=False)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 760, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2197, in pandas._libs.parsers.raise_parser_error
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
_lzma.LZMAError: Input format not supported by decoder
It failed on only one Python version and passed on all the others :suspect:
@EgorBu I had the same problem. Rerunning the job solved it for me: https://github.com/src-d/style-analyzer/pull/721#issuecomment-477525516
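If the failure comes back, one thing that might be worth ruling out is that the raw data file is not valid xz at the moment it is read (for example a partial or failed download). This is only a guess; the thread does not confirm the cause. "Input format not supported by decoder" is what the lzma decoder reports when the stream does not start with a header it recognizes. A minimal check, with `looks_like_xz` being a hypothetical helper and not something in the repository:

```python
# The six magic bytes that start every .xz stream: FD 37 7A 58 5A 00.
XZ_MAGIC = b"\xfd7zXZ\x00"


def looks_like_xz(path: str) -> bool:
    """Return True if the file at `path` starts with the xz magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(XZ_MAGIC)) == XZ_MAGIC
```

Calling this on the raw data path before `pandas.read_csv` would at least separate "the file is not xz" from a problem in the decoder itself.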