thomjur / PyCollocation

Python module to do simple collocation analysis of a corpus.
GNU General Public License v3.0

Error when Counters are empty #19

Closed trutzig89182 closed 2 years ago

trutzig89182 commented 2 years ago

Hey, I have been fiddling a bit with the Twitter/twarc adapter and came across an error when running analysis.py, which seems to stem from how the axes are defined in display.py. Given the hour, my cognitive capacity is quite limited right now, so I will just drop this here for the moment. I will try to figure out later whether I did something wrong or whether there is a bug somewhere.

Traceback (most recent call last):
  File "/Users/maxmustermann/Documents/GitHub/PyCollocation_test/analysis.py", line 77, in <module>
    start_collocation_analysis(collection, search_term, int(l_window), int(r_window), statistic, doc_type="folder", output_type = output_type)
  File "/Users/maxmustermann/Documents/GitHub/PyCollocation_test/analysis.py", line 61, in start_collocation_analysis
    display.get_results_collocates(left_counter, right_counter, full_counter, search_term_count, l_window, r_window, statistic, output_type)
  File "/Users/maxmustermann/Documents/GitHub/PyCollocation_test/tools/display.py", line 22, in get_results_collocates
    df_top_collocates.columns = ["collocate", "coll_freq"]
  File "/opt/anaconda3/envs/twarc-venv/lib/python3.9/site-packages/pandas/core/generic.py", line 5491, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "/opt/anaconda3/envs/twarc-venv/lib/python3.9/site-packages/pandas/core/generic.py", line 763, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/opt/anaconda3/envs/twarc-venv/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/opt/anaconda3/envs/twarc-venv/lib/python3.9/site-packages/pandas/core/internals/base.py", line 57, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements

thomjur commented 2 years ago

Did you figure something out? Sorry, I was sick for a couple of days and need to catch up with my work first; I'll be back soon. What exactly were you trying to do?

trutzig89182 commented 2 years ago

Hey, no problem. I have been busy preparing a workshop and didn't have a lot of time either (and won't have much for another week). I have been trying to pass texts from jsonl files to collocations(), accumulate the results, and hand them to get_results_collocates() in display.py to print them.

After an error occurred, I just tried to start analysis.py with a folder of txt files and then got the error above.

thomjur commented 2 years ago

Yes, I think the idea is to pass everything to the function in analysis.py. Did you use doc_type="folder"? Or could you maybe just upload the folder with a few .txt examples so I can try it myself? Thanks!

trutzig89182 commented 2 years ago

I just ran python3 analysis.py /corpora "test" 3 3 freq folder print with the corpora folder inside PyCollocation and got the error. Am I passing the arguments incorrectly? I am confused, because start_collocation_analysis() passes the unit tests and display.get_results_collocates() is called in there, so it seems to run without error there.

thomjur commented 2 years ago

I think the problem is the / in /corpora. With just corpora it should work. I am too tired right now, but I think the leading / points to the root folder here, as it would on Linux; ./corpora seems to work fine as well. Of course, I have only tested it with the files in our test corpus, so there might be more problems when working with your new files.
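
To illustrate the difference between the three spellings, here is a minimal sketch using only the standard library. The helper name is_root_anchored is made up for this example and is not part of PyCollocation; PurePosixPath is used so the check follows Linux/macOS rules regardless of where it runs.

```python
from pathlib import PurePosixPath


def is_root_anchored(path_arg):
    """Return True if the argument is an absolute POSIX path.

    Hypothetical helper for illustration only, not part of PyCollocation.
    """
    return PurePosixPath(path_arg).is_absolute()


for arg in ("/corpora", "corpora", "./corpora"):
    if is_root_anchored(arg):
        kind = "absolute, resolved from the filesystem root"
    else:
        kind = "relative, resolved from the current working directory"
    print(f"{arg!r}: {kind}")
```

Only /corpora is treated as absolute, so it would be looked up at the root of the filesystem rather than inside the PyCollocation folder.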

trutzig89182 commented 2 years ago

How silly of me, that was indeed the problem.

trutzig89182 commented 2 years ago

I stumbled over the same error again when I used a search term that does not exist in the corpus. I suppose that if a folder contains no files to analyse, or if the search term does not occur, the counters stay empty. But display.py assumes not only that a counter exists, but also that it contains keys and values.
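
A rough sketch of what a guard for the empty case could look like, using only collections.Counter. The function name top_collocates and the error wording are assumptions for illustration; this is not the actual code in display.py.

```python
from collections import Counter


def top_collocates(counter, n=10):
    """Return the n most frequent (collocate, frequency) pairs.

    Hypothetical helper, not taken from display.py. It raises a
    descriptive error instead of passing an empty result on to the
    code that builds the output table.
    """
    if not counter:
        raise ValueError(
            "Counter is empty: the search term was not found, "
            "or the folder contains no files to analyse."
        )
    return counter.most_common(n)


# A populated counter yields (collocate, frequency) pairs.
print(top_collocates(Counter({"corpus": 3, "text": 1}), n=1))

# An empty counter now fails with a message that names the likely cause.
try:
    top_collocates(Counter())
except ValueError as err:
    print(err)
```

With a check like this, the user would see why nothing can be displayed instead of the pandas length-mismatch error.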

If that’s the case, we should work on more telling errors being thrown if

thomjur commented 2 years ago

In the final version we would still need to handle these to give user-friendly output. But I will leave them in for now for testing.