Open li-xuyang28 opened 3 years ago
Thank you. Did you use the txtcomplexity script? Since most of the measures depend on the length of the input text, the script divides each text into parts ("windows") of a given length (the default is 1000 words), computes the complexity measures for each part, and outputs the mean (this makes the values comparable between texts). When a text is split into windows, any remaining words are discarded. For example, using a window size of 500, a text of 4562 words is split into 9 windows and the remaining 62 words are discarded.
The error message suggests that your input text is shorter than the window size. Since only "full" windows are taken into account, the script cannot compute the complexity measures. The easiest way to work around this problem is to set a smaller window size using the --window-size option. For example, to set the window size to 500 words:
txtcomplexity --input-format conllu --window-size 500 <file>
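The windowing described above can be illustrated with a minimal Python sketch. This is not the txtcomplexity implementation: the function names (`split_into_windows`, `type_token_ratio`, `windowed_mean`) are hypothetical, and the type-token ratio merely stands in for whichever measures the script computes.

```python
from statistics import mean


def split_into_windows(tokens, window_size):
    """Split tokens into full windows; leftover tokens are discarded."""
    n_full = len(tokens) // window_size
    if n_full == 0:
        # Mirrors the situation in this thread: the text is shorter than one window.
        raise ValueError("text is shorter than the window size")
    return [tokens[i * window_size:(i + 1) * window_size] for i in range(n_full)]


def type_token_ratio(window):
    """Illustrative length-dependent measure: distinct tokens / all tokens."""
    return len(set(window)) / len(window)


def windowed_mean(tokens, window_size, measure=type_token_ratio):
    """Compute the measure per window and return the mean over all windows."""
    return mean(measure(w) for w in split_into_windows(tokens, window_size))


# Worked example from the text: 4562 words with a window size of 500.
tokens = [f"w{i % 300}" for i in range(4562)]
windows = split_into_windows(tokens, 500)
print(len(windows), len(tokens) - 500 * len(windows))  # 9 62
```

With a 12-word text and the default window size of 1000, `split_into_windows` raises an error, which is why a smaller window size is needed for short test inputs.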
I get an error after running: txtcomplexity --input-format conllu test_text.txt
The sentence in the test_text.txt file was taken from the example:
1 Das ART 3 NK (TOP(S(NP 2 fremde ADJA 3 NK 3 Schiff NN 4 SB ) 4 war VAFIN -1 -- 5 nicht PTKNEG 6 NG (AVP 6 allein ADV 4 MO ) 7 . $. 6 -- *))
Why?
The input you are using is actually not an example of the CoNLL-U format but of the custom tsv format (I've tried to make this clearer in the README). This means you should use txtcomplexity --input-format tsv test_text.txt instead. Unfortunately, there was a bug in the function that reads the custom tsv input, but I've just released version 0.9.1 with a fix, so please update your installation. Also note that simply using the two example sentences from the README will lead to another error message because the text is shorter than the default window size. For testing purposes, you could adjust the window size (for real applications, you would use a much larger window size):
txtcomplexity -i tsv --window-size 12 test_text.txt
And here are the contents of test_text.txt:
1 Das ART 3 NK (TOP(S(NP*
2 fremde ADJA 3 NK *
3 Schiff NN 4 SB *)
4 war VAFIN -1 -- *
5 nicht PTKNEG 6 NG (AVP*
6 allein ADV 4 MO *)
7 . $. 6 -- *))
1 Sieben CARD 2 NK (TOP(S(NP*
2 weitere ADJA 3 MO *)
3 begleiteten VVFIN -1 -- *
4 es PPER 3 OA *
5 . $. 4 -- *))
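The window size of 12 in the command above matches the number of tokens in this file (7 + 5). A small sketch for checking the token count of such a file before choosing a window size follows. It assumes one token per non-empty line, whitespace-separated columns, a token index in the first column that restarts at 1 for each sentence, and the word form in the second column. The helper `read_sentences` is hypothetical and not part of txtcomplexity.

```python
def read_sentences(text):
    """Group word forms into sentences; a new sentence starts when the index is 1."""
    sentences, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        if fields[0] == "1" and current:
            sentences.append(current)
            current = []
        current.append(fields[1])  # assumed: second column is the word form
    if current:
        sentences.append(current)
    return sentences


sample = """\
1 Das ART 3 NK (TOP(S(NP*
2 fremde ADJA 3 NK *
3 Schiff NN 4 SB *)
4 war VAFIN -1 -- *
5 nicht PTKNEG 6 NG (AVP*
6 allein ADV 4 MO *)
7 . $. 6 -- *))
1 Sieben CARD 2 NK (TOP(S(NP*
2 weitere ADJA 3 MO *)
3 begleiteten VVFIN -1 -- *
4 es PPER 3 OA *
5 . $. 4 -- *))
"""

sentences = read_sentences(sample)
n_tokens = sum(len(s) for s in sentences)
print(len(sentences), n_tokens)  # 2 12
```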
Thanks for the previous answer!
But now I see a new error after this command: python3 run_stanza.py --language en --output-dir . to_conll.txt
to_conll.txt contains this random text:
"Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like)."
What is wrong? Where can I find the resources.json file for converting text to CoNLL-U format?
This problem seems to occur if you haven't used stanza before and there is no resources.json file yet. I've updated the script to check for the file and to download it if necessary.
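A check of this kind could look roughly like the sketch below. It is not the code from run_stanza.py. It assumes that stanza keeps resources.json in its resources directory, which defaults to ~/stanza_resources and can be overridden with the STANZA_RESOURCES_DIR environment variable. Both helper names are hypothetical.

```python
import os
from pathlib import Path


def stanza_resources_dir():
    """Assumed location of stanza's resources directory."""
    default = Path.home() / "stanza_resources"
    return Path(os.environ.get("STANZA_RESOURCES_DIR", default))


def has_resources_file(model_dir):
    """Return True if resources.json exists in the given directory."""
    return (Path(model_dir) / "resources.json").is_file()


if not has_resources_file(stanza_resources_dir()):
    # With stanza installed, downloading a language model, e.g.
    # stanza.download("en"), is what would normally fetch the file.
    print("resources.json not found; a download is needed")
```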
Hi,
Thank you for developing such a comprehensive resource for complexity computation. I am trying the package but ran into some errors.
I used run_stanza.py to prepare the conllu file. Could you please suggest what I might be doing wrong?
Best