Closed: Shasetty closed this issue 5 months ago.
https://drive.google.com/drive/folders/1CbrXDDpQfx6TJrguGDGrJh0UTyCojW3L?usp=drive_link I have placed the 2.10 and 2.12 input and output CoNLL-U files for your analysis at the Google Drive link above (as I cannot paste CoNLL-U files here).
Steps followed for setup:
(The steps below are copied from the Udapi GitHub page.)
cd
git clone https://github.com/udapi/udapi-python.git
pip3 install --user -r udapi-python/requirements.txt
echo '## Use Udapi from ~/udapi-python/ ##' >> ~/.bashrc
echo 'export PATH="$HOME/udapi-python/bin:$PATH"' >> ~/.bashrc
echo 'export PYTHONPATH="$HOME/udapi-python/:$PYTHONPATH"' >> ~/.bashrc
source ~/.bashrc # or open a new bash
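To confirm that the setup steps above took effect in the current shell, a small Python check like the following may help. It is only a sketch, assuming the steps put the `udapy` script on PATH and the `udapi` package on PYTHONPATH; `check_udapi_setup` is a hypothetical helper, not part of Udapi.

```python
import importlib.util
import shutil


def check_udapi_setup() -> dict:
    """Report whether the udapy script and the udapi package are visible.

    Hypothetical helper: assumes ~/.bashrc (with the PATH and PYTHONPATH
    exports from the setup steps) was sourced in the shell running this.
    """
    return {
        # udapy should be found via ~/udapi-python/bin on PATH
        "udapy_on_path": shutil.which("udapy") is not None,
        # the udapi package should be importable via PYTHONPATH
        "udapi_importable": importlib.util.find_spec("udapi") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_udapi_setup().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```

If either check prints MISSING, the shell most likely has not re-read ~/.bashrc yet.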
I obtained the CoNLL-U files from the LINDAT UDPipe website and then executed the following command:
udapy -s ud.FixPunct < in.conllu > out.conllu
Considered text:
Please help me fix the wrong parent-child relationships.
First, this issue does not belong here because there is no UDPipe software bug reported. As explained at https://github.com/ufal/udpipe/issues/175#issuecomment-1768291600, we cannot expect such a ridiculous sentence to be parsed by UDPipe without any errors.
If there are any issues with using udapy and ud.FixPunct, you can report them in the udapi-python repo. However, as I explain below, there is no Udapi software bug either.
I confirm you used ud.FixPunct correctly. I've obtained exactly the same results as you after running udapy -s ud.FixPunct < 2_10_input.conllu > 2_10_output.conllu.
Still there are no changes in the parent & child relationship
No. You can use e.g. diff 2_10_input.conllu 2_10_output.conllu (or vimdiff) to see that there are 21 changes made by ud.FixPunct.
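Besides diff or vimdiff, the attachment changes can also be listed token by token. The following is a minimal stdlib-only sketch; it assumes both files contain the same sentences and tokens in the same order (which is expected when only punctuation attachments are fixed), and `read_conllu` and `changed_attachments` are hypothetical helpers, not Udapi functions. It counts changed tokens, so its count need not equal the number of lines reported by diff.

```python
def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences, each a dict
    token ID -> (FORM, HEAD, DEPREL). Comment lines, multiword-token
    ranges (e.g. "1-2") and empty nodes (e.g. "3.1") are skipped."""
    sentences, current = [], {}
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        current[int(cols[0])] = (cols[1], cols[6], cols[7])
    if current:  # file without a final blank line
        sentences.append(current)
    return sentences


def changed_attachments(before_text, after_text):
    """List (sentence no., token ID, FORM, old HEAD:DEPREL, new HEAD:DEPREL)
    for every token whose attachment differs between the two files.
    Assumes identical sentence and token segmentation in both files."""
    changes = []
    before, after = read_conllu(before_text), read_conllu(after_text)
    for sent_no, (old, new) in enumerate(zip(before, after), start=1):
        for tid, (form, head, deprel) in old.items():
            _, new_head, new_deprel = new[tid]
            if (head, deprel) != (new_head, new_deprel):
                changes.append(
                    (sent_no, tid, form, f"{head}:{deprel}", f"{new_head}:{new_deprel}")
                )
    return changes
```

For example, pass the contents of 2_10_input.conllu and 2_10_output.conllu (read as UTF-8 strings) to `changed_attachments` and print each returned tuple.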
I don't see any errors in punctuation attachment in the output CoNLL-U files (both for 2_10_output.conllu and 2_12_output.conllu).
You can use udapy write.Html < 2_10_output.conllu > 2_10_output.html to get the "js-treex-view.js" visualization.
You can also highlight all nonprojective nodes using udapy -H util.Mark node='node.is_nonprojective()' < 2_10_output.conllu > 2_10_output-nonprojective.html and check that all the punctuation tokens are attached projectively (unlike in the input files).
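The same projectivity check can be sketched without Udapi, restricted to punctuation. This stdlib-only sketch assumes the usual definition, taken here to match Udapi's `node.is_nonprojective()`: a node is attached non-projectively if some token lying between it and its parent in word order is not dominated by that parent. It handles one sentence at a time, assumes a valid tree, and the function names are hypothetical.

```python
def read_sentence(text):
    """Parse one CoNLL-U sentence into {ID: (FORM, UPOS, HEAD, DEPREL)},
    skipping comments, multiword-token ranges and empty nodes."""
    tokens = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skips "1-2" and "3.1"
            continue
        tokens[int(cols[0])] = (cols[1], cols[3], int(cols[6]), cols[7])
    return tokens


def is_nonprojective(heads, node):
    """True if some token strictly between `node` and its parent (in word
    order) is not dominated by that parent. `heads` maps ID -> HEAD, 0 = root.
    Assumes a well-formed tree (no cycles)."""
    parent = heads[node]
    if parent == 0:  # every token is dominated by the root
        return False
    lo, hi = sorted((node, parent))
    for k in range(lo + 1, hi):
        ancestor = heads[k]
        # climb towards the root until we reach the parent or the root
        while ancestor not in (0, parent):
            ancestor = heads[ancestor]
        if ancestor != parent:
            return True
    return False


def nonprojective_punct(tokens):
    """(ID, FORM) of punctuation tokens that are attached non-projectively."""
    heads = {tid: tok[2] for tid, tok in tokens.items()}
    return [
        (tid, tok[0])
        for tid, tok in sorted(tokens.items())
        if (tok[1] == "PUNCT" or tok[3] == "punct") and is_nonprojective(heads, tid)
    ]
```

An empty result for every sentence of an output file would correspond to the "all punctuation attached projectively" check described above; non-punctuation tokens are deliberately ignored.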
(wherever it is wrong)
The ud.FixPunct corrects punctuation only, as the name suggests. So of course, there are still many other parsing errors left, including strange non-projectivities (e.g. the 126th token "the" and the 128th token "Merger", but these are not punctuation symbols).
Closing, as there is no bug in UDPipe.
It is expected that there will be some errors in the output: according to https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_212_models, for the model english-gum-ud-2.12-230717, the UAS is 93.72, so even on the in-domain test set, we predict ~6.3% of edges incorrectly; on real data, the number of errors will probably be even larger.
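The arithmetic behind "~6.3%" is simply 100 - UAS, since UAS is the percentage of tokens whose predicted head matches the gold head. A minimal sketch follows; the `uas` helper and the 1000-token illustration are hypothetical and are not UDPipe output.

```python
def uas(gold_heads, pred_heads):
    """Unlabeled attachment score (in percent) over all given tokens:
    the share of tokens whose predicted HEAD equals the gold HEAD."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted heads must align token by token")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


# Reported UAS of english-gum-ud-2.12-230717 on its in-domain test set.
reported_uas = 93.72
error_rate = 100.0 - reported_uas  # percentage of wrongly attached tokens
print(f"expected share of wrong heads: {error_rate:.1f}%")

# Hypothetical illustration: wrong heads expected in a 1000-token text
# if errors occurred at the same rate (real data is usually worse).
print(round(1000 * error_rate / 100))
```

So even a text of moderate length should be expected to contain dozens of wrongly attached tokens.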
Hi Grammaticians,
I am reopening this ticket in the hope of finding a workaround.
As you know, my earlier ticket was closed with an explanation of the limitations of the present software.
Now I request that you provide a workaround for text where parent and child are wrongly connected.
Points to be considered:
1) Grammar rules should always be followed.
2) Break up the single sentence into multiple sentences.
3) Run each sentence individually with version 2.10.
4) Merge all the files.
5) While merging, change the parent and child dependency relationships (as needed).
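Steps 4 and 5 of this list could be sketched roughly as follows. This is a hypothetical illustration, not a UDPipe or Udapi feature: it assumes each sentence was parsed into its own CoNLL-U output, renumbers the `# sent_id` comments when concatenating them, and applies hand-specified HEAD/DEPREL corrections keyed by (sentence number, token ID). Deciding which corrections are needed remains manual work.

```python
def merge_conllu(parts):
    """Concatenate CoNLL-U outputs (one string per separately parsed
    sentence) into one document, renumbering `# sent_id` sequentially.
    Other comment lines (e.g. `# text`) are kept as they are."""
    out_lines, sent_no = [], 0
    for part in parts:
        for block in part.strip().split("\n\n"):
            if not block.strip():
                continue
            sent_no += 1
            lines = [l for l in block.splitlines() if not l.startswith("# sent_id")]
            out_lines.append(f"# sent_id = {sent_no}")
            out_lines.extend(lines)
            out_lines.append("")  # blank line terminates each sentence
    return "\n".join(out_lines) + "\n"


def apply_corrections(conllu_text, corrections):
    """Overwrite HEAD and DEPREL of selected tokens.
    `corrections` maps (sentence number, token ID) -> (new HEAD, new DEPREL);
    sentences are numbered from 1 in file order."""
    out, sent_no, in_sentence = [], 0, False
    for line in conllu_text.split("\n"):
        if line.strip() and not in_sentence:  # first line of a new sentence
            sent_no += 1
            in_sentence = True
        elif not line.strip():
            in_sentence = False
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            key = (sent_no, int(cols[0]))
            if key in corrections:
                head, deprel = corrections[key]
                cols[6], cols[7] = str(head), deprel
                line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)
```

Note that splitting one sentence into several changes what the parser sees, so the merged trees are separate sentences and not one tree for the original sentence.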
However, the output in "Show Trees" is exactly the same as in "Show Table" (and as the CoNLL-U in "Output Text"), so there is no bug in UDPipe. These GitHub issues are for reporting bugs in the software. You cannot expect 100% parsing accuracy from all models.
BTW: When using e.g. the english-gum-ud-2.12-230717 model, the brackets enclosing FDA and NDA are attached correctly. This suggests GUM is better training data than EWT in this respect. Indeed, when applying ud.FixPunct on en_gum-ud-train.conllu, only 39 errors are fixed, but on en_ewt-ud-train.conllu, 7496 are fixed. So maybe the authors of EWT should fix these bugs, and the new version of UDPipe will then be better. However, that should not be discussed here, but at https://github.com/UniversalDependencies/UD_English-EWT/issues
Originally posted by @martinpopel in https://github.com/ufal/udpipe/issues/175#issuecomment-1768050976