ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Mozilla Public License 2.0

Need help attaching text and brackets to their correct parents #188

Closed Shasetty closed 5 months ago

Shasetty commented 5 months ago

Text used: On November 29, 2022, Twin Ridge, Carbon Revolution Public Limited Company (formerly known as Poppetell Limited), a public limited company incorporated in Ireland with registered number 607450 (“MergeCo”), Carbon Revolution and Poppettell Merger Sub, a Cayman Islands exempted company and wholly-owned subsidiary of MergeCo (“Merger Sub”), entered into a Business Combination Agreement (as it may be amended or supplemented from time to time, the “Business Combination Agreement”), pursuant to which, among other things, Twin Ridge will be merged with and into Merger Sub, with Merger Sub surviving as a wholly-owned subsidiary of MergeCo (the “Merger”), with shareholders of Twin Ridge receiving ordinary shares of MergeCo, par value $0.0001 (the “MergeCo Ordinary Shares”), in exchange for their existing Twin Ridge Ordinary Shares (as defined below) and existing Twin Ridge warrant holders having their warrants automatically exchanged by assumption by MergeCo of the obligations under such warrants, including to become exercisable in respect of MergeCo Ordinary Shares instead of Twin Ridge Ordinary Shares, subject to, among other things, the approval of Twin Ridge’s shareholders.

Parser used: UD 2.10, model english-ewt-ud-2.10-220711

url: https://lindat.mff.cuni.cz/services/udpipe/
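
To reproduce the result outside the web form, the same model can be requested by name from the service's REST endpoint. The following is a minimal sketch, assuming the `/api/process` endpoint with the `data`, `model`, `tokenizer`, `tagger` and `parser` parameters and a JSON response carrying the CoNLL-U output in its `result` field. The helper names `build_params` and `parse_text` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST endpoint of the LINDAT UDPipe service.
API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_params(text, model="english-ewt-ud-2.10-220711"):
    """Build the form fields for one request (hypothetical helper).

    Empty values for tokenizer, tagger and parser ask the service to run
    each step with its default options.
    """
    return {
        "data": text,
        "model": model,
        "tokenizer": "",
        "tagger": "",
        "parser": "",
    }


def parse_text(text, model="english-ewt-ud-2.10-220711"):
    """Send the text to the service and return the CoNLL-U output as a string."""
    body = urllib.parse.urlencode(build_params(text, model)).encode("utf-8")
    with urllib.request.urlopen(API_URL, data=body, timeout=60) as response:
        return json.load(response)["result"]
```

Calling `parse_text(...)` once with each model name makes it easier to compare the two outputs line by line.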

In several places, text and brackets are not attached properly. Two examples: the bracket in this part of the text: of MergeCo (“Merger Sub”), and a few words in this part: “Business Combination Agreement”)


The text above is a single paragraph. I also checked other models and found this difference: english-ewt-ud-2.10-220711 treats it as a single paragraph, while english-ewt-ud-2.12-230717 treats it as two paragraphs.

Even with english-ewt-ud-2.12-230717, text and brackets are not attached to their parents properly.

As I am using english-ewt-ud-2.10-220711, please suggest how I can fix this issue.

foxik commented 5 months ago

As mentioned in https://github.com/ufal/udpipe/issues/175#issuecomment-1768050976, the parsing models are purely statistical, and you cannot expect them to be 100% accurate. We try to train the models to be as good as possible, but the models will always make some mistakes, and it is up to you to decide how to handle such situations.
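
One possible way to handle such mistakes is to inspect the CoNLL-U output after parsing and flag paired punctuation whose head lies outside the bracketed span. The following is a minimal sketch of that idea, not part of UDPipe. The heuristic (a bracket or quote should attach to a token inside its own pair) is an assumption based on the usual Universal Dependencies punctuation convention. The function names and the small hand-made `SAMPLE` are hypothetical and are not real UDPipe output.

```python
# Opening marks mapped to their closing counterparts.
PAIRS = {"(": ")", "[": "]", "\u201c": "\u201d"}


def read_sentences(conllu_text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment line
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        head = int(cols[6]) if cols[6].isdigit() else None
        current.append({"id": int(cols[0]), "form": cols[1],
                        "head": head, "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def suspicious_brackets(sentence):
    """Return (id, form, head) for paired marks attached outside their pair."""
    flagged, stack = [], []
    for tok in sentence:
        if tok["form"] in PAIRS:
            stack.append(tok)
        elif stack and tok["form"] == PAIRS[stack[-1]["form"]]:
            opener = stack.pop()
            for punct in (opener, tok):
                head = punct["head"]
                if head is not None and not (opener["id"] < head < tok["id"]):
                    flagged.append((punct["id"], punct["form"], head))
    return flagged


# Hand-made example: the closing bracket is attached to the verb (token 6).
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "MergeCo", "MergeCo", "PROPN", "_", "_", "6", "nsubj", "_", "_"),
    ("2", "(", "(", "PUNCT", "_", "_", "4", "punct", "_", "_"),
    ("3", "Merger", "Merger", "PROPN", "_", "_", "4", "compound", "_", "_"),
    ("4", "Sub", "Sub", "PROPN", "_", "_", "1", "appos", "_", "_"),
    ("5", ")", ")", "PUNCT", "_", "_", "6", "punct", "_", "_"),
    ("6", "entered", "enter", "VERB", "_", "_", "0", "root", "_", "_"),
])

for sent in read_sentences(SAMPLE):
    print(suspicious_brackets(sent))
```

Flagged tokens can then be reviewed by hand or re-attached by a rule of your choice. The sketch only reports them and leaves the parse unchanged.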