rug-compling / Alpino

Alpino parser and related tools for Dutch
GNU Lesser General Public License v2.1

Possible to get parse tree output similar to AllenNLP? #11

Open julianvenhuizen opened 2 years ago

julianvenhuizen commented 2 years ago

Is it possible to have Alpino output the parse tree in the following format:

In: "Several theories about the higher prevalence in males have been investigated, but the cause of the difference is unconfirmed; one theory is that females are underdiagnosed."

Out: (S (S (S (NP (NP (JJ Several) (NNS theories)) (PP (IN about) (NP (NP (DT the) (JJR higher) (NN prevalence)) (PP (IN in) (NP (NNS males)))))) (VP (VBP have) (VP (VBN been) (VP (VBN investigated))))) (, ,) (CC but) (S (NP (NP (DT the) (NN cause)) (PP (IN of) (NP (DT the) (NN difference)))) (VP (VBZ is) (ADJP (JJ unconfirmed))))) (: ;) (S (NP (CD one) (NN theory)) (VP (VBZ is) (SBAR (IN that) (S (NP (NNS females)) (VP (VBP are) (ADJP (JJ underdiagnosed))))))) (. .))

This output is currently produced with AllenNLP and a minimal span-based neural constituency parser (https://arxiv.org/abs/1705.03919). However, since I am also working with Dutch data, I intend to use the Alpino parser. If the above output isn't possible, I suspect I will have to go over the XML output and work something out myself.

gertjanvannoord commented 2 years ago

The official XML output of Alpino consists of dependency structures, not parse trees. So it is not just the format that is different.

There are other - less documented - output formats. With the option end_hook=syntax you get something that looks close to what you describe.

Alpino -notk end_hook=syntax -parse Dit is een prachtige zin

0| [ @top_cat [ @start [ @max [ @root [ @np [ @pron [ @det Dit ] ] ] [ @optpunct ] [ @sv1 [ @v is ] [ @v2_vp [ @vpx [ @vproj [ @pred [ @np [ @det een ] [ @n [ @a prachtige ] [ @n zin ] ] ] ] [ @vproj [ @vc [ @vb [ @v ] ] ] ] ] ] ] ] ] ] ] [ @optpunct ] ]

Gertjan
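
For readers who want the bracketed `end_hook=syntax` output shown above in Penn-style parentheses, the following is a minimal sketch, not part of Alpino. It assumes that every node is written as `[ @label children ]` with whitespace around each bracket, that the line may start with a sentence key such as `0|`, and that no word in the sentence is itself `[` or `]`. The function names `parse_alpino_syntax` and `to_penn` are made up for this illustration. The labels are kept as Alpino prints them; nothing is mapped to Penn Treebank tags.

```python
import re


def parse_alpino_syntax(line):
    """Parse one line of bracketed output into nested (label, children) tuples."""
    # Assumption: an optional sentence key such as "0|" precedes the tree.
    text = re.sub(r"^\S*\|\s*", "", line.strip())
    tokens = text.split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        label = tokens[pos].lstrip("@")  # "@np" -> "np"
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word of the sentence
                pos += 1
        pos += 1  # consume the closing "]"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return tree


def to_penn(tree, prune_empty=True):
    """Render a parsed tree as "(label child ...)".

    With prune_empty=True, nodes that dominate no words (such as the
    empty "optpunct" nodes in the example) are left out.
    """
    label, children = tree
    parts = []
    for child in children:
        if isinstance(child, str):
            parts.append(child)
        else:
            rendered = to_penn(child, prune_empty)
            if rendered:
                parts.append(rendered)
    if prune_empty and not parts:
        return ""
    return "(" + " ".join([label] + parts) + ")"


# The output line quoted in this thread, split over several literals for readability.
example = (
    "0| [ @top_cat [ @start [ @max [ @root "
    "[ @np [ @pron [ @det Dit ] ] ] [ @optpunct ] "
    "[ @sv1 [ @v is ] [ @v2_vp [ @vpx [ @vproj "
    "[ @pred [ @np [ @det een ] [ @n [ @a prachtige ] [ @n zin ] ] ] ] "
    "[ @vproj [ @vc [ @vb [ @v ] ] ] ] "
    "] ] ] ] ] ] ] "
    "[ @optpunct ] ]"
)

if __name__ == "__main__":
    print(to_penn(parse_alpino_syntax(example)))
```

Pruning empty nodes is a design choice for this sketch: the Penn-style example in the question contains no empty constituents, so dropping them gives a closer shape. Pass `prune_empty=False` to keep every node.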

julianvenhuizen commented 2 years ago

Thank you. That does indeed look similar. Do you have any documentation on the meaning of the tags in your example? I am unable to find anything online. It would help me a lot if I were to 'translate' these tags to the Penn Treebank bracket labels used in my example output above.

gertjanvannoord commented 2 years ago

Nope, these labels were used internally. Not documented, I fear. GJ
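
Since the labels are undocumented, one way to start a translation by hand is to collect the labels that actually occur in your own parses and then rename them with a table you write yourself. The following is a minimal sketch under that assumption. It treats the token right after each `[` as the node label, and the helper names `label_counts` and `relabel` are made up for this illustration. The mapping in the usage lines is a placeholder guess, not an established correspondence between Alpino's internal labels and Penn Treebank tags.

```python
from collections import Counter


def label_counts(lines):
    """Count node labels, taken as the "@..." token right after each "["."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for prev, tok in zip(tokens, tokens[1:]):
            if prev == "[" and tok.startswith("@"):
                counts[tok[1:]] += 1
    return counts


def relabel(line, mapping):
    """Rename labels according to a user-supplied mapping.

    Labels missing from the mapping are left unchanged, so a partial
    table can be refined step by step.
    """
    tokens = line.split()
    out = []
    for i, tok in enumerate(tokens):
        if i > 0 and tokens[i - 1] == "[" and tok.startswith("@"):
            name = tok[1:]
            out.append("@" + mapping.get(name, name))
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    # A fragment of the output quoted earlier in this thread.
    fragment = "[ @np [ @det een ] [ @n [ @a prachtige ] [ @n zin ] ] ]"
    print(label_counts([fragment]).most_common())
    # Placeholder guesses only; check each label against your own data.
    print(relabel(fragment, {"np": "NP", "det": "DT"}))
```

Running `label_counts` over a larger sample of parses shows which labels matter most in your data, which helps decide where a hand-written mapping is worth the effort.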
