Open julianvenhuizen opened 2 years ago
The official XML output of Alpino consists of dependency structures, not parse trees, so it is not just the format that is different.
There are other, less documented, output formats. With the option end_hook=syntax you get something that looks close to what you describe.
Alpino -notk end_hook=syntax -parse Dit is een prachtige zin
0| [ @top_cat [ @start [ @max [ @root [ @np [ @pron [ @det Dit ] ] ] [ @optpunct ] [ @sv1 [ @v is ] [ @v2_vp [ @vpx [ @vproj [ @pred [ @np [ @det een ] [ @n [ @a prachtige ] [ @n zin ] ] ] ] [ @vproj [ @vc [ @vb [ @v ] ] ] ] ] ] ] ] ] ] ] [ @optpunct ] ]
Gertjan
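For readers who want to turn that bracket output into Penn-style parentheses, here is a minimal Python sketch. It assumes the format is exactly what the example above shows: an optional sentence key such as `0| `, then whitespace-separated tokens where `[` opens a node, the next `@label` token names it, any other token is a word, and `]` closes the node. The function names `parse_alpino_syntax` and `to_brackets` are made up for this illustration and are not part of Alpino. The labels are passed through unchanged; mapping them to Penn Treebank labels would be a separate step that this sketch does not attempt.

```python
import re


def parse_alpino_syntax(line):
    """Parse one `end_hook=syntax` line into nested (label, children) tuples.

    Assumption: tokens are whitespace-separated, as in the example output.
    """
    # Drop a leading sentence key such as "0| " if there is one.
    tokens = re.sub(r"^\s*\S+\|\s+", "", line).split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}")
        label = tokens[pos + 1].lstrip("@")
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])  # a plain word
                pos += 1
        pos += 1  # consume the closing "]"
        return (label, children)

    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced brackets") from None
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def to_brackets(tree, prune_empty=True):
    """Render a parsed tree as "(label child ...)".

    With prune_empty=True, nodes that dominate no words (for example
    `[ @optpunct ]`) are dropped from the output.
    """
    label, children = tree
    parts = []
    for child in children:
        rendered = child if isinstance(child, str) else to_brackets(child, prune_empty)
        if rendered:
            parts.append(rendered)
    if prune_empty and not parts:
        return ""
    return "(" + " ".join([label] + parts) + ")"


# A fragment of the example output above, with a sentence key added.
fragment = "0| [ @np [ @det een ] [ @n [ @a prachtige ] [ @n zin ] ] ]"
print(to_brackets(parse_alpino_syntax(fragment)))
# prints: (np (det een) (n (a prachtige) (n zin)))
```

Pruning empty nodes is a design choice of the sketch: the example output contains nodes such as `[ @optpunct ]` and `[ @v ]` that cover no words, and these have no counterpart in the Penn-style output shown in the question.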
On Wed, Feb 23, 2022 at 3:14 PM Julián Venhuizen @.***> wrote:
Is it possible to have Alpino output the parse tree in the following format:
In: "Several theories about the higher prevalence in males have been investigated, but the cause of the difference is unconfirmed; one theory is that females are underdiagnosed."
Out: (S (S (S (NP (NP (JJ Several) (NNS theories)) (PP (IN about) (NP (NP (DT the) (JJR higher) (NN prevalence)) (PP (IN in) (NP (NNS males)))))) (VP (VBP have) (VP (VBN been) (VP (VBN investigated))))) (, ,) (CC but) (S (NP (NP (DT the) (NN cause)) (PP (IN of) (NP (DT the) (NN difference)))) (VP (VBZ is) (ADJP (JJ unconfirmed))))) (: ;) (S (NP (CD one) (NN theory)) (VP (VBZ is) (SBAR (IN that) (S (NP (NNS females)) (VP (VBP are) (ADJP (JJ underdiagnosed))))))) (. .))
This output is currently produced with AllenNLP and a minimal span-based neural constituency parser (https://arxiv.org/abs/1705.03919). However, as I'm also working with Dutch data, I intend to use the Alpino parser. If the above output isn't possible, I suspect I will have to go over the XML output and work something out myself.
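As a rough idea of what "going over the XML output" could look like, the Python sketch below walks an Alpino-style XML tree and prints it with brackets. It assumes the usual `alpino_ds` layout of nested `node` elements, where phrasal nodes carry a `cat` attribute and word nodes carry `word`, `pos`, `begin` and `end`. The sample XML is hand-written and simplified for illustration; it is not actual parser output. The function names are hypothetical. Because the XML encodes dependency structures, children are sorted by their `begin` position, and the result is not guaranteed to be a well-formed constituency tree for sentences with discontinuous or co-indexed nodes.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of Alpino XML (an assumption,
# not real parser output).
SAMPLE = """
<alpino_ds>
  <node begin="0" end="5" cat="top" rel="top">
    <node begin="0" end="5" cat="smain" rel="--">
      <node begin="0" end="1" pos="det" rel="su" word="Dit"/>
      <node begin="1" end="2" pos="verb" rel="hd" word="is"/>
      <node begin="2" end="5" cat="np" rel="predc">
        <node begin="2" end="3" pos="det" rel="det" word="een"/>
        <node begin="3" end="4" pos="adj" rel="mod" word="prachtige"/>
        <node begin="4" end="5" pos="noun" rel="hd" word="zin"/>
      </node>
    </node>
  </node>
  <sentence>Dit is een prachtige zin</sentence>
</alpino_ds>
"""


def node_to_brackets(node):
    """Render one <node> element and its descendants as a bracketed string."""
    word = node.get("word")
    if word is not None:
        # Leaf node: use the part-of-speech attribute as the label.
        return f"({node.get('pos', '?')} {word})"
    # Phrasal node: order the children by surface position.
    children = sorted(node.findall("node"), key=lambda n: int(n.get("begin", 0)))
    parts = [p for p in (node_to_brackets(c) for c in children) if p]
    if not parts:
        # A node with neither a word nor children (e.g. index-only) is skipped.
        return ""
    return f"({node.get('cat', '?')} " + " ".join(parts) + ")"


def alpino_xml_to_brackets(xml_text):
    """Convert an Alpino-style XML document to a single bracketed string."""
    root = ET.fromstring(xml_text)
    return node_to_brackets(root.find("node"))


print(alpino_xml_to_brackets(SAMPLE))
# prints: (top (smain (det Dit) (verb is) (np (det een) (adj prachtige) (noun zin))))
```

The labels here are the `cat` and `pos` values from the XML, not Penn Treebank labels, and the `rel` attribute (the dependency relation) is ignored entirely.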
Thank you. That does indeed look similar. Do you have any documentation on the meaning of the tags in your example? I am unable to find anything online. It would help me a lot if I were to 'translate' these tags to the Penn Treebank bracket labels used in my example output above.
Nope, these labels were used internally. Not documented, I fear. GJ