welfare-state-analytics / welfare_state_analytics

Welfare State Analytics

Riksdagsprotokoll 4 – Jupyter lemma suffix/prefix #79

Open fredrik1984 opened 3 years ago

fredrik1984 commented 3 years ago

Create a Jupyter page (possibly embedded in the Jupyter page described in issue Riksdagsprotokoll 3) where one can follow how the suffixes and prefixes of a given word develop over time, based on the PoS tagging (see issue Riksdagsprotokoll 2):

roger-mahler commented 3 years ago

Please clarify, preferably with examples, what is meant by suffix and prefix in relation to the information available in the XML. Are they the same as the `prefix` and `suffix` attributes that appear in the file?

<corpus>
<text>
<paragraph>
<sentence id="8f74-8955">
<w pos="VB" msd="VB.INF.AKT" lemma="|skapa|" lex="|skapa..vb.1|" sense="|skapa..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="01" dephead="" deprel="ROOT">Skapa</w>
<w pos="DT" msd="DT.UTR.SIN.IND" lemma="|en|" lex="|en..al.1|" sense="|den..1:-1.000|en..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="02" dephead="03" deprel="DT">en</w>
<w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="03" dephead="01" deprel="SS">Jupyter-sida</w>
<w pos="PAD" msd="PAD" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="04" dephead="03" deprel="IR">(</w>
<w pos="AB" msd="AB" lemma="|kanske|" lex="|kanske..ab.1|" sense="|kanske..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="05" dephead="06" deprel="AA">kanske</w>
<w pos="PC" msd="PC.PRF.UTR.SIN.IND.NOM" lemma="|inbygga|inbyggd|" lex="|inbygga..vb.1|inbyggd..av.1|" sense="|inbyggd..1:0.558|inbygga..1:0.442|" prefix="|in..ab.1|" suffix="|bygga..vb.1|" complemgram="|in..ab.1+bygga..vb.1:3.109e-16|" compwf="|in+byggd|" ref="06" dephead="03" deprel="PT">inbyggd</w>
<w pos="PP" msd="PP" lemma="|i|" lex="|i..pp.1|" sense="|i..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="07" dephead="06" deprel="+A">i</w>
<w pos="NN" msd="NN.UTR.SIN.DEF.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="08" dephead="07" deprel="HD">Jupyter-sidan</w>
<w pos="HP" msd="HP.-.-.-" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="09" dephead="10" deprel="SS">som</w>
<w pos="VB" msd="VB.PRS.SFO" lemma="|ange|" lex="|ange..vb.1|" sense="|ange..1:0.888|ange..3:0.072|ange..2:0.040|" prefix="|an..ab.1|" suffix="|ge..vb.1|" complemgram="|an..ab.1+ge..vb.1:1.074e-10|" compwf="|an+ges|" ref="10" dephead="03" deprel="ET">anges</w>
<w pos="PP" msd="PP" lemma="|i|" lex="|i..pp.1|" sense="|i..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="11" dephead="10" deprel="OA">i</w>
<w pos="PC" msd="PC.PRS.UTR+NEU.SIN+PLU.IND+DEF.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="12" dephead="13" deprel="AT">issue</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|riksdagsprotokoll|" lex="|riksdagsprotokoll..nn.1|" sense="|riksdagsprotokoll..1:-1.000|" prefix="|riksdag..nn.1|" suffix="|protokoll..nn.2|protokoll..nn.1|" complemgram="|riksdag..nn.1+protokoll..nn.2:1.055e-13|riksdag..nn.1+protokoll..nn.1:1.055e-13|riks..nn.1+dags..nn.1+protokoll..nn.2:9.540e-18|rike..nn.1+dags..nn.1+protokoll..nn.2:9.540e-18|rike..nn.1+dags..nn.1+protokoll..nn.1:9.540e-18|riks..nn.1+dags..nn.1+protokoll..nn.1:9.540e-18|" compwf="|riksdags+protokoll|riks+dags+protokoll|" ref="13" dephead="32" deprel="OO">Riksdagsprotokoll</w>
<w pos="RG" msd="RG.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="14" dephead="32" deprel="ROOT">3</w>
<w pos="PAD" msd="PAD" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="15" dephead="14" deprel="JR">)</w>
<w pos="HA" msd="HA" lemma="|där|" lex="|där..ab.1|" sense="|där..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="16" dephead="18" deprel="RA">där</w>
<w pos="PN" msd="PN.UTR.SIN.IND.SUB" lemma="|man|" lex="|man..pn.1|" sense="|man..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="17" dephead="18" deprel="SS">man</w>
<w pos="VB" msd="VB.PRS.AKT" lemma="|kunna|" lex="|kunna..vb.1|" sense="|kunna..1:0.869|kunna..4:0.076|kunna..3:0.042|kunna..2:0.014|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="18" dephead="19" deprel="RA">kan</w>
<w pos="VB" msd="VB.INF.AKT" lemma="|följa|" lex="|följa..vb.1|" sense="|följa..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="19" dephead="14" deprel="AN">följa</w>
<w pos="NN" msd="NN.UTR.SIN.DEF.NOM" lemma="|utveckling|" lex="|utveckling..nn.1|" sense="|utveckling..1:1.000|utveckling..2:0.000|" prefix="|" suffix="|" complemgram="|ut..ab.1+veck..nn.1+lina..nn.1+gen..nn.1:4.730e-25|ut..ab.1+vecka..nn.1+lina..nn.1+gen..nn.1:4.730e-25|ut..ab.1+veck..nn.1+lin..nn.1+gen..nn.1:4.730e-25|ut..ab.1+vecka..nn.1+lin..nn.1+gen..nn.1:4.730e-25|ut..ab.1+vecka..vb.1+lin..nn.1+gen..nn.1:2.554e-35|ut..ab.1+vecka..vb.1+lina..nn.1+gen..nn.1:2.554e-35|" compwf="|ut+veck+lin+gen|" ref="20" dephead="19" deprel="SS">utvecklingen</w>
<w pos="PP" msd="PP" lemma="|av|" lex="|av..pp.1|" sense="|av..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="21" dephead="20" deprel="ET">av</w>
<w pos="DT" msd="DT.NEU.SIN.IND" lemma="|en|" lex="|en..al.1|" sense="|den..1:-1.000|en..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="22" dephead="24" deprel="DT">ett</w>
<w pos="JJ" msd="JJ.POS.NEU.SIN.IND.NOM" lemma="|viss|" lex="|viss..av.1|" sense="|viss..1:0.917|viss..2:0.083|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="23" dephead="24" deprel="DT">visst</w>
<w pos="NN" msd="NN.NEU.SIN.IND.GEN" lemma="|ord|" lex="|ord..nn.1|" sense="|ord..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="24" dephead="26" deprel="DT">ords</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="25" dephead="26" deprel="CJ">sufix</w>
<w pos="KN" msd="KN" lemma="|och|" lex="|och..kn.1|" sense="|och..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="26" dephead="21" deprel="PA">och</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|prefix|" lex="|prefix..nn.1|" sense="|prefix..1:-1.000|" prefix="|pre..sxc.1|" suffix="|fix..nn.2|fix..nn.1|" complemgram="|pre..sxc.1+fix..nn.2:2.951e-27|pre..sxc.1+fix..nn.1:2.951e-27|" compwf="|pre+fix|" ref="27" dephead="26" deprel="CJ">prefix</w>
<w pos="PC" msd="PC.PRF.NEU.SIN.IND.NOM" lemma="|basera|" lex="|basera..vb.1|" sense="|basera..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="28" dephead="19" deprel="SP">baserat</w>
<w pos="PP" msd="PP" lemma="|på|" lex="|på..pp.1|" sense="|på..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="29" dephead="19" deprel="AA">på</w>
<w pos="NN" msd="NN.UTR.SIN.DEF.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="30" dephead="29" deprel="PA">PoS-taggningen</w>
<w pos="PAD" msd="PAD" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="31" dephead="14" deprel="IR">(</w>
<w pos="VB" msd="VB.IMP.AKT" lemma="|se|" lex="|se..vb.1|" sense="|se..6:0.936|se..1:0.020|se..3:0.018|se..4:0.013|se..5:0.008|se..2:0.004|ses..1:0.002|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="32" dephead="11" deprel="PA">se</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="33" dephead="32" deprel="ET">issue</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|riksdagsprotokoll|" lex="|riksdagsprotokoll..nn.1|" sense="|riksdagsprotokoll..1:-1.000|" prefix="|riksdag..nn.1|" suffix="|protokoll..nn.2|protokoll..nn.1|" complemgram="|riksdag..nn.1+protokoll..nn.2:1.055e-13|riksdag..nn.1+protokoll..nn.1:1.055e-13|riks..nn.1+dags..nn.1+protokoll..nn.2:9.540e-18|rike..nn.1+dags..nn.1+protokoll..nn.2:9.540e-18|rike..nn.1+dags..nn.1+protokoll..nn.1:9.540e-18|riks..nn.1+dags..nn.1+protokoll..nn.1:9.540e-18|" compwf="|riksdags+protokoll|riks+dags+protokoll|" ref="34" dephead="35" deprel="DT">Riksdagsprotokoll</w>
<w pos="RG" msd="RG.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="35" dephead="33" deprel="RA">2</w>
<w pos="PAD" msd="PAD" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="36" dephead="35" deprel="JR">)</w>
<w pos="MAD" msd="MAD" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="37" dephead="14" deprel="IQ">:</w>
</sentence>
</paragraph>
<paragraph>
<sentence id="8003-83cd">
<w pos="VB" msd="VB.INF.AKT" lemma="|välja|" lex="|välja..vb.1|" sense="|välja..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="01" dephead="" deprel="ROOT">Välja</w>
<w pos="DT" msd="DT.NEU.SIN.IND" lemma="|en|" lex="|en..al.1|" sense="|den..1:-1.000|en..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="02" dephead="05" deprel="DT">ett</w>
<w pos="JJ" msd="JJ.POS.NEU.SIN.IND.NOM" lemma="|valfri|" lex="|valfri..av.1|" sense="|valfri..1:-1.000|" prefix="|val..nn.1|val..nn.2|" suffix="|fri..av.1|" complemgram="|val..nn.2+fri..av.1:8.954e-11|val..nn.1+fri..av.1:8.954e-11|" compwf="|val+fritt|" ref="03" dephead="05" deprel="AT">valfritt</w>
<w pos="PC" msd="PC.PRF.NEU.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="04" dephead="05" deprel="AT">lemmatiserat</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|ord|" lex="|ord..nn.1|" sense="|ord..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="05" dephead="01" deprel="SS">ord</w>
<w pos="VB" msd="VB.INF.AKT" lemma="|skapa|" lex="|skapa..vb.1|" sense="|skapa..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="06" dephead="01" deprel="VG">Skapa</w>
<w pos="NN" msd="NN.UTR.PLU.IND.NOM" lemma="|graf|" lex="|graf..nn.1|" sense="|graf..2:0.687|graf..1:0.313|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="07" dephead="19" deprel="SS">grafer</w>
<w pos="HP" msd="HP.-.-.-" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="08" dephead="09" deprel="SS">som</w>
<w pos="VB" msd="VB.PRS.AKT" lemma="|visa|" lex="|visa..vb.1|" sense="|visa..1:0.943|visa..3:0.057|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="09" dephead="07" deprel="ET">visar</w>
<w pos="HA" msd="HA" lemma="|hur|" lex="|hur..ab.1|" sense="|hur..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="10" dephead="11" deprel="AA">hur</w>
<w pos="JJ" msd="JJ.POS.UTR+NEU.PLU.IND+DEF.NOM" lemma="|unik|" lex="|unik..av.1|" sense="|unik..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="11" dephead="12" deprel="AT">unika</w>
<w pos="NN" msd="NN.NEU.PLU.IND.NOM" lemma="|antal|" lex="|antal..nn.1|" sense="|antal..1:-1.000|" prefix="|an..ab.1|" suffix="|tal..nn.1|tal..nn.2|" complemgram="|an..ab.1+tal..nn.2:6.302e-11|an..ab.1+tal..nn.1:6.302e-11|" compwf="|an+tal|" ref="12" dephead="09" deprel="OO">antal</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="13" dephead="14" deprel="CJ">sufix</w>
<w pos="AB" msd="AB" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="14" dephead="16" deprel="AA">respektive</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|prefix|" lex="|prefix..nn.1|" sense="|prefix..1:-1.000|" prefix="|pre..sxc.1|" suffix="|fix..nn.2|fix..nn.1|" complemgram="|pre..sxc.1+fix..nn.2:2.951e-27|pre..sxc.1+fix..nn.1:2.951e-27|" compwf="|pre+fix|" ref="15" dephead="14" deprel="CJ">prefix</w>
<w pos="PP" msd="PP" lemma="|till|" lex="|till..pp.1|" sense="|till..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="16" dephead="12" deprel="ET">till</w>
<w pos="DT" msd="DT.NEU.SIN.DEF" lemma="|denna|" lex="|denna..pn.1|" sense="|denna..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="17" dephead="18" deprel="DT">detta</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|ord|" lex="|ord..nn.1|" sense="|ord..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="18" dephead="16" deprel="PA">ord</w>
<w pos="VB" msd="VB.PRS.AKT" lemma="|ha|" lex="|ha..vb.1|" sense="|ha..1:0.997|ha..3:0.003|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="19" dephead="06" deprel="OO">har</w>
<w pos="VB" msd="VB.SUP.SFO" lemma="|förändra|" lex="|förändra..vb.1|" sense="|förändra..1:-1.000|" prefix="|föra..vb.1|för..nn.1|" suffix="|ändra..vb.1|" complemgram="|föra..vb.1+ändra..vb.1:3.678e-14|för..nn.1+ändra..vb.1:4.571e-17|" compwf="|för+ändrats|" ref="20" dephead="19" deprel="VG">förändrats</w>
<w pos="MID" msd="MID" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="21" dephead="01" deprel="IQ">:</w>
<w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|möjlighet|" lex="|möjlighet..nn.1|" sense="|möjlighet..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="22" dephead="01" deprel="MS">möjlighet</w>
<w pos="IE" msd="IE" lemma="|att|" lex="|att..sn.1|" sense="|att..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="23" dephead="22" deprel="ET">att</w>
<w pos="VB" msd="VB.INF.AKT" lemma="|välja|" lex="|välja..vb.1|" sense="|välja..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="24" dephead="23" deprel="IF">välja</w>
<w pos="IE" msd="IE" lemma="|att|" lex="|att..sn.1|" sense="|att..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="25" dephead="24" deprel="OO">att</w>
<w pos="VB" msd="VB.INF.AKT" lemma="|följa|" lex="|följa..vb.1|" sense="|följa..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="26" dephead="25" deprel="IF">följa</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|prefix|" lex="|prefix..nn.1|" sense="|prefix..1:-1.000|" prefix="|pre..sxc.1|" suffix="|fix..nn.2|fix..nn.1|" complemgram="|pre..sxc.1+fix..nn.2:2.951e-27|pre..sxc.1+fix..nn.1:2.951e-27|" compwf="|pre+fix|" ref="27" dephead="26" deprel="OO">prefix</w>
<w pos="MID" msd="MID" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="28" dephead="27" deprel="IK">,</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="29" dephead="30" deprel="CJ">sufix</w>
<w pos="KN" msd="KN" lemma="|eller|" lex="|eller..kn.1|" sense="|eller..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="30" dephead="31" deprel="CJ">eller</w>
<w pos="KN" msd="KN" lemma="|både och|" lex="|både_och..knm.1|" sense="|både_och..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="31" dephead="32" deprel="CJ">både</w>
<w pos="KN" msd="KN" lemma="|och|både och:31|" lex="|och..kn.1|både_och..knm.1:31|" sense="|och..1:-1.000|både_och..1:31:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="32" dephead="27" deprel="PT">och</w>
<w pos="MID" msd="MID" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="33" dephead="32" deprel="CJ">+</w>
<w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|möjlighet|" lex="|möjlighet..nn.1|" sense="|möjlighet..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="34" dephead="33" deprel="PA">möjlighet</w>
<w pos="IE" msd="IE" lemma="|att|" lex="|att..sn.1|" sense="|att..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="35" dephead="67" deprel="CJ">att</w>
<w pos="VB" msd="VB.INF.AKT" lemma="|välja|" lex="|välja..vb.1|" sense="|välja..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="36" dephead="35" deprel="IF">välja</w>
<w pos="JJ" msd="JJ.POS.NEU.SIN.IND.NOM" lemma="|total|" lex="|total..av.1|" sense="|total..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="37" dephead="38" deprel="AT">totalt</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|antal|" lex="|antal..nn.1|" sense="|antal..1:-1.000|" prefix="|an..ab.1|" suffix="|tal..nn.1|tal..nn.2|" complemgram="|an..ab.1+tal..nn.2:6.302e-11|an..ab.1+tal..nn.1:6.302e-11|" compwf="|an+tal|" ref="38" dephead="43" deprel="DT">antal</w>
<w pos="JJ" msd="JJ.POS.UTR+NEU.SIN.DEF.NOM" lemma="|unik|" lex="|unik..av.1|" sense="|unik..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="39" dephead="40" deprel="AT">unika</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="40" dephead="43" deprel="CJ">sufix</w>
<w pos="MID" msd="MID" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="41" dephead="40" deprel="ET">/</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|prefix|" lex="|prefix..nn.1|" sense="|prefix..1:-1.000|" prefix="|pre..sxc.1|" suffix="|fix..nn.2|fix..nn.1|" complemgram="|pre..sxc.1+fix..nn.2:2.951e-27|pre..sxc.1+fix..nn.1:2.951e-27|" compwf="|pre+fix|" ref="42" dephead="41" deprel="PA">prefix</w>
<w pos="KN" msd="KN" lemma="|eller|" lex="|eller..kn.1|" sense="|eller..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="43" dephead="36" deprel="OO">eller</w>
<w pos="DT" msd="DT.NEU.SIN.IND" lemma="|en|" lex="|en..al.1|" sense="|den..1:-1.000|en..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="44" dephead="46" deprel="DT">ett</w>
<w pos="PC" msd="PC.PRF.NEU.SIN.IND.NOM" lemma="|normalisera|" lex="|normalisera..vb.1|" sense="|normalisera..1:0.523|normalisera..2:0.477|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="45" dephead="46" deprel="AT">normaliserat</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|mått|" lex="|mått..nn.1|" sense="|mått..1:0.736|mått..3:0.256|mått..2:0.008|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="46" dephead="43" deprel="CJ">mått</w>
<w pos="HP" msd="HP.-.-.-" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="47" dephead="51" deprel="SS">som</w>
<w pos="PP" msd="PP" lemma="|på|" lex="|på..pp.1|" sense="|på..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="48" dephead="51" deprel="AA">på</w>
<w pos="DT" msd="DT.NEU.SIN.IND" lemma="|någon|" lex="|någon..pn.1|" sense="|någon..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="49" dephead="50" deprel="DT">något</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|sätt|" lex="|sätt..nn.1|" sense="|sätt..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="50" dephead="48" deprel="PA">sätt</w>
<w pos="VB" msd="VB.PRS.AKT" lemma="|ta|ta hänsyn|" lex="|ta..vb.1|ta_hänsyn..vbm.1|" sense="|ta..2:0.834|ta..1:0.082|ta..4:0.080|ta..3:0.003|ta_hänsyn..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="51" dephead="46" deprel="ET">tar</w>
<w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|hänsyn|ta hänsyn:51|" lex="|hänsyn..nn.1|ta_hänsyn..vbm.1:51|" sense="|hänsyn..1:0.993|hänsyn..2:0.007|ta_hänsyn..1:51:-1.000|" prefix="|hän..ab.1|" suffix="|syn..nn.1|syn..nn.2|" complemgram="|hän..ab.1+syn..nn.2:2.886e-12|hän..ab.1+syn..nn.1:2.886e-12|" compwf="|hän+syn|" ref="52" dephead="51" deprel="OO">hänsyn</w>
<w pos="PP" msd="PP" lemma="|till|" lex="|till..pp.1|" sense="|till..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="53" dephead="51" deprel="OA">till</w>
<w pos="DT" msd="DT.UTR.SIN.DEF" lemma="|en|den|" lex="|en..al.1|den..pn.1|" sense="|den..1:-1.000|en..2:-1.000|den..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="54" dephead="56" deprel="DT">den</w>
<w pos="JJ" msd="JJ.POS.UTR+NEU.SIN.DEF.NOM" lemma="|total|" lex="|total..av.1|" sense="|total..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="55" dephead="56" deprel="AT">totala</w>
<w pos="NN" msd="NN.UTR.SIN.DEF.NOM" lemma="|frekvensförändring|" lex="|" sense="|" prefix="|frekvens..nn.1|" suffix="|förändring..nn.1|" complemgram="|frekvens..nn.1+förändring..nn.1:7.787e-12|frekvens..nn.1+föra..vb.1+ändring..nn.1:1.527e-20|frekvens..nn.1+för..nn.1+ändring..nn.1:2.822e-22|" compwf="|frekvens+förändringen|frekvens+för+ändringen|" ref="56" dephead="53" deprel="PA">frekvensförändringen</w>
<w pos="PP" msd="PP" lemma="|av|" lex="|av..pp.1|" sense="|av..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="57" dephead="56" deprel="ET">av</w>
<w pos="DT" msd="DT.NEU.SIN.DEF" lemma="|denna|" lex="|denna..pn.1|" sense="|denna..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="58" dephead="60" deprel="DT">detta</w>
<w pos="PC" msd="PC.PRF.UTR+NEU.SIN.DEF.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="59" dephead="60" deprel="AT">lemmatiserade</w>
<w pos="NN" msd="NN.NEU.SIN.IND.NOM" lemma="|ord|" lex="|ord..nn.1|" sense="|ord..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="60" dephead="61" deprel="DT">ord</w>
<w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|möjlighet|" lex="|möjlighet..nn.1|" sense="|möjlighet..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="61" dephead="57" deprel="PA">Möjlighet</w>
<w pos="IE" msd="IE" lemma="|att|" lex="|att..sn.1|" sense="|att..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="62" dephead="51" deprel="EO">att</w>
<w pos="VB" msd="VB.INF.AKT" lemma="|välja|" lex="|välja..vb.1|" sense="|välja..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="63" dephead="62" deprel="IF">välja</w>
<w pos="NN" msd="NN.NEU.PLU.IND.NOM" lemma="|stapeldiagram|" lex="|stapeldiagram..nn.1|" sense="|stapeldiagram..1:-1.000|" prefix="|stapel..nn.1|stapla..vb.1|" suffix="|diagram..nn.1|" complemgram="|stapel..nn.1+diagram..nn.1:4.371e-13|stapla..vb.1+diagram..nn.1:4.229e-21|stapel..nn.1+dia..nn.1+gram..nn.1:2.426e-26|stapla..vb.1+dia..nn.1+gram..nn.1:5.290e-34|" compwf="|stapel+diagram|stapel+dia+gram|" ref="64" dephead="65" deprel="CJ">stapeldiagram</w>
<w pos="KN" msd="KN" lemma="|eller|" lex="|eller..kn.1|" sense="|eller..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="65" dephead="63" deprel="OO">eller</w>
<w pos="NN" msd="NN.NEU.PLU.IND.NOM" lemma="|trenddiagram|" lex="|" sense="|" prefix="|trend..nn.1|" suffix="|diagram..nn.1|" complemgram="|trend..nn.1+diagram..nn.1:1.084e-11|trend..nn.1+dia..nn.1+gram..nn.1:6.018e-25|" compwf="|trend+diagram|trend+dia+gram|" ref="66" dephead="65" deprel="CJ">trenddiagram</w>
<w pos="KN" msd="KN" lemma="|samt|" lex="|samt..kn.1|" sense="|samt..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="67" dephead="34" deprel="ET">samt</w>
<w pos="IE" msd="IE" lemma="|att|" lex="|att..sn.1|" sense="|att..1:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="68" dephead="67" deprel="CJ">att</w>
<w pos="VB" msd="VB.INF.AKT" lemma="|få|" lex="|få..vb.1|" sense="|få..1:0.851|få..3:0.149|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="69" dephead="68" deprel="IF">få</w>
<w pos="PL" msd="PL" lemma="|ut|" lex="|ut..ab.1|" sense="|ut..1:0.910|ut..2:0.090|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="70" dephead="69" deprel="PL">ut</w>
<w pos="NN" msd="NN.NEU.PLU.IND.NOM" lemma="|frekvensdata|" lex="|" sense="|" prefix="|frekvens..nn.1|" suffix="|data..nn.2|data..nn.1|" complemgram="|frekvens..nn.1+data..nn.2:2.752e-11|frekvens..nn.1+data..nn.1:2.752e-11|" compwf="|frekvens+data|" ref="71" dephead="69" deprel="OO">frekvensdata</w>
<w pos="HP" msd="HP.-.-.-" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="72" dephead="71" deprel="ET">som</w>
<w pos="DT" msd="DT.UTR.SIN.IND" lemma="|en|" lex="|en..al.1|" sense="|den..1:-1.000|en..2:-1.000|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="73" dephead="74" deprel="DT">en</w>
<w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|" lex="|" sense="|" prefix="|" suffix="|" complemgram="|" compwf="|" ref="74" dephead="72" deprel="ROOT">cvs-fil</w>
</sentence>
</paragraph>
</text>
</corpus>
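To make the question concrete, here is a minimal sketch of how the `prefix` and `suffix` attributes could be read from `<w>` elements like the ones above. It assumes the attributes are pipe-delimited sets, as they appear in this dump; the helper names `split_pipe_set` and `compound_tokens` are hypothetical, and the sample is a shortened copy of three tokens from the XML above.

```python
import xml.etree.ElementTree as ET


def split_pipe_set(value):
    """Split a pipe-delimited attribute such as '|in..ab.1|' into its items."""
    return [item for item in value.split("|") if item]


def compound_tokens(xml_text):
    """Yield (word form, prefixes, suffixes) for every <w> that has either."""
    root = ET.fromstring(xml_text)
    for w in root.iter("w"):
        prefixes = split_pipe_set(w.get("prefix", ""))
        suffixes = split_pipe_set(w.get("suffix", ""))
        if prefixes or suffixes:
            yield w.text, prefixes, suffixes


# Shortened copy of three tokens from the dump above (most attributes omitted).
sample = """<corpus><text><paragraph><sentence id="s1">
<w pos="VB" lemma="|skapa|" prefix="|" suffix="|">Skapa</w>
<w pos="PC" lemma="|inbygga|inbyggd|" prefix="|in..ab.1|" suffix="|bygga..vb.1|">inbyggd</w>
<w pos="NN" lemma="|riksdagsprotokoll|" prefix="|riksdag..nn.1|" suffix="|protokoll..nn.2|protokoll..nn.1|">Riksdagsprotokoll</w>
</sentence></paragraph></text></corpus>"""

result = list(compound_tokens(sample))
for form, prefixes, suffixes in result:
    print(form, prefixes, suffixes)
```

Tokens whose `prefix` and `suffix` are both the empty set `|` are skipped, so only words analysed as compounds are returned.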

fredrik1984 commented 3 years ago

Hm. On second thought, one should instead search for a keyword that occurs as either a suffix or a prefix, and then build lists of the unique words that are linked to that keyword as a prefix and as a suffix, respectively.
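A minimal sketch of that lookup, assuming the tokens have already been extracted as (word form, raw `prefix` attribute, raw `suffix` attribute) triples. The function names `base_form` and `words_by_role` are hypothetical, and the three tokens are the SOU 1969:48 examples quoted further down in this comment. Lowercased word forms are used instead of lemmas because the lemma can be empty for compounds, as in the first example.

```python
def base_form(saldo_id):
    """'information..nn.1' -> 'information' (drops the POS tag and sense index)."""
    return saldo_id.split("..")[0]


def words_by_role(tokens, keyword):
    """Collect the unique word forms in which `keyword` is a prefix or a suffix.

    `tokens` is an iterable of (word form, prefix attribute, suffix attribute),
    with the attributes in the raw pipe-delimited form used in the XML.
    """
    found = {"prefix": set(), "suffix": set()}
    for form, prefix_attr, suffix_attr in tokens:
        for role, attr in (("prefix", prefix_attr), ("suffix", suffix_attr)):
            parts = {base_form(p) for p in attr.split("|") if p}
            if keyword in parts:
                found[role].add(form.lower())
    return {role: sorted(forms) for role, forms in found.items()}


tokens = [
    ("Informationsutredningen", "|information..nn.1|", "|utredning..nn.1|"),
    ("samhällsinformation", "|samhälle..nn.1|", "|information..nn.1|"),
    ("frågan", "|", "|"),
]

result = words_by_role(tokens, "information")
print(result)
```

For the keyword `information` this gives one word where it is the prefix and one where it is the suffix; `frågan` is ignored because both attributes are empty.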

The following examples are taken from the XML file for SOU 1969:48.

The example "informationsutredning". Here "information" is given as the prefix and "utredning" as the suffix:

<w deprel="DT" ref="05" suffix="|utredning..nn.1|" pos="NN" prefix="|information..nn.1|" lex="|" lemma="|" saldo="|" msd="NN.UTR.SIN.DEF.NOM" dephead="06">Informationsutredningen</w>

The example "samhällsinformation". Here "samhälle" is given as the prefix and "information" as the suffix:

<w deprel="DT" ref="16" suffix="|information..nn.1|" pos="NN" prefix="|samhälle..nn.1|" lex="|samhällsinformation..nn.1|" lemma="|samhällsinformation|" saldo="|samhällsinformation..1|" msd="NN.UTR.SIN.IND.NOM" dephead="17">samhällsinformation</w>

This is what it looks like when there is neither a prefix nor a suffix, using the example "frågan":

<w deprel="PA" ref="47" suffix="|" pos="NN" prefix="|" lex="|fråga..nn.1|" lemma="|fråga|" saldo="|fråga..2|fråga..3|" msd="NN.UTR.SIN.DEF.NOM" dephead="46">frågan</w>
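To tie these examples back to the original request of following a word over time, the sketch below counts, per year, how many distinct words a keyword combines with as a prefix and as a suffix. Everything here is an assumption for illustration: the function name `unique_partners_per_year`, the record layout, and the idea that each token carries a year taken from its source document. The 1969 rows mirror the two compounds above; the 1970 row is made up.

```python
from collections import defaultdict


def unique_partners_per_year(records, keyword):
    """Count, per year, the distinct words that `keyword` combines with.

    `records` is an iterable of (year, prefix base forms, suffix base forms).
    When `keyword` is the prefix its partners are the suffixes, and vice versa.
    """
    partners = defaultdict(lambda: {"as_prefix": set(), "as_suffix": set()})
    for year, prefixes, suffixes in records:
        if keyword in prefixes:
            partners[year]["as_prefix"].update(suffixes)
        if keyword in suffixes:
            partners[year]["as_suffix"].update(prefixes)
    return {
        year: {role: len(words) for role, words in roles.items()}
        for year, roles in sorted(partners.items())
    }


records = [
    (1969, ["information"], ["utredning"]),
    (1969, ["samhälle"], ["information"]),
    (1969, ["information"], ["utredning"]),  # repeated compound, counted once
    (1970, ["information"], ["behov"]),      # made-up row for illustration
]

counts = unique_partners_per_year(records, "information")
print(counts)
```

The resulting per-year counts are raw numbers of unique partners. A normalized variant would divide them by some measure of how often the keyword itself occurs that year, which is left open here.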