Closed rafaelanchieta closed 6 years ago
Hi, I suppose you are aware that you may have to make several changes to the preprocessing scripts, as they were written with English in mind. First, I see that for empty named entities you get 0 (zero), while the version of the English dependency parser I use gives O (the uppercase letter o). Second, your tagger joins um and pouco into a single token (um=pouco), but your dependency parser keeps them as two tokens. This mismatch makes my preprocessing script fail.
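The two mismatches above could be normalized before the preprocessing script runs. The following is a minimal sketch, assuming token lines in the bracketed CoreNLP text format shown in this thread. `normalize_tokens` and `TOKEN_RE` are hypothetical names, not part of the parser's own scripts, and copying the POS tag to both halves of a split token is an assumption made only for illustration.

```python
import re

# Matches one bracketed token entry of the CoreNLP-style text output.
TOKEN_RE = re.compile(
    r"\[Text=(\S+) CharacterOffsetBegin=(\d+) CharacterOffsetEnd=(\d+) "
    r"PartOfSpeech=(\S+) Lemma=(\S+) NamedEntityTag=(\S+)\]"
)

def normalize_tokens(line):
    """Parse a token line and return a list of normalized token dicts.

    - NamedEntityTag "0" (zero) is rewritten to "O" (letter o).
    - Tokens joined with "=" (e.g. "um=pouco") are split so that the
      token count matches a dependency parse that keeps them separate.
    """
    tokens = []
    for text, begin, _end, pos, lemma, ner in TOKEN_RE.findall(line):
        if ner == "0":
            ner = "O"
        parts = text.split("=")
        lemmas = lemma.split("=")
        if len(lemmas) != len(parts):
            # Fall back to the surface forms if the lemma is not joined the same way.
            lemmas = parts
        offset = int(begin)
        for part, lem in zip(parts, lemmas):
            tokens.append({
                "text": part,
                "begin": offset,
                "end": offset + len(part),
                "pos": pos,  # assumption: both halves inherit the joined token's tag
                "lemma": lem,
                "ner": ner,
            })
            # Assumption: exactly one separator character between the parts.
            offset += len(part) + 1
    return tokens

example = (
    "[Text=raciocina CharacterOffsetBegin=18051 CharacterOffsetEnd=18060 "
    "PartOfSpeech=V Lemma=raciocinar NamedEntityTag=0] "
    "[Text=um=pouco CharacterOffsetBegin=18061 CharacterOffsetEnd=18069 "
    "PartOfSpeech=ADV Lemma=um=pouco NamedEntityTag=0]"
)
tokens = normalize_tokens(example)
print([t["text"] for t in tokens])
```

After this step the joined token is represented as two entries, which lines up with dependency indices such as `pouco-11` in the parser output. Whether the split tokens should keep the same POS tag depends on the tagset, so that part would need checking against the Portuguese tagger.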
Thank you. I will fix these issues. How did you train the embeddings and POS tags?
I get the POS tags from CoreNLP. The embeddings are pretrained on a Wikipedia dump using word2vec. You can find more information in the paper: https://arxiv.org/abs/1608.06111
Thank you again.
I'm trying to adapt your AMRParser to Portuguese, and I'm getting an error in the preprocessing.py file.
`Sentence #414 (14 tokens): Esse aí disse o principezinho para si mesmo raciocina um pouco como o bêbado [Text=Esse CharacterOffsetBegin=18007 CharacterOffsetEnd=18011 PartOfSpeech=PROP Lemma=Esse NamedEntityTag=0] [Text=aí CharacterOffsetBegin=18012 CharacterOffsetEnd=18014 PartOfSpeech=ADV Lemma=aí NamedEntityTag=0] [Text=disse CharacterOffsetBegin=18015 CharacterOffsetEnd=18020 PartOfSpeech=V Lemma=dizer NamedEntityTag=0] [Text=o CharacterOffsetBegin=18021 CharacterOffsetEnd=18022 PartOfSpeech=DET Lemma=o NamedEntityTag=0] [Text=principezinho CharacterOffsetBegin=18023 CharacterOffsetEnd=18036 PartOfSpeech=N Lemma=principezinho NamedEntityTag=0] [Text=para CharacterOffsetBegin=18037 CharacterOffsetEnd=18041 PartOfSpeech=PRP Lemma=para NamedEntityTag=0] [Text=si CharacterOffsetBegin=18042 CharacterOffsetEnd=18044 PartOfSpeech=PERS Lemma=se NamedEntityTag=0] [Text=mesmo CharacterOffsetBegin=18045 CharacterOffsetEnd=18050 PartOfSpeech=DET Lemma=mesmo NamedEntityTag=0] [Text=raciocina CharacterOffsetBegin=18051 CharacterOffsetEnd=18060 PartOfSpeech=V Lemma=raciocinar NamedEntityTag=0] [Text=um=pouco CharacterOffsetBegin=18061 CharacterOffsetEnd=18069 PartOfSpeech=ADV Lemma=um=pouco NamedEntityTag=0] [Text=como CharacterOffsetBegin=18070 CharacterOffsetEnd=18074 PartOfSpeech=PRP Lemma=como NamedEntityTag=0] [Text=o CharacterOffsetBegin=18075 CharacterOffsetEnd=18076 PartOfSpeech=DET Lemma=o NamedEntityTag=0] [Text=bêbado CharacterOffsetBegin=18077 CharacterOffsetEnd=18083 PartOfSpeech=ADJ Lemma=bêbado NamedEntityTag=0] (ROOT (S (NP (DEM Esse)) (VP (ADV aí) (VP (V disse) (NP (ART o) (N' (N principezinho) (PP (P para) (NP (NP (NP (PRS si)) (S (VP (ADV mesmo) (VP (V raciocina) (ADVP (ADV um) (ADV pouco)))))) (NP (CONJ como) (NP (ART o) (N bêbado)))))))))))
nsubj(disse-3, Esse-1) advmod(disse-3, aí-2) root(ROOT-0, disse-3) det(principezinho-5, o-4) dobj(disse-3, principezinho-5) adpmod(principezinho-5, para-6) adpobj(para-6, si-7) advmod(raciocina-9, mesmo-8) xcomp(disse-3, raciocina-9) det(pouco-11, um-10) nsubj(raciocina-9, pouco-11) adpmod(pouco-11, como-12) det(bêbado-14, o-13) adpcomp(como-12, bêbado-14)`
Could you help me?