Figure out Preprocessing

nert-nlp / AMR-gs

AMR Parsing via Graph-Sequence Iterative Inference

MIT License

Figure out Preprocessing #4

Open ablodge opened 4 years ago

ablodge commented 4 years ago

Preprocessing in the parser is based on Zhang et al. 2019 and only works on AMRs. We need to figure out whether/how we want to handle preprocessing of UCCA, EDS, DRG, and PTG.

I think to get the preprocessing working on the new data, you only need to modify AMRIO to look more like AMRGraph.

One possible consequence of working without preprocessing: AMRGraph.py apparently expects attributes to be in a particular format and ignores any that don't match (line 63). When the parser runs without preprocessing, this results in essentially all attributes being ignored.
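
To illustrate the failure mode, here is a minimal sketch of how a strict format check can silently drop attributes, next to a lenient mode that keeps them. This is not the code in AMRGraph.py: the pattern `STRICT_VALUE`, the function `collect_attributes`, and the example triples are all hypothetical and only show the general idea.

```python
import re

# Hypothetical pattern: accepts only quoted strings or plain numbers.
# The actual check in AMRGraph.py may look quite different.
STRICT_VALUE = re.compile(r'^(".*"|-?\d+(\.\d+)?)$')


def collect_attributes(triples, strict=True):
    """Split (node, relation, value) triples into kept and dropped lists.

    With strict=True, values that fail the format check are dropped.
    With strict=False, every triple is kept.
    """
    kept, dropped = [], []
    for node, relation, value in triples:
        if strict and not STRICT_VALUE.match(value):
            dropped.append((node, relation, value))
        else:
            kept.append((node, relation, value))
    return kept, dropped


# Made-up attribute triples in a mix of formats.
triples = [
    ("n1", "polarity", "-"),     # bare symbol
    ("n2", "value", "42"),       # plain number
    ("n3", "name", '"Obama"'),   # quoted string
    ("n4", "tense", "past"),     # unquoted word
]

kept, dropped = collect_attributes(triples, strict=True)
print("kept:", kept)
print("dropped:", dropped)
```

Returning the dropped triples instead of discarding them makes it easy to count, per framework, how many attributes a given check would lose before deciding whether to keep it.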

ablodge commented 4 years ago

@jakpra Do you want to work on this one?

jakpra commented 4 years ago

Sure. I'd like to go about this by looking for general (linguistic/structural?) patterns within each framework and across frameworks. Like I said before, the Zhang+19 preprocessing looks very AMR-specific and not very principled. It has many special cases that handle just an individual word or construction. I'm all for handling long-tail phenomena, but I can't imagine that this style of preprocessing is worth spending a lot of time on.

I'll look into the attribute formatting; I guess a simple workaround could just be to comment out those lines that would ignore the "bad" ones... But most importantly, we should check what makes them "bad" and what the shared task has to say about that.

jakpra commented 4 years ago

[x] Disabled a bunch of AMR-specific well-formedness checks in AMRGraph.py for now so we don't lose anything from the other frameworks.
[ ] Have to check which of the checks should be re-enabled.
[x] Ran stanza to add features.
[x] Extracted vocabs.
[ ] Check what other (linguistically or otherwise) principled preprocessing steps we can do.
[ ] Implement additional preprocessing.
[ ] Run preprocessing.
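
For reference, the vocab extraction step in the checklist above can be sketched in a framework-agnostic way. This is only an assumed shape, not the repository's actual script: the function `build_vocab`, the special tokens, and the `min_freq` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter


def build_vocab(sequences, min_freq=1, specials=("<PAD>", "<UNK>")):
    """Map each token seen at least min_freq times to an integer id.

    Special tokens come first; the rest are ordered by descending
    frequency, with ties broken alphabetically for a stable result.
    """
    counts = Counter(tok for seq in sequences for tok in seq)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = list(specials) + [tok for tok, c in ordered if c >= min_freq]
    return {tok: idx for idx, tok in enumerate(vocab)}


# Made-up lemma sequences standing in for tagger output.
lemmas = [["the", "dog", "bark"], ["the", "cat", "sleep"]]
vocab = build_vocab(lemmas, min_freq=1)
print(vocab)
```

The same function could be applied separately to tokens, lemmas, POS tags, node labels, and edge labels, so no framework-specific logic is needed at this stage.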