nert-nlp / AMR-gs

AMR Parsing via Graph-Sequence Iterative Inference
MIT License

Figure out Preprocessing #4

Open ablodge opened 4 years ago

ablodge commented 4 years ago

Preprocessing in the parser is based on Zhang et al. 2019 and only works on AMRs. We need to figure out whether/how we want to handle preprocessing of UCCA, EDS, DRG, and PTG.

I think to get the preprocessing working on the new data, you only need to modify AMRIO to look more like AMRGraph.

One possible consequence of working without preprocessing: AMRGraph.py apparently expects attributes to be in a particular format and otherwise ignores them (line 63). When the parser runs without preprocessing, essentially all attributes end up being ignored.

ablodge commented 4 years ago

@jakpra Do you want to work on this one?

jakpra commented 4 years ago

Sure. I'd like to go about this by looking for general (linguistic/structural?) patterns within each framework and across frameworks. As I said before, the Zhang et al. 2019 preprocessing looks very AMR-specific and not very principled. It has many special cases that each handle just an individual word or construction. I'm all for handling long-tail phenomena, but I can't imagine that this style of preprocessing is worth spending a lot of time on.

I'll look into the attribute formatting; I guess a simple workaround could just be to comment out those lines that would ignore the "bad" ones... But most importantly, we should check what makes them "bad" and what the shared task has to say about that.
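To check what makes the attributes "bad", one option is to record the rejected ones rather than drop them silently. Below is a rough sketch of that idea. The names (`looks_like_expected_attribute`, `partition_attributes`, `summarize_rejections`) and the format check are hypothetical stand-ins, not the actual condition in AMRGraph.py, and I am assuming attributes can be read as (source, relation, value) triples.

```python
import re
from collections import Counter

# Hypothetical stand-in for the format check; the real condition in
# AMRGraph.py (around line 63) may well differ.
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")
ALLOWED_LITERALS = {"-", "+"}


def looks_like_expected_attribute(value):
    """Return True if the value has the (assumed) expected attribute format."""
    return value in ALLOWED_LITERALS or NUMBER_RE.match(value) is not None


def partition_attributes(triples):
    """Split (source, relation, value) triples into kept and rejected lists.

    Rejected triples are returned instead of being silently ignored,
    so they can be inspected per framework.
    """
    kept, rejected = [], []
    for source, relation, value in triples:
        if looks_like_expected_attribute(value):
            kept.append((source, relation, value))
        else:
            rejected.append((source, relation, value))
    return kept, rejected


def summarize_rejections(rejected):
    """Count rejected attributes per relation to show what is being lost."""
    return Counter(relation for _, relation, _ in rejected)


if __name__ == "__main__":
    # Made-up example triples, only to show the shape of the data.
    example = [
        ("n1", "polarity", "-"),
        ("n2", "quant", "3.5"),
        ("n3", "tense", "past"),
    ]
    kept, rejected = partition_attributes(example)
    print("kept:", kept)
    print("rejected by relation:", summarize_rejections(rejected))
```

Running something like this over the UCCA, EDS, DRG, and PTG data would show which relations lose their attributes, which seems a safer starting point than commenting out the check blindly.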

jakpra commented 4 years ago