telepathylabsai / OpenDF

Code to reproduce LREC Paper Simplifying Semantic Annotations of SMCalFlow
MIT License

Support SMCalFlow 2.x #28

Open hankcs opened 2 years ago

hankcs commented 2 years ago

Dear authors,

I'm exploring how to add support for SMCalFlow 2.x. I understand OpenDF is written for 1.x, and you mentioned:

Later, a second version was released (V2), which uses a slightly different format, but in this work we use V1.

But most recent works on this dataset use V2 for benchmarking, so I'm thinking about an upgrade to V2 to reach a broader audience. Could you offer some suggestions if you have time?

  1. Would it be easier to implement v2 support in OpenDF than to implement a downgrade script to convert v2 to v1?
  2. From my limited understanding of OpenDF, I'll need to modify pre_simplify and simplify_graph. What's the difference between these two?

meron-tl commented 2 years ago

Hello,

Thanks again for your interest in OpenDF and your willingness to contribute to it.

For V2, I would need to reread the paper (Value-Agnostic Conversational Semantic Parsing), but as far as I recall, the V1 annotations were converted to V2 automatically, without any manual corrections (possibly with some automatic filtering that removed fewer than 1% of the dialogues, those that failed automatic type checks).

If that is the case, the simplified expressions would look exactly the same whether we start from V1 or V2, so benchmarking is not an issue: the simplified OpenDF expressions are still the result of an automatic transformation of V1, either directly or in two steps (going through V2).

Still, doing the simplification starting from V2 could be interesting. Our current simplification (of V1) still has a lot of errors, so starting from scratch may bring new ideas and insights. It would be a time-consuming task, though, probably weeks of work.

The simplification process itself is separate from the rest of the system - so you could use whatever method you want (either use the simplification mechanism already implemented in OpenDF, or write a new one), as long as the output is the "correct" simplified expression.

If you're interested to go this way, we can discuss this (please contact using the email address in the LREC paper).

Cheers

hankcs commented 2 years ago

Thank you so much for your comments.

I couldn't find a section in their paper, Value-Agnostic Conversational Semantic Parsing, describing how they transformed V1. They processed V1 by adding type declarations, and perhaps they call the result V2. But I do believe V1 and V2 differ only in syntax (plus some filtering), according to their GitHub repo:

SMCalFlow 2.0: This is an updated version of the dataset released with the Task-Oriented Dialogue as Dataflow Synthesis (TACL 2020) paper, which removed a very small number of incorrectly annotated examples, dropped argument names for positional arguments (so that the programs are shorter), and added inferred type arguments for type-parameterized functions that were missing in the original SMCalFlow data.
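To make the "dropped argument names" change concrete, here is a minimal sketch in Python. It is not the official conversion and does not use any OpenDF or SMCalFlow code: the helper names (`tokenize`, `parse`, `drop_arg_names`, `render`) and the toy program are hypothetical, and it naively drops every `:name` token, whereas the real V2 data only drops names for positional arguments and also adds type arguments, which this sketch ignores.

```python
import re

def tokenize(program):
    """Split an S-expression into parentheses, quoted strings, and atoms."""
    return re.findall(r'\(|\)|"[^"]*"|[^\s()]+', program)

def parse(tokens):
    """Turn a flat token list into nested Python lists."""
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        return tok

    return read()

def drop_arg_names(tree):
    """Remove every ':name' token (assumption: all names are droppable)."""
    if isinstance(tree, str):
        return tree
    return [
        drop_arg_names(child)
        for child in tree
        if not (isinstance(child, str) and child.startswith(":"))
    ]

def render(tree):
    """Serialize nested lists back into an S-expression string."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(render(child) for child in tree) + ")"

# Toy program with made-up function names, not taken from the dataset.
toy = '(Yield :output (Foo :bar "x y" :baz (Qux)))'
print(render(drop_arg_names(parse(tokenize(toy)))))
```

Going the other way (V2 to V1) would need each function's argument names, which is part of why a downgrade script is not simply the inverse of this.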

By benchmarking, I mean that V2 programs are shorter and strongly typed, which might make them easier for a model to parse. Since V2 is easier for parsers, most people would expect other toolchains to use V2 too. I agree it's very time-consuming; I hope the utilities in OpenDF will make it faster. Maybe I'll start with some frequently used functions.

If you're interested to go this way, we can discuss this (please contact using the email address in the LREC paper).

Yes, I'll do so during the week using my work email. Thank you again!