quangis / geo-question-parser

Extract core concept transformations from geo-analytical questions.

Structure blocks according to grammar #7

Open nsbgn opened 2 years ago

nsbgn commented 2 years ago

We claim to be able to extract a lot of information from a question. However, several things indicate to me that we haven't pinned down exactly what information a block contains: a block where a text field can be left empty; a text field that occurs on its own, without contextual information to constrain it; or multiple variants of a block that differ only syntactically. In those cases, we're handwaving away the extraction of that information by pointing to the ANTLR parser.

This is a problem because the parser is hard to verify and test systematically, since it is much less constrained than the blocks.

Also, hiding blocks makes it hard for the user to understand the space of possibilities. We can disable blocks, but I don't think we should hide them. Of course, this is more feasible when the set of blocks is smaller.

That's why I think we should systematize the set of blocks a bit more. This would also help with issue #6.
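To make the three warning signs above concrete, here is a minimal sketch of how they could be checked mechanically. It assumes the blocks are available as Blockly-style JSON definitions (`message0`, `args0`, `field_input`, `field_dropdown`, `input_value`); the block names, messages, and the `audit` and `signature` helpers are hypothetical and are not taken from this repository. The variant check is only a heuristic: it flags candidates for merging, not proven duplicates.

```python
from collections import defaultdict

# Hypothetical block definitions in Blockly's JSON style (not the project's real blocks).
BLOCKS = [
    {"type": "where_a", "message0": "where is %1",
     "args0": [{"type": "field_input", "name": "OBJ", "text": ""}]},
    {"type": "where_b", "message0": "where are %1",
     "args0": [{"type": "field_input", "name": "OBJ", "text": ""}]},
    {"type": "measure", "message0": "what is the %1 of %2",
     "args0": [
         {"type": "field_dropdown", "name": "MEASURE",
          "options": [["average", "AVG"], ["total", "SUM"]]},
         {"type": "input_value", "name": "OF", "check": "Object"},
     ]},
    {"type": "free_text", "message0": "%1",
     "args0": [{"type": "field_input", "name": "TEXT", "text": ""}]},
]


def signature(block):
    """Shape of a block, ignoring its wording: the ordered (type, name) of its arguments."""
    return tuple((arg["type"], arg["name"]) for arg in block.get("args0", []))


def audit(blocks):
    """Return (block name(s), finding) pairs for the three warning signs."""
    findings = []
    by_signature = defaultdict(list)
    for block in blocks:
        args = block.get("args0", [])
        by_signature[signature(block)].append(block["type"])
        # Sign 1: a free-text field whose default is empty can be left empty.
        for arg in args:
            if arg["type"] == "field_input" and arg.get("text", "") == "":
                findings.append((block["type"], "text field may be left empty"))
        # Sign 2: a lone text field with no fixed words around it.
        if (len(args) == 1 and args[0]["type"] == "field_input"
                and block["message0"].strip() == "%1"):
            findings.append((block["type"], "text field without context"))
    # Sign 3: blocks with identical arguments that differ only in their wording.
    for names in by_signature.values():
        if len(names) > 1:
            findings.append((", ".join(sorted(names)), "variants differ only in wording"))
    return findings


if __name__ == "__main__":
    for name, finding in audit(BLOCKS):
        print(f"{name}: {finding}")
```

A check like this would only tell us where to look; deciding which fields should become dropdowns or typed inputs still has to follow from the grammar.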

nsbgn commented 2 years ago

My own experiments with this will go to the block-experiments branch, which itself branched from merge-with-interface. The latter will not diverge from Haiqi's original blocks for now. (See this comment for information on the branches.)

nsbgn commented 2 years ago

See commit f594abc72c8391ea0e306292b5833e4804a4ffca.

Knowing which parts of the text can be free-form and which should be pinned down is a good first step towards making things more robust.

I'm actually not certain where variability occurs; this is a first quick guess. Without pinning it down, the input seems way too broad to be dealt with by a parser: I guarantee that the slightest variations will make it freak out. Also, there are grammatical errors (which I've left in for now), and I don't know how we can make the user understand how much freedom they have in free-form text fields without requiring them to understand which concepts will be recognized, which is exactly what the interface is supposed to abstract from.

nsbgn commented 2 years ago

This issue has priority for now. While it would be nice to implement the rest too, it is absolutely necessary to structure the Blockly blocks to follow the grammar (while removing extraneous syntactic variations), because otherwise the effort to constrain natural language is always going to be in vain.