Open stbusch opened 4 years ago
Hi! The parent abstract class `Formula` defines a few abstract methods: `toLispTree`, `forEach`, `map`, and `mapToList`. These are utility functions for traversing `Formula` objects, which are tree structures. The method `forEach` in particular can be used to perform something recursively in the tree. However, it looks like the `forEach` method is never used by anyone. The more general methods `map` and `mapToList` have been used a few times.
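As an aside, the traversal pattern these utilities implement can be sketched with a toy class (made-up names, not SEMPRE's real `Formula`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy stand-in for a Formula tree (made-up class, not SEMPRE's real
// Formula): each node has a label and children, and mapToList applies
// a visitor function to every node, collecting the non-null results.
class ToyFormula {
    final String label;
    final List<ToyFormula> children = new ArrayList<>();

    ToyFormula(String label, ToyFormula... kids) {
        this.label = label;
        for (ToyFormula k : kids) children.add(k);
    }

    // Analogous to mapToList: visit every node in pre-order and
    // collect all non-null results of func.
    <T> List<T> mapToList(Function<ToyFormula, T> func) {
        List<T> out = new ArrayList<>();
        T res = func.apply(this);
        if (res != null) out.add(res);
        for (ToyFormula child : children) out.addAll(child.mapToList(func));
        return out;
    }
}
```

For example, `f.mapToList(n -> n.children.isEmpty() ? n.label : null)` collects the labels of all leaves of the tree.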
Thanks for the explanations. However, I can't see yet where `map` and `mapToList` are used. My IDE (IntelliJ) actually says they aren't. My approach to understanding the code now is going through the JavaExecutor's `processFormula` with an example `CallFormula`. I basically do get how the formula is recursively processed here, but it's unclear to me which information is carried in the `Evaluation` field of the `Response` object that is returned by that method. Any hints? Thanks!
The Executor can use the `stats` field (an `Evaluation` object) to store arbitrary statistics about the execution (e.g., how much time it takes to execute). See `freebase/SparqlExecutor.java` (line 253) for an example. I think `JavaExecutor` does not use this field. The statistics will be gathered across all executed formulas (`Parser.java` line 356) and reported at the end of the run.
I found `mapToList` used in niche places like `freebase/LexiconFn.java` (line 265) and `overnight/OvernightFeatureComputer.java` (line 142). I think it's safe to ignore these utility methods for now.
Thanks again. I'm at the Batch Learning chapter of the tutorial now, trying to understand the corresponding parts of the SEMPRE code. The classes `FeatureExtractor` and `Rule` refer to rules being "anchored" or "floating". In your paper "Macro Grammars and Holistic Triggering for Efficient Semantic Parsing" you say:

"..., in which logical forms are either triggered by specific phrases (anchored) or can be triggered in any context (floating)."

Does this mean an anchored rule is applicable only if a (sub)span of tokens matches the phrase of the rule's definition exactly (and not just a rhs category), and any rule without that property is floating? Thanks
The anchored vs floating distinction is only applicable to the FloatingParser. Other parsers (e.g., BeamParser) assume that all rules are anchored.
During parsing, the parser builds `Derivation` objects representing complete or partial parses. The `Rule`s in the grammar specify how larger Derivations can be constructed from smaller ones. Each Derivation has a category string (e.g., $TOKEN, $Entity, or $ROOT) and a formula (a `Formula` object). An anchored Derivation is bound to an utterance span (`start` and `end`). A floating Derivation is not bound to a span (`start` and `end` are set to -1) but instead has a `size` (explained below).

An anchored rule like `(rule $foo (biggest $bar) (...) (anchored 1))` can only be applied if $bar is an anchored Derivation whose anchoring span appears right after the query token "biggest". If the $bar Derivation is anchored to span [i, j], and token i-1 is "biggest", the newly constructed $foo would be anchored to span [i-1, j]. A floating rule like `(rule $foo ($bar $baz) (...))` combines any floating Derivations $bar and $baz into a $foo. Note that the (...) part describes how the formula of the $foo Derivation is built from the children's formulas. If this construction fails (e.g., when the formulas do not type check), the parent Derivation is not constructed.

Thanks for your elaborate explanation!
Now I'm trying to track from the `main` method how SEMPRE is set up for interaction. Apparently the fig/... supporting packages are used quite heavily, but these are not part of SEMPRE, and I'm not sure which parts of them are relevant in the particular context of SEMPRE. The `main` method indirectly calls the static method `Execution.runWithObjArray(String[] args, Object[] objects)`, with `objects` containing `("Main", new Main(), Master.getOptionsParser())`. This method starts with the call `init(args, objects)`, which in turn calls the `register` method of the `OptionsParser` class. Here, the interface `OptionSet` is used to get the annotations of the fields of the objects in the array `objects`, but it seems that in SEMPRE this interface (or a class implementing it) is never used. Is this correct, and if so, can I neglect this `register` method in the process of understanding how the setup of SEMPRE for interaction is implemented? Thanks again.
Fields of `Main` that are annotated with `@Option` will be registered. The `Option` annotation (from `fig.basic`) and `OptionSet` don't interact, so I'm not sure why `register` successfully parses these `Option` fields either. In `Master.getOptionsParser`, the `OptionsParser` also registers options from the classes specified in the `module-classes.txt` file (generated during ant compilation). AFAIK this was a hack so that the options of irrelevant packages do not have to be parsed (e.g., the `tables.*` classes will not be processed if the `tables` ant target was unused). In each registered class, the `@Option`-annotated fields under the `opts` object (defined by `public static Options opts = new Options();` in most classes) will be registered.

Thanks again. The background of all my questions is that I would like to create a plugin for an existing NLP Java application that enhances it with SEMPRE functionality. That requires some deeper understanding of the code. As I see it, the real entry point to SEMPRE is not the `main` method, but the Ruby script `run` in combination with `execrunner`. I am completely new to Ruby, and I wonder if a complete understanding of these scripts is necessary for my goal. Are they basically "just" setting up the environment to pass parameters to the Java code from the command line (so that understanding the Java code would probably be sufficient for me)? I have established that `run` basically defines the available modes and then adds them to the global variable `$modes`, and maps the modes to their functionality in `modesMap`, based on the information stored in `$modes`.

The final command `run!(sel(:mode, modesMap))` most likely leads to the environment being set up according to the chosen mode, though I have failed to track that in detail so far. So, the question is: do I need to understand the Ruby scripts in detail? (Learning Ruby is certainly worthwhile, but time pressure forces me to set priorities.) Thanks!
You might not need to understand the Ruby script in detail. The Ruby script is just a tool for creating a Java command (or multiple commands) to run. To any Ruby command in the README or TUTORIAL, you can add `-n` to print the Java command and exit. That Java command can be invoked directly without using the Ruby script.
Thanks again. That will save me some time. Now I'm back at the Batch Learning chapter of the tutorial. There it says:

The rule feature domain tells the feature extractor to increment the feature each time the grammar rule is applied in the derivation.

Could you please tell me if I understand this correctly: `extractRuleFeatures` in class `FeatureExtractor` must play a main role here, but I don't see yet how it counts all rule applications in the derivation tree.
Thanks again!

The `rule` features of a derivation tree are the rules used when building that tree (the feature names = the rules serialized as strings). Different derivations can thus differ in which `rule` features are present. The feature vector is implemented as a Map-like object (from feature names to feature values), so the size can be increased when new features are discovered.

Regarding rule computation: while one could freshly compute the `rule` features for each new derivation, the code does it recursively: to compute the `rule` features for derivation D, combine the rule features of the child derivations of D, then add the rule R applied when constructing D. The `localFeatureVector` field of D only stores this final rule feature R. During scoring or gradient updates, the features will be gathered recursively.
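The recursive gathering described above can be sketched with toy classes (made-up names, not the real `Derivation`):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch (made-up class, not the real Derivation) of the scheme
// described above: each derivation stores only the rule applied at
// its own root, and the full list of "rule" features for the tree is
// gathered recursively from the children.
class ToyDeriv {
    final String localRule;          // the rule R applied at this node
    final List<ToyDeriv> children;

    ToyDeriv(String localRule, List<ToyDeriv> children) {
        this.localRule = localRule;
        this.children = children;
    }

    // Combine the rule features of the children, then add this
    // node's own rule.
    List<String> gatherRuleFeatures() {
        List<String> feats = new ArrayList<>();
        for (ToyDeriv child : children)
            feats.addAll(child.gatherRuleFeatures());
        feats.add(localRule);
        return feats;
    }
}
```

A tree built from rules R1 and R2 under a root rule R3 thus yields the feature list [R1, R2, R3], even though each node only stored its own rule.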
Thanks again, ppasupat. While I get the idea, I can't seem to find where this recursive gathering of features and calculating of scores is actually located in the code. The class `FeatureExtractor` contains the method `extractLocal`, which apparently calls the respective extraction method for each activated feature. The comment here says "This function is called on every sub-Derivation, ...", but where does that happen? I found that `extractLocal` is used by the `ParserState` class and its subclasses within the method `featurizeAndScoreDerivation`, but I don't see the recursion that takes the whole derivation tree into account. The method `computeScore` in the class `Derivation` does recursively calculate all scores, but why is it used only by the `ReinforcementParser`? I would like to gain some more insight here. Thanks!
Background
The method `parse` in `Parser` is responsible for parsing an utterance. It calls `ParserState state = newParserState(params, ex, computeExpectedCounts)` and then `state.infer()` to construct the Derivations. These two methods should be overridden in subclasses of `Parser`. The `state.infer()` call should populate the `predDerivations` field of `state` with the final Derivations for the example.
(Note: `computeExpectedCounts` means whether to compute the information necessary for gradient updates. The flag is turned on during training and off during evaluation.)
Let's use `BeamParser` as an example. Ignore the `coarse{State,Prune}` stuff (and whenever `BeamParserState.mode == bool`), which is only for speeding up parsing.
The `newParserState` method constructs a new `BeamParserState` object (which is in the same file, `BeamParser.java`). The `infer` method of `BeamParserState` constructs Derivations over utterance spans of increasing sizes using the `build` method, and then calls `setPredDerivations` (defined in the parent class `ChartParserState`) to populate the `predDerivations` field.
Tracing the method calls from `build` leads to the `applyRule(start, end, rule, children)` method, which constructs Derivations from the given children using the given grammar rule. (Ignore the `mode == bool` section.) The produced derivations `newDeriv` are featurized and scored by `featurizeAndScoreDerivation` (defined in `ParserState`). This method calls `extractLocal` to extract features and then `computeScoreLocal` to compute the score.
Feature extraction and scoring
Feature extraction and scoring are actually not done recursively (sorry for my confusion earlier). Rather, it uses dynamic programming. Let's say a derivation A has one child B, A has local features {a1, a2}, and the total score of all features in B (including the features of its descendants) is already computed and stored in B.score. Then the total score of A, A.score, can be computed as A.score = params.getWeight(a1) + params.getWeight(a2) + B.score.
The `extractLocal` method (in `FeatureExtractor`) extracts features only for the current derivation and not the children. For example, the "rule" feature (in `extractRuleFeatures`) will define only one feature for the rule used to build the derivation. The rules used for building the children should have already been extracted when the children were constructed (and `featurizeAndScoreDerivation` was called on them).
The `computeScoreLocal` method (in `Derivation`) computes `score` (the total score of all features) = the total score of the local features + the sum of the `score` fields of the children. Like the features, the `score` field of each child should already have been populated when the child was constructed.
In `ReinforcementParser`, there is no guarantee that the scores of the children derivations have been populated before the parent is constructed (I think), so `computeScore` is used instead of `computeScoreLocal` to recursively score the children.
Note that this dynamic programming method only works for linear models (which SEMPRE uses). A model that considers interactions between features would not work.
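The dynamic-programming scoring described above can be sketched as follows (made-up names, not SEMPRE's actual API):

```java
import java.util.List;
import java.util.Map;

// Toy sketch of the dynamic-programming scoring described above
// (made-up names, not SEMPRE's API). For a linear model, the total
// score of a derivation is the weighted sum of its *local* features
// plus the already-computed total scores of its children.
class DpScore {
    static double scoreLocal(Map<String, Double> weights,
                             List<String> localFeatures,
                             double[] childScores) {
        double score = 0;
        for (String f : localFeatures)
            score += weights.getOrDefault(f, 0.0);  // like params.getWeight(f)
        for (double s : childScores)
            score += s;                             // children already scored
        return score;
    }
}
```

With weights {a1: 1.0, a2: 2.0} and one child whose total score is 3.0, a derivation with local features {a1, a2} gets the total score 6.0, matching the A.score example above.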
Thank you! With your explanations and further studying of the code, I now believe I have understood the following:

The field `chart` of the class `(Beam)ChartParserState` keeps track of the derivations detected so far. It does so by maintaining a two-dimensional map. The two indices stand for `start` and `end` of a (sub)span. The key is the name of a (target) category. The value is a list of the derivations that have been found for that particular span with that particular category.
The `infer` method of `BeamParserState` first finds the "primitive" derivations, which are based only on tokens and phrases. To achieve this, `gatherTokenAndPhraseDerivations` is called for each subspan of the given span. The results are added to `chart`.
`infer` then calls `build` on subspans of increasing length, for all possible start indices within the span, from left to right. For each of these subspans, in that order, `build` calls `applyCatUnaryRules` and `applyNonCatUnaryRules`. These try to apply all available rules to the respective subspan. Every time a rule is applicable to existing derivations in `chart` (which, in the case of `applyCatUnaryRules`, is when its rhs matches the category these derivations are mapped to in `chart` for the respective subspan), `applyRule` is called. This method executes the rule that was found applicable, creates a new derivation, and puts it in `chart`. (OK, for `applyNonCatUnaryRules` it is a bit more complex, but I think I got the principle of how the most important methods interact.) So for each subspan `build` is called on, rules can potentially be applied to the derivations created on previous subspans.
`applyRule` also extracts the features of the newly added derivation. It then computes the "total" score of that derivation as the sum of its weighted local features and the scores of its children.
Finally, `infer` calls `setPredDerivations`. This puts all derivations that cover the whole original span and have category `$ROOT` (and so qualify to be returned to the user) into the field `predDerivations` of `BeamParserState`.
Please correct me if I'm wrong.
Yes, these are all correct. Different parsers might construct logical forms in different orders (e.g., the second stage of FloatingParser does not care about `start` and `end`; the ReinforcementParser learns to pick which partial Derivation to expand; etc.), but the general ideas about applying rules, scoring, and then collecting `predDerivations` should be the same.
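The span enumeration summarized above can be sketched as a toy loop (not the actual BeamParser code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy enumeration (not the actual BeamParser code) of the order in
// which the chart loop visits subspans: increasing length, and for
// each length all start positions from left to right, so that any
// shorter subspan is processed before a longer span that contains it.
class ChartLoop {
    static List<int[]> spanOrder(int numTokens) {
        List<int[]> order = new ArrayList<>();
        for (int len = 1; len <= numTokens; len++)
            for (int start = 0; start + len <= numTokens; start++)
                order.add(new int[] { start, start + len });
        return order;
    }
}
```

For a two-token utterance this yields [0,1], [1,2], [0,2]: both single-token spans are filled before the full span, so `build` on [0,2] can reuse their chart entries.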
Thanks. Next thing will be to investigate the implementation of the math behind it, the actual calculation of parameters and probabilities.
So I'm trying to get to the core of the math behind the learning process. I've brushed up my knowledge of Stochastic Gradient Descent, but I have problems mapping it to the code. The main method here is apparently `learn` in the `Learner` class. It calls `processExamples`, which uses `parseExample`, `checkGradient`, and `updateWeights` (which calls `update` in class `Params`).
The names suggest that the actual updating happens in the last two mentioned. But in the `checkGradient` method there is the line

```java
perturbedParams.getWeights().put(feature, perturbedParams.getWeight(feature) + eps);
```

which looks like the weights are updated here (but with the constant `eps` and not with a calculated SGD step). Also, which parts of the SGD algorithm do the fields `objectiveValue` and `expectedCounts` of the `ParserState` class, which are used by `checkGradient`, correspond to? Could you please provide some hints about the ML implementation? Thanks
Objective function and gradient

The score of a derivation is computed by `Derivation.computeScore(params)`.

How all this is done in the code

In `ParserState.computeExpectedCounts` and `Learner.processExamples`:

`computeExpectedCounts` (boolean) indicates whether the weights should be updated. The `expectedCounts` in the ParserState are gathered in the `counts` map until a certain number of examples (specified by `batchSize`) is processed. Then `updateWeights(counts)` is called on the accumulated expected counts (after which `counts` is reset). `updateWeights(counts)` essentially calls `params.update(counts)`. Ignoring L1-regularization, the function just does `MapUtils.incr(weights, f, stepSize * g);`. This means adding (expected count of f * learning rate) to the weight of f. Regarding the `checkGradient` part: it is used to check if the gradient update is correct. It does not actually update the weights.

A question about how to tell a `FeatureExtractor` which features it is supposed to use: I understand these features are specified on the command line using `-FeatureExtractor.featureDomains ......`. But where in the code are these actually added to the HashSet `featureDomains` in the class `Options` of the `FeatureExtractor`?
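Going back to the gradient update described above, the `params.update(counts)` step amounts to the following sketch (toy types, not SEMPRE's `Params`; the real code uses fig's `MapUtils.incr`):

```java
import java.util.Map;

// Toy sketch of the update step described above: for each feature f
// with accumulated expected count g, add stepSize * g to its weight,
// mirroring fig's MapUtils.incr(weights, f, stepSize * g)
// (L1-regularization ignored; made-up types, not SEMPRE's Params).
class SgdUpdate {
    static void update(Map<String, Double> weights,
                       Map<String, Double> counts, double stepSize) {
        for (Map.Entry<String, Double> e : counts.entrySet())
            weights.merge(e.getKey(), stepSize * e.getValue(), Double::sum);
    }
}
```

For instance, a feature with weight 1.0 and expected count 2.0 moves to weight 2.0 under step size 0.5.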
Is it safe to say that, for parsing relatively simple utterances into logical forms that are meant to be executed by `JavaExecutor`, mostly the `BeamParser` class and the `rule` feature are relevant?
Using the BeamParser should be sufficient for the parser. For the features, the `rule` features alone might be insufficient (since they only count how many times each rule is used, regardless of the input utterance). The geo880 mode (for the Geoquery dataset) lists the following features:

```
rule opCount constant whType span lemmaAndBinaries denotation lexAlign joinPos skipPos
```

(line 1093 in the `run` script). This might be a good set of features to try (though some of them are not guaranteed to work correctly, depending on the grammar and executor used).
Thanks for the hint. I was asking because I'd like to offer the user a reasonable selection of features to choose from via checkboxes in my GUI. I'd also like to add a short description of each feature to the GUI. The "constant" feature seems to be located in the class `ConstantFn` directly. Am I right that the code segment

```java
res.addFeature("constant", ex.phraseString(c.getStart(), c.getEnd()) + " --- " + formula.toString());
```

registers an application of `ConstantFn` per subspan? Also, which part of the code actually triggers that registration? It seems that there is no method in `FeatureExtractor` taking care of the "constant" feature.
Feature firing in SEMPRE is pretty complicated due to legacy reasons.

The class `FeatureExtractor` gives an overview of the 3 ways features can be defined. Features defined in `FeatureExtractor` itself are activated via `FeatureExtractor.featureDomains`. Features defined in a `SemanticFn` (like the example you gave for `ConstantFn`) will be defined once the `SemanticFn` is called (i.e., when building the Derivation). For `ConstantFn`, the feature is defined on every subspan the corresponding rule fires on. For example, `(rule $TypeNP (meeting) (ConstantFn en.meeting))` detects the word "meeting" and creates a Derivation of category `$TypeNP` with formula `en.meeting`. The `res.addFeature(...)` line you mentioned will register a "constant" feature during the creation of the Derivation (`res`). The remaining features are extracted when `ParserState.featurizeAndScoreDerivation(deriv)` is called: the `FeatureExtractor` goes through all features defined locally in the `FeatureExtractor` plus any registered `FeatureComputer` classes.

I'm back at investigating the features. `whType` uses part-of-speech tagging. As far as I managed to trace it, the method `preprocess` in the `Example` class populates the field `posTags` in the field `languageInfo` by applying a `LanguageAnalyzer`. I'd like to know how it is determined which POS tags are actually added here, and whether it is possible to change (add/remove) POS tags manually. Thanks!
The types of POS tags depend on the `LanguageAnalyzer` used (specified by the flag `--LanguageAnalyzer.languageAnalyzer`). The available choices are `SimpleAnalyzer` (a dummy analyzer that only recognizes numbers) and `corenlp.CoreNLPAnalyzer` (which runs CoreNLP).
The POS tags returned by CoreNLP are based on the model used, with the default using Penn Treebank tags. The class CoreNLPAnalyzer postprocesses some of the tags (e.g., marking auxiliary verbs), and more postprocessing could be hacked in there.
Thanks. I will think over whether I need this feature. It's probably more relevant for database queries than for the cases the `JavaExecutor` handles.
Shouldn't it be possible to map strings to categories, e.g.

```
(rule $City (string "hamburg") (IdentityFn))
(rule $City (string "stuttgart") (IdentityFn))
(rule $City (string "munic") (IdentityFn))
(rule $ROOT ($City) (IdentityFn))
```

as a part/base of a more complex grammar? In my application, using these rules, these city names aren't recognized. Is the grammar itself wrong? Otherwise it would mean my application doesn't handle the grammar file correctly. Thanks
Could you try changing the rules to look like:

```
(rule $City (hamburg) (IdentityFn))
```
I had tried this; I then get:

```
ERROR: Composition failed: rule = $City -> hamburg (IdentityFn), children = []
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-374" java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
  at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:165)
  at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:224)
  at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:231)
  at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:123)
  at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
  at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)
  ....
```

(The rest of the error message refers to classes in my application.)
What apparently works is:

```
(rule $City (hamburg) (ConstantFn (string hamburg)))
```

but that feels quite cumbersome.
The current grammar looks like this:

```
(rule $Dir (from) (ConstantFn (lambda x (call + (string "place of departure") (string :)(var x)))))
(rule $From ($Dir $City) (JoinFn forward))
(rule $Dir (to) (ConstantFn (lambda x (call + (string "arrival location") (string :)(var x)))))
(rule $To ($Dir $City) (JoinFn forward))
(rule $Part (by) (ConstantFn (lambda x (call + (string "transportation") (string :)(var x)))))
(rule $By ($Part $Transport) (JoinFn forward))
(rule $Transport (plane) (ConstantFn (string plane)))
(rule $Transport (train) (ConstantFn (string train)))
(rule $City (hamburg) (ConstantFn (string hamburg)))
(rule $City (stuttgart) (ConstantFn (string stuttgart)))
(rule $City (frankfurt) (ConstantFn (string frankfurt)))
(rule $City (munic) (ConstantFn (string munic)))
(rule $City (hannover) (ConstantFn (string hannover)))
(rule $ROOT ($From) (IdentityFn))
(rule $ROOT ($To) (IdentityFn))
(rule $ROOT ($By) (IdentityFn))
```

and manages to parse e.g. "from munic" into (string "place of departure:munic"). I also tried

```
(rule $ROOT ($From $By) (ConcatFn ""))
```

but entering "from munic by train" returned (string nullnull). Is there a way? Thanks!

Try

```
(rule $ROOT ($From $By) (lambda x (lambda y (call + (var x) (var y)))))
```

instead. I think ConcatFn can only concatenate raw strings, not results of `(call ...)`.
Thanks. With your hint, I have now set up the grammar this way:

```
(rule $Dir (from) (ConstantFn (lambda x (call + (string "from") (string ": ")(var x)))))
(rule $From ($Dir $City) (JoinFn forward))
(rule $Dir (to) (ConstantFn (lambda x (call + (string "to") (string ": ")(var x)))))
(rule $To ($Dir $City) (JoinFn forward))
(rule $Part (by) (ConstantFn (lambda x (call + (string "by") (string ": ")(var x)))))
(rule $By ($Part $Transport) (JoinFn forward))
(rule $Transport (plane) (ConstantFn (string plane)))
(rule $Transport (train) (ConstantFn (string train)))
(rule $City (hamburg) (ConstantFn (string hamburg)))
(rule $City (stuttgart) (ConstantFn (string stuttgart)))
(rule $City (frankfurt) (ConstantFn (string frankfurt)))
(rule $City (munic) (ConstantFn (string munic)))
(rule $City (hannover) (ConstantFn (string hannover)))
(rule $ROOT ($From) (IdentityFn))
(rule $ROOT ($To) (IdentityFn))
(rule $ROOT ($By) (IdentityFn))
(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $By ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $From ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $To ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $By ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $To ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $From ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
```
The idea is to extract the main info from a passenger's utterance, who wants to order a ticket. There is still room for misunderstanding, e.g. "....preferably not by train.... " would still be parsed as "by train", but I'm planning to take care of this by sempre's learning capabilities.
About learning: the `handleCommand` method in the `Master` class contains this sequence:

```java
if (command.equals("accept") || command.equals("a")) {
  ex.setTargetFormula(response.getDerivation().getFormula());
  ex.setTargetValue(response.getDerivation().getValue());
  ex.setContext(session.getContextExcludingLast());
  addNewExample(ex);
}
```

The format created this way apparently can't be read with the `-Dataset.inPaths` option (calling the `readFromPathPairs` method). Of course I could change the format of the exported examples and re-use them, but it seems that is not the way intended by you SEMPRE authors. So, to re-use learning achievements in future sessions, should I go by re-importing the trained parameters? Thanks!
Importing the trained parameters is the intended way. The params are saved in the experiment directory after training in the normal batch learning mode (where you supply the dataset files).
From the interactive prompt mode (what you get when using the `-interactive` flag), after parsing a few examples and accepting a few candidates, you can also use the `(params FILENAME_HERE)` command to dump the current parameters to a file.
Thanks. In your tutorial, you use the example "What is three plus four times two?" and point out:

There should be two derivations, yielding (number 14) and (number 11), corresponding to either combining three plus four first or four times two first. Note that this is expected because we have not encoded any order of operations anywhere.

Now please assume that I'd like SEMPRE to learn the order of operations (multiplication having precedence over addition). What would be a recommendable feature representation here? The `rule` feature would probably not be sufficient, because that would represent just the number of rule applications but not their order. Or maybe the `rule` feature could work with more sophisticated rules that can distinguish between e.g. sums of products and products of sums, so that learning would have to encourage sums of products? No clear plan yet; inspiration would be appreciated.
To learn the order of operations, I think a more complex feature needs to be defined (e.g., the depth of the derivation tree, or a binary feature indicating whether the "add" rule was used in any child of the "multiply" rule). Such an ambiguity doesn't often occur in factoid questions, so there aren't features dealing with it.
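The second suggestion could be sketched as a hypothetical feature computation over a toy derivation tree (none of these names exist in SEMPRE):

```java
import java.util.List;

// Hypothetical feature sketch (none of these names exist in SEMPRE):
// a binary feature that fires when an "add" rule occurs anywhere
// below a "multiply" rule, letting the learner penalize derivations
// that compute the sum before the product.
class OpOrderFeature {
    static boolean addUnderMultiply(ToyNode d) {
        if (d.rule.equals("multiply") && containsRule(d, "add")) return true;
        for (ToyNode c : d.children)
            if (addUnderMultiply(c)) return true;
        return false;
    }

    // True if any strict descendant of d uses the given rule.
    static boolean containsRule(ToyNode d, String rule) {
        for (ToyNode c : d.children)
            if (c.rule.equals(rule) || containsRule(c, rule)) return true;
        return false;
    }
}

class ToyNode {
    final String rule;
    final List<ToyNode> children;

    ToyNode(String rule, ToyNode... kids) {
        this.rule = rule;
        this.children = List.of(kids);
    }
}
```

For "three plus four times two", the (number 14) parse puts the "add" derivation under the "multiply" one, so the feature fires; the (number 11) parse does not.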
For some of the intended use cases I'd like to create and export training examples (instead of params) in a format that makes them reusable for training. The training example from your tutorial had this format:

```
(example
  (utterance "three and four")
  (targetValue (number 7))
)
```

If I have several examples in the same file, are there additional rules for correct formatting to make them readable, e.g. separators between the examples? Would this format be OK?

```
(example
  (utterance "three times six plus nine")
  (targetValue (number 27))
)
(example
  (utterance "three times five plus seven")
  (targetValue (number 22))
)
(example
  (utterance "three times seven plus eight")
  (targetValue (number 29))
)
```
The additional spaces should be fine. Comments can also be added: begin the comment with `#`.
Thanks for this confirmation. I'm back at the earlier example of a passenger ordering a ticket. I would like to have the parsing result as a list of strings, e.g.

```
(list (string hamburg) (string munic) (string plane))
```

or

```
(list (string "from hamburg") (string "to munic") (string "by plane"))
```

Is there a way (a function) to write rules that create a list from non-list types of values?
Probably I should try to write a SEMPRE function that does this, but I am a bit out of practice when it comes to coding. Would it have to look approximately like this?

```java
public class ListFn extends SemanticFn {
  private ListValue listValue;

  public ListFn(ListValue listValue) {
    this.listValue = listValue;
  }

  @Override
  public DerivationStream call(Example ex, Callable c) {
    return new SingleDerivationStream() {
      @Override
      public Derivation createDerivation() {
        for (int i = 0; i < c.getChildren().size(); i++) {
          listValue.values.add(c.getChildren().get(i).getValue());
        }
        return new Derivation.Builder()
            .withCallable(c)
            .formula(new ValueFormula<>(listValue))
            .createDerivation();
      }
    };
  }
}
```

Also, would I have to add this class to the SEMPRE code (locally), or is there a way SEMPRE could use it from within my application? So far, my application could successfully use all SEMPRE classes it needed, but when I try to use a grammar that contains rules with this new function, I get a ClassNotFoundException. Thanks for any help.
The code you have looks about right. A new SemanticFn should be added to the SEMPRE code locally. SEMPRE uses reflection to resolve the name of the SemanticFn, so the SemanticFn has to be inside the SEMPRE library.
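For context, that reflection step can be sketched like this (a generic illustration, not SEMPRE's actual resolution code):

```java
// Generic illustration (not SEMPRE's actual resolution code) of the
// reflection step: a class name from the grammar file is resolved to
// a class and instantiated via its no-argument constructor, which is
// why the class must be visible on SEMPRE's classpath.
class ReflectionSketch {
    static Object construct(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Could not instantiate " + className, e);
        }
    }
}
```

If the named class is not visible to the loading code (as with your ClassNotFoundException), this lookup fails at grammar-loading time.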
Thanks. I'm back at trying to get a better understanding of the features you listed earlier. Probably for simple subjects like the ticket order I won't need them all, but still: what does the `skipPos` feature represent?

Looks like skipPos is defined here. It's only defined on SelectFn, which selects a certain child as the output. It seems like skipPos looks at the children that were not chosen and combines the parts of speech of the tokens under them. Not sure how helpful this feature is in general; it seems pretty task-specific.
Thanks. I'm trying to apply a grammar using the `ListFn` class (which I posted before) in my application. While grammars using only the functions that you had included in SEMPRE work fine, in this case I get:

```
... Caused by: java.lang.InstantiationException: edu.stanford.nlp.sempre.ListFn
... Caused by: java.lang.NoSuchMethodException: edu.stanford.nlp.sempre.ListFn.<init>()
```

So I have to add the `init` method. I tried to understand its purpose by looking at its usage in the other `SemanticFn` subclasses. I changed my code to this:

```java
public class ListFn extends edu.stanford.nlp.sempre.SemanticFn {
  private List list;

  public ListFn(List list) {
    this.list = list;
  }

  @Override
  public DerivationStream call(Example ex, Callable c) {
    return new SingleDerivationStream() {
      @Override
      public edu.stanford.nlp.sempre.Derivation createDerivation() {
        return new Derivation.Builder()
            .withCallable(c)
            .formula(new ValueFormula<>(new ListValue(list)))
            .createDerivation();
      }
    };
  }

  public void init(LispTree tree) {
    super.init(tree);
    list = new ArrayList();
    for (int i = 0; i < tree.children.size(); i++) {
      list.add(tree.child(i).value);
    }
  }
}
```

Does this look reasonable? Thanks
This looks reasonable, though you should start the for loop (in `init`) from 1 instead of 0. The input `tree` looks like `(ListFn foo bar)`, and you don't want to add `"ListFn"` to `list`.
Thanks! I'm back at the features. I'm afraid I could get an intuitive understanding of what they do/represent only for the `rule` and the `constant` features. Is there an official description of the others available, in the way the `rule` feature was explained in the tutorial?

I don't think it's documented anywhere. When I was using SEMPRE (mostly the tables package) I defined my own features and didn't use any of the default ones. That said, I had some partial notes on what each feature does here:
Thanks for explaining! I'm at the moment back at the learning algorithm. When you said here: https://github.com/percyliang/sempre/issues/217#issuecomment-724443027

Define p(deriv) = exp[score(deriv)] / sum_deriv' exp[score(deriv')]

did you mean the "softmax function" (https://en.wikipedia.org/wiki/Softmax_function)?
Yes, that's a softmax over the scores.
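For reference, a minimal sketch of that softmax over derivation scores, with the usual max-subtraction for numerical stability:

```java
// A small sketch of the softmax over derivation scores:
// p(deriv_i) = exp(score_i) / sum_j exp(score_j), computed with
// max-subtraction so that large scores do not overflow Math.exp.
class Softmax {
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0;
        double[] probs = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            probs[i] = Math.exp(scores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }
}
```

Two derivations with equal scores thus each get probability 0.5, and raising one score shifts probability mass toward that derivation.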
Thanks. In the internal handling of the machine learning algorithms, is there any concept of negative training examples? Or does any result not "accepted" by the user implicitly count as negative?
Hi, I'm trying to get some deeper insight into the code of SEMPRE. For the moment, I'm looking at the simple Java logical forms and their execution by the class JavaExecutor. The class CallFormula features the method `forEach`, and I can't see its exact purpose. Intuitively, I think it manages the recursive decomposition of a more complex CallFormula, and I guess the Boolean returned by `func` says whether the base case of the recursion is reached, but again, I don't get exactly how. Also, it seems that this `forEach` method is not even used in the JavaExecutor's methods `execute` and `processFormula`, which apparently are responsible for the execution. Can someone help me understand the logic behind all this? Thanks!