percyliang / sempre

Semantic Parser with Execution

Purpose of method "foreach" in class CallFormula #217

Open · stbusch opened this issue 3 years ago

stbusch commented 3 years ago

Hi, I'm trying to get some deeper insight into the code of sempre. For the moment, I'm looking at the simple Java logical forms and their execution by the class JavaExecutor. The class CallFormula features the method forEach. I can't see its exact purpose. Intuitively, I think it manages the recursive decomposition of a more complex CallFormula, and I guess the Boolean returned by func says whether the base case of the recursion is reached. But again, I don't get exactly how. Also, it seems that this forEach method is not even used in the JavaExecutor's methods execute and processFormula, which apparently are responsible for the execution. Can someone help me understand the logic behind all this, please? Thanks!

ppasupat commented 3 years ago

Hi! The parent abstract class Formula defines a few abstract methods: toLispTree, forEach, map, and mapToList. These are utility functions for traversing Formula objects, which are tree structures. The method forEach in particular can be used to perform something recursively in the tree.

However, it looks like the forEach method is never used by anyone. The more general methods map and mapToList have been used a few times.
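
For illustration, a traversal with forEach could look roughly like this (a sketch, not SEMPRE source; it assumes the callback keeps the recursion going while it returns false and stops the descent once it returns true):

// Collect every subformula of a Formula tree.
final List<Formula> nodes = new ArrayList<Formula>();
formula.forEach(new Function<Formula, Boolean>() {
  public Boolean apply(Formula f) {
    nodes.add(f);   // do something with this node
    return false;   // not done yet: recurse into the children
  }
});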

stbusch commented 3 years ago

Thanks for the explanations. However, I can't see yet where map and mapToList are used. My IDE (IntelliJ) actually says they aren't. My approach to understanding the code now is to go through the JavaExecutor's processFormula with an example CallFormula. I basically get how the formula is recursively processed here, but it's unclear to me which information is carried in the Evaluation field of the Response object that is returned by that method. Any hints? Thanks!

ppasupat commented 3 years ago

The Executor can use the stats field (Evaluation object) to store arbitrary statistics about the execution (e.g., how much time it takes to execute). See freebase/SparqlExecutor.java (Line 253) for an example. I think JavaExecutor does not use this field. The statistics will be gathered across all executed formulas (Parser.java Line 356) and reported at the end of the run.
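
For example, an executor could record timing like this (a rough sketch, not the SparqlExecutor source; executeFormula is a placeholder for the actual execution):

// Record an execution-time statistic in the Evaluation object:
Evaluation stats = new Evaluation();
long start = System.currentTimeMillis();
Value value = executeFormula(formula);  // placeholder
stats.add("executionTimeMs", System.currentTimeMillis() - start);
// stats is attached to the returned Response and aggregated across examples.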

I found mapToList used in niche places like freebase/LexiconFn.java (Line 265) and overnight/OvernightFeatureComputer.java (Line 142). I think it's safe to ignore these utility methods for now.

stbusch commented 3 years ago

Thanks again. I'm at the chapter Batch Learning of the tutorial now, trying to understand the corresponding parts of the sempre code. The classes FeatureExtractor and Rule refer to rules being "anchored" or "floating". In your paper "Macro Grammars and Holistic Triggering for Efficient Semantic Parsing" you say

"..., in which logical forms are either triggered by specific phrases (anchored) or can be triggered in any context (floating)."

Does this mean an anchored rule is applicable only if a (sub)span of tokens matches the phrase of the rule's definition exactly (and not just an RHS category), and that any rule without that property is floating? Thanks

ppasupat commented 3 years ago

The anchored vs floating distinction is only applicable to the FloatingParser. Other parsers (e.g., BeamParser) assume that all rules are anchored.

stbusch commented 3 years ago

Thanks for your elaborate explanation! Now I'm trying to track from the main method how sempre is set up for interaction. Apparently the fig/... supporting packages are used heavily, but these are not part of sempre, and I'm not sure which parts of them are relevant in the particular context of sempre. The main method indirectly calls the static method Execution.runWithObjArray(String[] args, Object[] objects), with objects containing ("Main", new Main(), Master.getOptionsParser()). This method starts with the call init(args, objects), which in turn calls the register method of the OptionsParser class. Here, the interface OptionSet is used to get the annotations of the fields of the objects in the array objects, but it seems that in sempre this interface (or a class implementing it) is never used. Is this correct, and if so, can I neglect this register method in the process of understanding how the setup of sempre for interaction is implemented? Thanks again.

ppasupat commented 3 years ago
stbusch commented 3 years ago

Thanks again. The background of all my questions is that I would like to create a plugin for an existing NLP Java application that enhances it with sempre functionality. That requires some deeper understanding of the code. As I see it, the real entry point to sempre is not the main method but the Ruby script run in combination with execrunner. I am completely new to Ruby, and I wonder if a complete understanding of these scripts is necessary for my goal. Are they basically "just" setting up the environment to pass parameters to the Java code from the command line (so that understanding the Java code would probably be sufficient for me)? I have established that run basically defines the available modes, adds them to the global variable $modes, and maps the modes to their functionality in modesMap, based on the information stored in $modes. The final command run!(sel(:mode, modesMap)) most likely leads to the environment being set up according to the chosen mode, though I have failed to track that in detail so far. So, the question is: do I need to understand the Ruby scripts in detail? (Learning Ruby is certainly worthwhile, but time pressure forces me to set priorities.) Thanks!

ppasupat commented 3 years ago

You might not need to understand the Ruby script in detail. The Ruby script is just a tool for creating a Java command (or multiple commands) to run. From any Ruby command in the README or TUTORIAL, you can add -n to print the Java command and exit. That Java command can be invoked directly without using the Ruby script.
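
For example, ./run @mode=simple -n (the tutorial's command with -n appended) should just print the underlying java command instead of running it.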

stbusch commented 3 years ago

Thanks again. That will save me some time. Now I'm back at the Batch Learning chapter of the tutorial. There it says

The rule feature domain tells the feature extractor to increment the feature each time the grammar rule is applied in the derivation.

Could you please tell me if I understand the following points correctly:

ppasupat commented 3 years ago

Regarding rule feature computation: while one could freshly compute the rule features for each new derivation, the code does it recursively: to compute the rule features for derivation D, combine the rule features of the child derivations of D, then add the rule R applied when constructing D. The localFeatureVector field of D only stores the feature for this final rule R. During scoring or gradient updates, the features will be gathered recursively.

stbusch commented 3 years ago

Thanks again, ppasupat. While I get the idea, I can't seem to find where this recursive gathering of features and calculation of scores is actually located in the code. The class FeatureExtractor contains the method extractLocal, which apparently calls the respective extraction method for each activated feature. The comment here says "This function is called on every sub-Derivation, ...", but where does that happen? I found that extractLocal is used by the ParserState class and its subclasses within the method featurizeAndScoreDerivation, but I don't see the recursion that takes the whole derivation tree into account. The method computeScore in the class Derivation does recursively calculate all scores, but why is it used only by the ReinforcementParser? I would like to gain some more insight here. Thanks!

ppasupat commented 3 years ago

Background

The method parse in Parser is responsible for parsing an utterance. It calls ParserState state = newParserState(params, ex, computeExpectedCounts) and then state.infer() to construct the Derivations. These two methods should be overridden in subclasses of Parser. The state.infer() call should populate the predDerivations field of state with the final Derivations for the example.

(Note: computeExpectedCounts indicates whether to compute the information necessary for gradient updates. The flag is turned on during training and off during evaluation.)

Let's use BeamParser as an example. Ignore the coarse{State,Prune} stuff (and whatever happens when BeamParserState.mode == bool), which is only there to speed up parsing.

The newParserState method constructs a new BeamParserState object (which is in the same file, BeamParser.java). The infer method of BeamParserState constructs Derivations over utterance spans of increasing sizes using the build method, and then calls setPredDerivation (defined in the parent class ChartParserState) to populate the predDerivations field.

Tracing the method calls from build leads to the applyRule(start, end, rule, children) method, which constructs Derivations from the given children using the given grammar rule. (Ignore the mode == bool section.) The produced derivations newDeriv are featurized and scored by featurizeAndScoreDerivation (defined in ParserState). This method calls extractLocal to extract features and then computeScoreLocal to compute the score.
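
To summarize the call chain described above as pseudocode (paraphrased, not the actual source):

// BeamParser parsing flow, paraphrased:
ParserState state = newParserState(params, ex, computeExpectedCounts);
state.infer();
//   infer() builds Derivations over spans of increasing size:
//     build(start, end)
//       applyRule(start, end, rule, children)
//         Derivation newDeriv = ...;             // built from children via rule
//         featurizeAndScoreDerivation(newDeriv);
//           extractLocal(ex, newDeriv);          // features for this node only
//           newDeriv.computeScoreLocal(params);  // local score + children's scores
//   setPredDerivation();                         // fills state.predDerivations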

Feature extraction and scoring

Feature extraction and scoring are actually not done recursively (sorry for my confusion earlier). Rather, it uses dynamic programming. Let's say a derivation A has one child B, A has local features {a1, a2}, and the total score of all features in B (including the features of its descendants) is already computed and stored in B.score. Then the total score of A, A.score, can be computed as A.score = params.getWeight(a1) + params.getWeight(a2) + B.score.
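
Concretely, if params.getWeight(a1) = 0.5, params.getWeight(a2) = -0.2, and B.score = 1.0, then A.score = 0.5 - 0.2 + 1.0 = 1.3.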

The extractLocal method (in FeatureExtractor) extracts features only for the current derivation and not the children. For example, the "rule" feature (in extractRuleFeatures) will define only one feature for the rule used to build the derivation. The rules used for building the children should have already been extracted when the children were constructed (and featurizeAndScoreDerivation was called on them).

The computeScoreLocal method (in Derivation) computes score (total score of all features) = total score for local features + sum of the score from children. Like the features, the score field of the children should already have been populated when the children were constructed.
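
In sketch form (paraphrased, not the actual Derivation source; dotProduct stands in for the feature-vector/weight product):

// Dynamic-programming scoring, paraphrased:
double scoreOf(Derivation deriv, Params params) {
  double score = dotProduct(deriv.localFeatureVector, params);  // local features only
  if (deriv.children != null)
    for (Derivation child : deriv.children)
      score += child.score;  // already computed when the child was built
  return score;
}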

In ReinforcementParser, there is no guarantee that the scores of child derivations have been populated before the parent is constructed (I think), so computeScore is used instead of computeScoreLocal to recursively score the children.

Note that this dynamic programming method only works for linear models (which SEMPRE uses). A model that considers interactions between features would not work.

stbusch commented 3 years ago

Thank you! With your explanations and further studying of the code, I now believe I have understood the following:

Please correct me if I'm wrong.

ppasupat commented 3 years ago

Yes, these are all correct. Different parsers might construct logical forms in different orders (e.g., the second stage of FloatingParser does not care about start and end; the ReinforcementParser learns to pick which partial Derivation to expand on; etc.), but the general ideas about applying rules, scoring, and then collecting predDerivations should be the same.

stbusch commented 3 years ago

Thanks. The next thing will be to investigate the implementation of the math behind it, the actual calculation of parameters and probabilities.

stbusch commented 3 years ago

So I'm trying to get to the core of the math behind the learning process. I've brushed up my knowledge of Stochastic Gradient Descent, but I have problems mapping it to the code. The main method here is apparently learn in the Learner class. It calls processExamples, which uses parseExample, checkGradient, and updateWeights (which calls update in the class Params). The names suggest that the actual updating happens in the last two. But in the checkGradient method there is the line

perturbedParams.getWeights().put(feature, perturbedParams.getWeight(feature) + eps);

which looks as if the weights were updated here (but by the constant eps, not by a calculated SGD step). Also, which parts of the SGD algorithm do the fields objectiveValue and expectedCounts of the ParserState class (used by checkGradient) correspond to? Could you please provide some hints about the ML implementation? Thanks

ppasupat commented 3 years ago

Objective function and gradient

Define p(deriv) = exp[score(deriv)] / sum_deriv' exp[score(deriv')]

How all this is done in the code
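
Roughly speaking, objectiveValue holds the value of the training objective for an example and expectedCounts holds its gradient with respect to the weights; updateWeights (via Params.update) performs the actual SGD step. checkGradient does not update anything: it perturbs a weight by the small constant eps to compare a finite-difference estimate of the objective's slope against the analytic gradient. A generic sketch of that check (illustrative only; copyOf, objective, and tolerance are placeholders, not SEMPRE methods):

// Finite-difference gradient check (illustrative, not the actual source):
double analytic = expectedCounts.get(feature);   // analytic d(objective)/d(weight)
Params perturbedParams = copyOf(params);         // placeholder for a copy
perturbedParams.getWeights().put(feature, perturbedParams.getWeight(feature) + eps);
double numeric = (objective(perturbedParams) - objective(params)) / eps;
// the two estimates should approximately agree:
assert Math.abs(numeric - analytic) < tolerance;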

stbusch commented 3 years ago

A question about how to tell a FeatureExtractor which features it is supposed to use: I understand these features are specified on the command line using -FeatureExtractor.featureDomains ... . But where in the code are they actually added to the HashSet featureDomains in the class Options of the FeatureExtractor?

stbusch commented 3 years ago

Is it safe to say that, for parsing relatively simple utterances into logical forms that are meant to be executed by JavaExecutor, mostly the BeamParser class and the rule feature are relevant?

ppasupat commented 3 years ago

Using the BeamParser should be sufficient for the parser. For the features, the rule features alone might be insufficient (since they only count how many times each rule is used, regardless of the input utterance). The geo880 mode (for the Geoquery dataset) lists the following features:

rule opCount constant whType span lemmaAndBinaries denotation lexAlign joinPos skipPos

(Line 1093 in the run script). This might be a good set of features to try (though some of them are not guaranteed to work correctly, depending on the grammar and executor used).

stbusch commented 3 years ago

Thanks for the hint. I was asking because I'd like to offer the user a reasonable selection of features to choose from via checkboxes in my GUI. I'd also like to add a short description of each feature to the GUI. The "constant" feature seems to be located in the class ConstantFn directly. Am I right that the code segment

          res.addFeature("constant", ex.phraseString(c.getStart(), c.getEnd()) + " --- " + formula.toString());

registers an application of ConstantFn per subspan? Also, which part of the code actually triggers that registration? It seems that there is no method in FeatureExtractor taking care of the "constant" feature.

ppasupat commented 3 years ago

Feature firing in SEMPRE is pretty complicated due to legacy reasons.

stbusch commented 3 years ago

I'm back at investigating the features. whType uses part-of-speech tagging. As far as I managed to trace it, the method preprocess in the Example class populates the field posTags in the field languageInfo by applying a LanguageAnalyzer. I'd like to know how it is determined which POS tags are actually added here, and whether it is possible to change (add/remove) POS tags manually. Thanks!

ppasupat commented 3 years ago

The types of POS tags depend on the LanguageAnalyzer used (specified by the flag --LanguageAnalyzer.languageAnalyzer). The available choices are SimpleAnalyzer (which is a dummy analyzer that only recognizes numbers) and corenlp.CoreNLPAnalyzer (which runs CoreNLP).

The POS tags returned by CoreNLP are based on the model used, with the default using Penn Treebank tags. The class CoreNLPAnalyzer postprocesses some of the tags (e.g., marking auxiliary verbs), and more postprocessing could be hacked in there.
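
For example, appending --LanguageAnalyzer.languageAnalyzer corenlp.CoreNLPAnalyzer (the flag named above) to the run command selects the CoreNLP analyzer instead of the SimpleAnalyzer.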

stbusch commented 3 years ago

Thanks. I will think over whether I need this feature. It's probably more relevant for database queries than for the cases the JavaExecutor handles.

stbusch commented 3 years ago

Shouldn't it be possible to map Strings to categories, e.g.

(rule $City (string "hamburg") (IdentityFn))
(rule $City (string "stuttgart") (IdentityFn))
(rule $City (string "munic") (IdentityFn))
(rule $ROOT ($City) (IdentityFn))

as a part/base of a more complex grammar?

In my application, these city names aren't recognized when using these rules. Is the grammar itself wrong? Otherwise it would mean my application doesn't handle the grammar file correctly. Thanks

ppasupat commented 3 years ago

Could you try changing the rules to look like:

(rule $City (hamburg) (IdentityFn))
stbusch commented 3 years ago

I had tried this; I then get:

   ERROR: Composition failed: rule = $City -> hamburg (IdentityFn), children = []
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-374" java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
        at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:165)
        at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:224)
        at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:231)
        at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:123)
        at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
        at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)....

(The rest of the error message refers to classes in my application.) What apparently works is (rule $City (hamburg) (ConstantFn (string hamburg))), but that feels quite cumbersome.

stbusch commented 3 years ago

The current grammar looks like this:

(rule $Dir (from) (ConstantFn (lambda x (call + (string "place of departure") (string :)(var x)))))
(rule $From ($Dir $City) (JoinFn forward))

(rule $Dir (to) (ConstantFn (lambda x (call + (string "arrival location") (string :)(var x)))))
(rule $To ($Dir $City) (JoinFn forward))

(rule $Part (by) (ConstantFn (lambda x (call + (string "transportation") (string :)(var x)))))
(rule $By ($Part $Transport) (JoinFn forward))

(rule $Transport (plane) (ConstantFn (string plane)))
(rule $Transport (train) (ConstantFn (string train)))

(rule $City (hamburg) (ConstantFn (string hamburg)))
(rule $City (stuttgart) (ConstantFn (string stuttgart)))
(rule $City (frankfurt) (ConstantFn (string frankfurt)))
(rule $City (munic) (ConstantFn (string munic)))
(rule $City (hannover) (ConstantFn (string hannover)))

(rule $ROOT ($From) (IdentityFn))
(rule $ROOT ($To) (IdentityFn))
(rule $ROOT ($By) (IdentityFn))

and manages to parse e.g. "from munic" into (string "place of departure:munic").

ppasupat commented 3 years ago
stbusch commented 3 years ago

Thanks. With your hint, I have the grammar now set up that way:

(rule $Dir (from) (ConstantFn (lambda x (call + (string "from") (string ": ")(var x)))))
(rule $From ($Dir $City) (JoinFn forward))

(rule $Dir (to) (ConstantFn (lambda x (call + (string "to") (string ": ")(var x)))))
(rule $To ($Dir $City) (JoinFn forward))

(rule $Part (by) (ConstantFn (lambda x (call + (string "by") (string ": ")(var x)))))
(rule $By ($Part $Transport) (JoinFn forward))

(rule $Transport (plane) (ConstantFn (string plane)))
(rule $Transport (train) (ConstantFn (string train)))

(rule $City (hamburg) (ConstantFn (string hamburg)))
(rule $City (stuttgart) (ConstantFn (string stuttgart)))
(rule $City (frankfurt) (ConstantFn (string frankfurt)))
(rule $City (munic) (ConstantFn (string munic)))
(rule $City (hannover) (ConstantFn (string hannover)))

(rule $ROOT ($From) (IdentityFn))
(rule $ROOT ($To) (IdentityFn))
(rule $ROOT ($By) (IdentityFn))

(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))
(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (call + (var x) (string ", ") (var y)))))

(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $By ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $To ($PHRASE optional) $From ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $To ($PHRASE optional) $By ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $From ($PHRASE optional) $By ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $To ($PHRASE optional) $From ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))
(rule $ROOT (($PHRASE optional) $By ($PHRASE optional) $From ($PHRASE optional) $To ($PHRASE optional)) (lambda x (lambda y (lambda z (call + (var x) (string ", ") (var y) (string ", ") (var z))))))

The idea is to extract the main info from the utterance of a passenger who wants to order a ticket. There is still room for misunderstanding, e.g. "... preferably not by train ..." would still be parsed as "by train", but I'm planning to take care of this via sempre's learning capabilities. About learning: the handleCommand method in the Master class contains this sequence:

if (command.equals("accept") || command.equals("a")) {
  ex.setTargetFormula(response.getDerivation().getFormula());
  ex.setTargetValue(response.getDerivation().getValue());
  ex.setContext(session.getContextExcludingLast());
  addNewExample(ex);
}

The format created this way apparently can't be read with the -Dataset.inPaths option (calling the readFromPathPairs method). Of course I could change the format of the exported examples and re-use them, but it seems that is not the way intended by the sempre authors. So, to re-use learning achievements in future sessions, should I instead re-import the trained parameters? Thanks!

ppasupat commented 3 years ago

Importing the trained parameters is the intended way. The params are saved in the experiment directory after training in the normal batch learning mode (where you supply the dataset files).

From the interactive prompt mode (what you get when using the -interactive flag), after parsing a few examples and accepting a few candidates, you can also use the (params FILENAME_HERE) command to dump the current parameters to a file.

stbusch commented 3 years ago

Thanks. In your tutorial, you use the example

What is three plus four times two?

and point out:

There should be two derivations, yielding (number 14) and (number 11), corresponding to either combining three plus four first or four times two first. Note that this is expected because we have not encoded any order of operations anywhere.

Now please assume that I'd like sempre to learn the order of operations (multiplication having precedence over addition). What would be a recommendable feature representation here? The rule feature would probably not be sufficient, because it would represent just the number of rule applications but not their order.

stbusch commented 3 years ago

... or maybe the rule feature could work with more sophisticated rules that can distinguish between e.g. sums of products and products of sums, so that learning would have to encourage sums of products ...? No clear plan yet. Inspiration would be appreciated.

ppasupat commented 3 years ago

To learn the order of operations, I think a more complex feature needs to be defined (e.g., the depth of the derivation tree, or a binary feature indicating whether the "add" rule was used in any child of the "multiply" rule). Such an ambiguity doesn't often occur in factoid questions, so there aren't features dealing with it.
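
For illustration, such a feature could be extracted in the style of the methods in FeatureExtractor. A hypothetical sketch ("precedence" is not an existing feature domain, and matching rules by their string form is purely illustrative):

// Hypothetical feature: fires when an "add" rule was used to build a
// child of a derivation built by the "multiply" rule.
void extractPrecedenceFeatures(Example ex, Derivation deriv) {
  if (deriv.rule == null || !deriv.rule.toString().contains("multiply")) return;
  for (Derivation child : deriv.children) {
    if (child.rule != null && child.rule.toString().contains("add"))
      deriv.addFeature("precedence", "add-under-multiply");
  }
}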

stbusch commented 3 years ago

For some of the intended use cases I'd like to create and export training examples (instead of params) in a format that makes them reusable for training. The training example from your tutorial had this format:

 (example
   (utterance "three and four")
   (targetValue (number 7))
  )

If I have several examples in the same file, are there additional rules for correct formatting to make them readable, e.g. separators between the examples? Would this format be OK?

(example
  (utterance "three times six plus nine")
  (targetValue (number 27))
)

(example
  (utterance "three times five plus seven")
  (targetValue (number 22))
)

(example
  (utterance "three times seven plus eight")
  (targetValue (number 29))
)
ppasupat commented 3 years ago

The additional spaces should be fine. Comments can also be added: begin the comment with #.
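
For instance:

# three times six plus nine = 27
(example
  (utterance "three times six plus nine")
  (targetValue (number 27))
)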

stbusch commented 3 years ago

Thanks for this confirmation. I'm back at the earlier example of a passenger ordering a ticket. I would like to have the parsing result as a list of strings, e.g.

(list (string hamburg) (string munic) (string plane))

or

(list (string "from hamburg") (string "to munic") (string "by plane"))

Is there a way (a function) to write rules that create a list from non-list types of values?

stbusch commented 3 years ago

Probably I should try to write a sempre function that does this, but I am a bit out of practice when it comes to coding. Would it have to look approximately like this:

public class ListFn extends SemanticFn {
  private ListValue listValue;

  public ListFn(ListValue listValue) { this.listValue = listValue; }

  @Override
  public DerivationStream call(Example ex, Callable c) {
    return new SingleDerivationStream() {
      @Override
      public Derivation createDerivation() {
        // collect the values of all child derivations into the list
        for (int i = 0; i < c.getChildren().size(); i++) {
          listValue.values.add(c.getChildren().get(i).getValue());
        }
        return new Derivation.Builder()
            .withCallable(c)
            .formula(new ValueFormula<>(listValue))
            .createDerivation();
      }
    };
  }
}

? Also, would I have to add this class to the sempre code (locally), or is there a way sempre could use it from within my application? So far, my application could successfully use all the sempre classes it needed, but when I try to use a grammar that contains rules with this new function, I get a ClassNotFoundException. Thanks for any help.

ppasupat commented 3 years ago

The code you have looks about right. A new SemanticFn should be added to the SEMPRE code locally. SEMPRE uses reflection to resolve the name of the SemanticFn, so the SemanticFn has to be inside the SEMPRE library.

stbusch commented 3 years ago

Thanks. I'm back at trying to get a better understanding of the features you listed earlier. Probably I won't need them all for simple subjects like the ticket order, but still: what does the skipPos feature represent?

ppasupat commented 3 years ago

Looks like skipPos is defined here. It's only defined on SelectFn, which selects a certain child as the output. It seems like skipPos looks at the children that were not chosen and combines the parts of speech of the tokens under them.

Not sure how helpful this feature is in general. This seems pretty task-specific.

stbusch commented 3 years ago

Thanks. I'm trying to apply a grammar using the ListFn class (which I had posted before) in my application. While grammars using only the functions you had included in sempre work fine, in this case I get:

... Caused by: java.lang.InstantiationException: edu.stanford.nlp.sempre.ListFn ... Caused by: java.lang.NoSuchMethodException: edu.stanford.nlp.sempre.ListFn.<init>()

So I have to add a no-argument constructor and the init method. I tried to understand the purpose of init by looking at its usage in the other SemanticFn subclasses, and changed my code to this:


public class ListFn extends edu.stanford.nlp.sempre.SemanticFn {
  private List list;

  // A public no-argument constructor is what the reflective instantiation
  // was missing (the NoSuchMethodException for ListFn.<init>() above).
  public ListFn() { }

  @Override
  public DerivationStream call(Example ex, Callable c) {
    return new SingleDerivationStream() {
      @Override
      public edu.stanford.nlp.sempre.Derivation createDerivation() {
        return new Derivation.Builder()
            .withCallable(c)
            .formula(new ValueFormula<>(new ListValue(list)))
            .createDerivation();
      }
    };
  }

  public void init(LispTree tree) {
    super.init(tree);
    list = new ArrayList();
    for (int i = 0; i < tree.children.size(); i++) {
      list.add(tree.child(i).value);
    }
  }
}


Does this look reasonable? Thanks
ppasupat commented 3 years ago

This looks reasonable, though you should start the for loop (in init) from 1 instead of 0. The input tree looks like (ListFn foo bar), and you don't want to add "ListFn" to list.
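
That is, the init loop would become:

public void init(LispTree tree) {
  super.init(tree);
  list = new ArrayList();
  // child 0 is the name "ListFn" itself, so start at 1
  for (int i = 1; i < tree.children.size(); i++) {
    list.add(tree.child(i).value);
  }
}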

stbusch commented 3 years ago

Thanks! I'm back at the features. I'm afraid I could get an intuitive understanding of what they do/represent only for the rule and the constant features. Is there an official description of the others available, in the way the rule feature was explained in the tutorial?

ppasupat commented 3 years ago

I don't think it's documented anywhere. When I was using SEMPRE (mostly the tables package), I defined my own features and didn't use any of the default ones. That said, I had some partial notes on what each feature does here:

stbusch commented 3 years ago

Thanks for explaining! I'm back at the learning algorithm at the moment.
When you said here: https://github.com/percyliang/sempre/issues/217#issuecomment-724443027

Define p(deriv) = exp[score(deriv)] / sum_deriv' exp[score(deriv')]

did you mean the softmax function (https://en.wikipedia.org/wiki/Softmax_function)?

ppasupat commented 3 years ago

Yes, that's a softmax over the scores.

stbusch commented 2 years ago

Thanks. In the internal handling of the machine learning algorithms, is there any concept of negative training examples? Or does any result not "accepted" by the user implicitly count as negative?