Open zachwood0s opened 2 years ago
Thank you for your detailed question. I have not gotten a chance to look at your code in detail yet; I'll respond in more detail once I do.
You are correct that you should not need to define witness functions for uses of built-ins like Kth
and Filter
. The framework provides witness functions for them for some Spec
types. Specifically, I see for Kth
, ExampleSpec
and DisjunctiveExamplesSpec
witness functions and for Filter
, SubsequenceSpec
, PrefixSpec
, and ExampleSpec
(but no DisjunctiveExamplesSpec
... I assume there's some reason for that, but I'm not sure what it is).
Also, for debugging such issues, if you haven't already been doing so, it helps a lot to look at the synthesis log. When constructing a SynthesisEngine.Config
you can use the LogListener
field to pass in a LogListener
object and call .SaveLogToXML()
after the learn to get an XML log of all of the witness functions with the arguments they were called with and their return values.
Thanks for the quick response! I did not know about the LogListener
but that will definitely come in handy when debugging! Also, I just realized that my latest commit in my witness functions includes an empty witness function for MatchTag
on parameter 0. I was using that mostly for testing to see what values it would try so you can ignore that.
On another note, is there a way to view the .XML log that is generated so that it fills in the ##
references with the actual value at the top?
I added in the LogListener
and thought the logs that were generated could be helpful. I noticed that with the current DSL, Kth
does attempt to use a DisjunctiveSubsequenceSpec
but it looks like the only MatchNodes
that it attempts to learn is MatchNodes(True(), Descendants(tree))
. I'm assuming this is because MatchTag
fails to select a tag
parameter that works (similar to what I was saying in my initial post).
The log that I'm pulling this from can be found here: https://github.com/zachwood0s/dom_program_synthesis/blob/main/synthesis_log_match_true.xml
Line 2157 seems to be the beginning of the Kth
operations.
Line 2247 seems to be where it attempts to learn the MatchNodes
rule again but only shows True()
as the selected predicate.
Let me know if you need any more information!
No, the ##
references are separate in part to avoid problems with circular references and the log getting too big due to writing out the same value many times. You could try writing your own script to copy them, but it's probably easier to just have a couple separate windows open viewing the same XML file. Sorry, I know those logs are awkward to read, but it's somewhat unavoidable because they are very information-dense.
Also, I'm sorry, but I'm very busy at the moment, and while I can answer questions, I will not have time to actually take an in-depth look at your code this week.
No, the
##
references are separate in part to avoid problems with circular references and the log getting too big due to writing out the same value many times.
Ah ok that makes sense, I'll just pull up multiple windows.
Also, I'm sorry, but I'm very busy at the moment, and while I can answer questions, I will not have time to actually take an in-depth look at your code this week.
That's perfectly fine! I hope you don't mind if I continue to bounce questions off of you while I attempt to solve this though.
I've been digging into the log more and it looks like this may be caused by Filter
or the lambda expression not accepting a DisjunctiveExamplesSpec
but I'm not entirely sure. It appears to start down the correct path (using Kth
) but doesn't attempt to learn any of the predicates other than True()
. This can be seen in the log below. This is where it starts the KthDisjunctiveLearner
:
<UsePlan elapsed="00:00:00.0161113" verify="false" hasDependencies="true" plan="["KthDisjunctiveLearner.WitnessSequence", "KthDisjunctiveLearner.WitnessK"]">
<Witness index="0" elapsed="00:00:00.0029815" param="rule" function="KthDisjunctiveLearner.WitnessSequence">
<witness>
<DisjunctiveSubsequenceSpec examples="2">
...
</DisjunctiveSubsequenceSpec>
And here is where it attempts to learn MatchNodes
inside of the KthDisjunctiveLearner
but it looks like the _LFun0
learning doesn't learn the predicate like I'd expect it to:
<LearnRule elapsed="00:00:00.0027615" rule="MatchNodes" programCount="1">
<result>
<Direct size="1" symbol="rule">
<Program i="0"><![CDATA[MatchNodes(True(), Descendants(tree))]]></Program>
</Direct>
</result>
<UsePlan elapsed="00:00:00.0005979" verify="true" hasDependencies="false" plan="["GrammarRule.WitnessAll", "GrammarRule.WitnessAll"]">
<Witness index="0" elapsed="00:00:00.0000027" param="_LFun0" function="GrammarRule.WitnessAll">
<witness>
<TopSpec examples="0" />
</witness>
</Witness>
<Learn symbol="_LFun0" k="all" elapsed="00:00:00.0000184" randomK="null" retrievedFromCache="false" depth="6">
<result>
<Join rule="_LFun0 := \x => match" size="1">
<Param name="match">
<Join rule="True" size="1" />
</Param>
</Join>
</result>
<spec>
<TopSpec examples="0" />
</spec>
</Learn>
<Witness index="1" elapsed="00:00:00.0000023" param="nodes" function="GrammarRule.WitnessAll">
<witness>
<TopSpec examples="0" />
</witness>
</Witness>
<Learn symbol="nodes" k="all" elapsed="00:00:00.0005651" randomK="null" retrievedFromCache="false" depth="6">
<result>
<Direct size="3" symbol="nodes">
<Program i="0"><![CDATA[Children(tree)]]></Program>
<Program i="1"><![CDATA[Descendants(tree)]]></Program>
<Program i="2"><![CDATA[Single(tree)]]></Program>
</Direct>
</result>
<spec>
<TopSpec examples="0" />
</spec>
</Learn>
</UsePlan>
</LearnRule>
I looked at the log for when MatchTag
does succeed and it looks like the FilterWitnessingPlanExamples.WitnessSet
gets called in the successful cases but not in the case above.
<LearnRule programCount="1" elapsed="00:00:00.1012893" rule="MatchNodes">
...
<UsePlan verify="false" elapsed="00:00:00.1003024" plan="["FilterWitnessingPlanExamples.WitnessPredicate", "FilterWitnessingPlanExamples.WitnessSet"]" hasDependencies="true">
<Witness index="1" elapsed="00:00:00.0034527" param="nodes" function="FilterWitnessingPlanExamples.WitnessSet">
... And so on
Do you think this is caused by Filter
or lambda functions not supporting the DisjunctiveExampleSpec
?
Hi PROSE team! I've been working on creating a DSL for web scraping and it currently supports various tree/list operations to find data in a DOM tree. For selecting/filtering elements in my "node lists" I'm currently using the built-in functions
Kth
andFilter
. My current DSL is shown below:This is then able to generate programs like
Single(SelectChild(Descendents(tree), 1))
andMatchNodes(MatchTag(x, "div"), Descendants(tree))
but it fails when trying to select the Kth element out of a filtered list. For something like "Give the first node with the tag of 'div'" I'm hoping to generate something like:Single(SelectChild(MatchNodes(MatchTag(x, "div"), Descendants(tree)), 0))
I currently don't have witness functions for either
SelectChild
orMatchNodes
as I wasn't sure if I needed to write them for built-in functions (there was some mention that the framework could more aggressively optimize the built-ins so I didn't want to interfere with that). I have noticed that it gets close when calling the witness function forMatchTag
but the incoming args are never really satisfiable by the predicate. For example, if I was searching for the first "div" node,Filter
will clearly be attempting to filter the correct list that contains that "div" node, butMatchNode
will get examples like:Which, with the way I have
MatchTag
implemented, won't contain a single tag that satisfies all the examples. It seems like for this to work,MatchTag
would need to receive the examples:and have
SelectChild
select only the first of the filtered list, but I'm not sure what I need to change to make that happen.The tree synthesis portion of the repo can be found here if you'd like to take a look: https://github.com/zachwood0s/dom_program_synthesis/tree/main/ProseTutorial/tree_synthesis
Any help or suggestions on how to get this interaction to work would be greatly appreciated!