starling-lab / BoostSRL

BoostSRL: "Boosting for Statistical Relational Learning." A gradient-boosting based approach for learning different types of SRL models.
https://starling.utdallas.edu
GNU General Public License v3.0

Performing parameter learning #19

Open rodrigoazs opened 6 years ago

rodrigoazs commented 6 years ago

Hello,

I'd like to learn the regression values for a given tree. To do that, I'm trying to force the code to select the node I want as the best one for the split when it looks through the candidates. However, I don't know how the candidate nodes passed in the List children parameter of the addChildrenToOpenList method (BestFirstSearch.java, line 25) are chosen. The nodes in the children list change when the code is run again. What is random in this process of selecting the nodes used to split the tree?

In addition, let's say I want the first split to be professor(B), student(A), publication(C, A). How can I create a SingleClauseNode object to represent it, and where am I supposed to set it as the bestNode?
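To make the question concrete, the kind of hook I have in mind looks roughly like the standalone mock below. CandidateNode stands in for WILL's SingleClauseNode, and the method is only a stand-in for the real addChildrenToOpenList; none of these signatures are WILL's actual API.

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Standalone mock: force a specific clause to win the best-node selection.
public class ForcedSplitSketch {

    static class CandidateNode {
        final String clause;
        CandidateNode(String clause) { this.clause = clause; }
    }

    private final LinkedList<CandidateNode> openList = new LinkedList<CandidateNode>();
    private final String forcedClause;

    ForcedSplitSketch(String forcedClause) { this.forcedClause = forcedClause; }

    // If the target clause appears among the candidates, keep only it so the
    // search must pick it as the best node; otherwise behave as usual.
    void addChildrenToOpenList(List<CandidateNode> children) {
        for (CandidateNode child : children) {
            if (child.clause.equals(forcedClause)) {
                openList.clear();
                openList.add(child);
                return;
            }
        }
        openList.addAll(children);
    }

    public static void main(String[] args) {
        ForcedSplitSketch s = new ForcedSplitSketch("professor(B), student(A), publication(C, A)");
        s.addChildrenToOpenList(Arrays.asList(
                new CandidateNode("professor(B)"),
                new CandidateNode("professor(B), student(A), publication(C, A)")));
        System.out.println(s.openList.getFirst().clause);  // prints the forced clause
    }
}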

The learned tree changes between runs.

UW-CSE dataset: First run

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( taughtby(C, B), tempadvisedby(D, B), publication(E, D) )
%   | then if ( publication(F, A), publication(F, B) )
%   | | then return 0.8581489350995122;  // std dev = 0.000, 10.000 (wgt'ed) examples reached here.  /* #pos=10 */
%   | | else return 0.02481560176617886;  // std dev = 0.373, 12.000 (wgt'ed) examples reached here.  /* #neg=10 #pos=2 */
%   | else if ( publication(G, B), publication(G, A) )
%   | | then return 0.8268989350995116;  // std dev = 0.174, 32.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=31 */
%   | | else if ( publication(H, A), publication(I, B) )
%   | | | then if ( ta(J, A), publication(I, K), ta(L, K) )
%   | | | | then return 0.8581489350995122;  // std dev = 0.000, 5.000 (wgt'ed) examples reached here.  /* #pos=5 */
%   | | | | else if ( tempadvisedby(M, B), publication(H, N), professor(N) )
%   | | | | | then return -0.14185106490048777;  // std dev = 0.000, 3.000 (wgt'ed) examples reached here.  /* #neg=3 */
%   | | | | | else if ( tempadvisedby(P, B) )
%   | | | | | | then return 0.6081489350995122;  // std dev = 0.866, 4.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=3 */
%   | | | | | | else return 0.15814893509951225;  // std dev = 0.458, 10.000 (wgt'ed) examples reached here.  /* #neg=7 #pos=3 */
%   | | | else return 0.6663681131817044;  // std dev = 0.394, 73.000 (wgt'ed) examples reached here.  /* #neg=14 #pos=59 */
%   else return -0.1418510649004883;  // std dev = 0.000, 220.000 (wgt'ed) examples reached here.  /* #neg=220 */

Second run

% FOR advisedby(A, B):
%   if ( hasposition(B, C), student(A) )
%   then if ( publication(D, A), publication(D, B) )
%   | then return 0.8116373071925351;  // std dev = 0.211, 43.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=41 */
%   | else if ( publication(E, A), publication(E, F), professor(F) )
%   | | then if ( publication(G, B), tempadvisedby(H, B) )
%   | | | then return -0.05851773156715445;  // std dev = 0.276, 12.000 (wgt'ed) examples reached here.  /* #neg=11 #pos=1 */
%   | | | else if ( taughtby(I, B), taughtby(I, F) )
%   | | | | then return 0.10814893509951219;  // std dev = 0.866, 4.000 (wgt'ed) examples reached here.  /* #neg=3 #pos=1 */
%   | | | | else if ( ta(J, A) )
%   | | | | | then return 0.6581489350995122;  // std dev = 0.894, 5.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=4 */
%   | | | | | else if ( tempadvisedby(K, B), tempadvisedby(L, F), publication(M, L) )
%   | | | | | | then return 0.5248156017661788;  // std dev = 0.816, 3.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=2 */
%   | | | | | | else return 0.2581489350995122;  // std dev = 1.095, 5.000 (wgt'ed) examples reached here.  /* #neg=3 #pos=2 */
%   | | else if ( taughtby(N, B), ta(N, A) )
%   | | | then return 0.8581489350995123;  // std dev = 0.000, 16.000 (wgt'ed) examples reached here.  /* #pos=16 */
%   | | | else return 0.6208607995062918;  // std dev = 0.425, 59.000 (wgt'ed) examples reached here.  /* #neg=14 #pos=45 */
%   else return -0.13653191596431785;  // std dev = 0.073, 188.000 (wgt'ed) examples reached here.  /* #neg=187 #pos=1 */

Thank you, Best regards.

mayukhdas commented 6 years ago

Hi Rodrigo,

Allow us some time to look into this. We will get back to you with an explanation as soon as possible.

Thanks.

rodrigoazs commented 6 years ago

Hello,

I have made some modifications to ILPouterLoop.java in order to force the creation of nodes and leaves in specific places and reproduce the structure of a given tree. I did that by creating SingleClauseNodes and allowing the creation of interior nodes and leaves, as sketched below.
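Schematically, the change amounts to something like the mock below: a fixed spec of clauses per branch decides where interior nodes and leaves go. TreeSpec and forcedSplit are my own names and simplifications, not WILL's API.

// Standalone mock of forcing a fixed tree structure.
public class ForcedStructureSketch {

    static class TreeSpec {
        final String clause;            // clause to split on, or null for a leaf
        final TreeSpec onTrue, onFalse;
        TreeSpec(String clause, TreeSpec onTrue, TreeSpec onFalse) {
            this.clause = clause; this.onTrue = onTrue; this.onFalse = onFalse;
        }
    }

    // Instead of searching for the best clause, read it off the spec.
    static String forcedSplit(TreeSpec spec) {
        return (spec == null) ? null : spec.clause;
    }

    public static void main(String[] args) {
        TreeSpec root = new TreeSpec("professor(B), student(A)",
                new TreeSpec("taughtby(C, B), ta(C, A)", null, null),
                null);                                  // else-branch is a leaf
        System.out.println(forcedSplit(root));          // split at the root
        System.out.println(forcedSplit(root.onFalse));  // null -> make a leaf
    }
}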

The code seems to be working; however, I am getting very different standard deviations in the WILL regression-tree file produced. The numbers of reached examples and the regression values are very similar.

Any idea what it could be?

Learning a single tree

%%%%%  WILL-Produced Tree #1 @ 16:26:35 7/9/18.  [Using 3,379,008 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8581489350995117;  // std dev = 1.79e-07, 29.000 (wgt'ed) examples reached here.  /* #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.8581489350995123;  // std dev = 0.000, 19.000 (wgt'ed) examples reached here.  /* #pos=19 */
%   | | else return 0.46000078695136487;  // std dev = 0.490, 108.000 (wgt'ed) examples reached here.  /* #neg=43 #pos=65 */
%   else return -0.14185106490048802;  // std dev = 0.000, 167.000 (wgt'ed) examples reached here.  /* #neg=167 */

Learning parameters for the previous tree: First run

%%%%%  WILL-Produced Tree #1 @ 16:49:03 7/9/18.  [Using 3,310,648 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8248156017661784;  // std dev = 0.983, 30.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.762910839861417;  // std dev = 1.345, 21.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=19 */
%   | | else return 0.5214142412219619;  // std dev = 4.678, 98.000 (wgt'ed) examples reached here.  /* #neg=33 #pos=65 */
%   else return -0.14185106490048813;  // std dev = 0.000, 194.000 (wgt'ed) examples reached here.  /* #neg=194 */

Second run

%%%%%  WILL-Produced Tree #1 @ 11:15:20 7/10/18.  [Using 3,102,320 memory cells.]  %%%%%

% FOR advisedby(A, B):
%   if ( professor(B), student(A) )
%   then if ( tempadvisedby(C, B), publication(D, A), publication(D, B) )
%   | then return 0.8248156017661784;  // std dev = 0.983, 30.000 (wgt'ed) examples reached here.  /* #neg=1 #pos=29 */
%   | else if ( taughtby(E, B), ta(E, A) )
%   | | then return 0.762910839861417;  // std dev = 1.345, 21.000 (wgt'ed) examples reached here.  /* #neg=2 #pos=19 */
%   | | else return 0.5081489350995129;  // std dev = 4.770, 100.000 (wgt'ed) examples reached here.  /* #neg=35 #pos=65 */
%   else return -0.1418510649004882;  // std dev = 0.000, 211.000 (wgt'ed) examples reached here.  /* #neg=211 */

Thanks for helping, Best regards.

mayukhdas commented 6 years ago

Hi Rodrigo,

Sorry for the delay, and apologies if I am confused about the question. This code samples the data on every run; that is why the SDs are different.

However, I may have misunderstood your question/concern. If so, please let me know; I will try to get to the bottom of this.

Thanks Mayukh

rodrigoazs commented 6 years ago

Hi Mayukh,

I'm very grateful for your help. My actual concern is why the SD values are so different compared to the regular learning code. As I said previously, I intend to implement parameter learning. To do that, I'm forcing the code to return the clause I want as the best node found. I'm also forcing the code to create leaves or interior nodes in the same structure as a given tree, so that I can learn new regression values.

The way I'm doing that is as follows:

The first block of output above is a WILL-produced tree learnt from scratch (using the original code). After learning this tree, I forced my code to generate the same one (generating the same nodes at each level and branch). When I run this, I obtain similar regression values, as you can see by comparing block 1 with block 2 and block 1 with block 3. Every run samples new data (I have provided the same train_neg and train_pos files), which is why the regression values, SDs, and numbers of reached examples differ. However, the SDs from my code are very different from those of the original learning code: about ten times greater.

The clause advisedby(A, B) :- professor(B), student(A), ! has an original SD value of 0.490 with 43 negative and 65 positive examples; with my code it has a value of 4.678 with 33 negative and 65 positive examples.
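As a sanity check, the original 0.490 is exactly what I get if the leaf SD is the plain standard deviation of the per-example regression targets, which for a first tree are gradients of +0.5 / -0.5 (that interpretation is my assumption):

mean  = (65 * 0.5 + 43 * (-0.5)) / 108 ≈ 0.1019
sigma = sqrt(0.25 - mean^2) ≈ 0.490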

Also, the clause advisedby(A, B) :- professor(B), student(A), tempadvisedby(C, B), publication(D, A), publication(D, B), ! has an original SD of 1.79e-07 with only 29 positive examples reached. My code produced an SD of 0.983 with only one more (negative) example reached.

The regression values and numbers of reached examples seem OK, but the standard deviations are very different.

It doesn't seem to have an impact when I test my model, since the model file just has the clauses and regression values, but I think there is something wrong with the way I'm creating these SingleClauseNodes.

If this is still confusing, please let me know.

Thank you for your patience :) Rodrigo

mayukhdas commented 6 years ago

Hi Rodrigo,

I understand your concern now. Could you send the Java file(s) with your changes so that we can take a look and try to figure out the impact of those changes? If you have a forked branch of the repository, let us know; we can also look at it directly instead of you attaching Java code.

Thanks Mayukh

rodrigoazs commented 6 years ago

Hi Mayukh,

I appreciate your help. My code is a little messy and is not on a forked branch yet. I will add proper comments and push it to a forked branch so that you can take a look. Please give me a couple of days to do that.

In addition, I'm a little confused about how this weighted variance is calculated. Do you have anything that could help me understand it? I thought branches with standard deviations closer to 0 were more likely to be good branches, but it seems that this assumption is not true.
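My current understanding is the standard weighted-variance formula in the snippet below; whether WILL computes exactly this (over the gradients, with the example weights) is just my assumption:

public class WeightedStdDev {

    // Weighted standard deviation: sqrt(E_w[y^2] - E_w[y]^2).
    static double weightedStdDev(double[] values, double[] weights) {
        double sumW = 0.0, sumWY = 0.0, sumWY2 = 0.0;
        for (int i = 0; i < values.length; i++) {
            sumW   += weights[i];
            sumWY  += weights[i] * values[i];
            sumWY2 += weights[i] * values[i] * values[i];
        }
        double mean = sumWY / sumW;
        double variance = sumWY2 / sumW - mean * mean;
        return Math.sqrt(Math.max(variance, 0.0));  // guard round-off below zero
    }

    public static void main(String[] args) {
        // 65 positives with target +0.5, 43 negatives with -0.5, unit weights:
        double[] y = new double[108], w = new double[108];
        for (int i = 0; i < 108; i++) { y[i] = (i < 65) ? 0.5 : -0.5; w[i] = 1.0; }
        System.out.println(weightedStdDev(y, w));  // ~0.490, as in the UW-CSE leaf
    }
}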

I learnt a single tree (with the original learning code) on the Yago2s database for the playsfor target, and I got this result:

%%%%%  WILL-Produced Tree #1 @ 18:28:26 7/16/18.  [Using 1,015,432,728 memory cells.]  %%%%%

% FOR playsfor(A, B):
%   if ( isaffiliatedto(A, B) )
%   then return 0.8578097747059199;  // std dev = 9.694, 277155.000 (wgt'ed) examples reached here.  /* #neg=94 #pos=277061 */
%   else return -0.14184745438877816;  // std dev = 1.000, 276969.000 (wgt'ed) examples reached here.  /* #neg=276968 #pos=1 */

Each branch reaches about 277,000 examples, and in each one the majority is overwhelmingly either positive or negative. The AUC ROC is 0.999704.
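Under the same assumption as before (unit weights, +0.5 / -0.5 gradients), the isaffiliatedto branch should have a tiny SD, nowhere near 9.694:

mean  = 0.5 * (277061 - 94) / 277155 ≈ 0.4997
sigma = sqrt(0.25 - mean^2) ≈ 0.018

So the 9.694 presumably involves non-unit example weights or different regression targets, but that is speculation on my part.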

Thank you, Rodrigo.

mayukhdas commented 6 years ago

Hi Rodrigo @rodrigoazs,

If we could get a code snippet of your customization, that would be great. We tried, but somehow we are unable to replicate your scenario. Even if you do not have a forked repository, send us the snippet of your customized Java class(es). We can try to integrate it into the current code and see why the standard deviations are different.

I understand it might be awkward to paste entire Java files in a comment, so just send me an email if that is easier for you.

Thanks -- Mayukh

rodrigoazs commented 6 years ago

Hi @mayukhdas,

I just sent you an email with the customized code. Regarding my last question (Yago2s): the tree in that scenario was obtained with the original BoostSRL code, and I do not know why this happens in the original code as well as in my customized one. I can also provide the exact Yago2s train and test sets I used, if you like.

Thanks.

mayukhdas commented 6 years ago

Hey Rodrigo,

Thanks a lot. I will look into that code and get back to you as soon as possible.

Thanks Mayukh