starling-lab / BoostSRL

BoostSRL: "Boosting for Statistical Relational Learning." A gradient-boosting based approach for learning different types of SRL models.
https://starling.utdallas.edu
GNU General Public License v3.0

NullPointerException during learning mode. #34

Open monicasenapati opened 4 years ago

monicasenapati commented 4 years ago

I have the training files all set according to the requirements, including the positive and negative training examples. I use the command:

java -jar BoostSRL.jar -l -combine -mln -mlnClause -train train/ -target malicious -mlnClauseLen 10

However, I keep encountering the following error. Any insight would be really helpful.

% The best node found: null

% No acceptable clause was learned on this cycle of the ILP inner loop (LearnOneClause).
% The closest-to-acceptable node found (score = -Infinity):
% null

% **

% Have stopped ILP's outer loop because have reached the maximum number of iterations (20).

% ** adding regression values
Exception in thread "main" java.lang.NullPointerException
    at edu.wisc.cs.will.ILP.ILPouterLoop.produceFinalTheory(ILPouterLoop.java:1616)
    at edu.wisc.cs.will.ILP.ILPouterLoop.executeOuterLoop(ILPouterLoop.java:1093)
    at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.getWILLTree(LearnBoostedRDN.java:396)
    at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.learnRDN(LearnBoostedRDN.java:234)
    at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.learnNextModel(LearnBoostedRDN.java:129)
    at edu.wisc.cs.will.Boosting.MLN.RunBoostedMLN.learn(RunBoostedMLN.java:147)
    at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.learnModel(RunBoostedModels.java:77)
    at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.runJob(RunBoostedModels.java:54)
    at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.main(RunBoostedModels.java:220)
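
One possible reading of this log: every pass of the inner loop ended without a clause (the best node is null), and the final step then used that missing result. This has not been checked against the BoostSRL source. The Python sketch below only illustrates the general failure pattern and a guard against it; every name in it is hypothetical and none of it is BoostSRL's API.

```python
def learn_one_clause(candidates, min_score):
    """Return the best-scoring candidate at or above min_score, else None."""
    acceptable = [c for c in candidates if c["score"] >= min_score]
    return max(acceptable, key=lambda c: c["score"]) if acceptable else None


def produce_final_theory(best_clauses):
    """Collect clause bodies; fail with a clear message if none were learned."""
    learned = [c for c in best_clauses if c is not None]
    if not learned:
        # Without this guard, code that dereferences a missing clause
        # fails with an unhelpful null/None error instead.
        raise ValueError("no acceptable clause was learned; check modes and examples")
    return [c["body"] for c in learned]
```

If the search space is empty or over-constrained (for instance, because of a wrong mode declaration), every call to `learn_one_clause` in this sketch returns `None`, which mirrors the "The best node found: null" lines in the log.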

harshakokel commented 4 years ago

Hello Monica,

Can you share a sample dataset for which you see this error? It will help us replicate the issue and debug.

Thanks, Harsha Kokel.

monicasenapati commented 4 years ago

Hi Harsha,

Thank you for getting back to me. I don’t have a backup of the background file that produced the error, since I have been constantly modifying it, suspecting that background.txt was not properly structured. However, the training and testing data remain the same. Attached is the compressed folder containing the folders “train” and “test”, as well as the background.txt file, for Twitter data. The null error no longer occurs, but I continue to get 0 for precision and recall, and consequently NaN for the F1 score. I would appreciate any help in this matter.
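
The NaN follows from the arithmetic: F1 = 2PR / (P + R), and with P = R = 0 this is 0/0. Below is a minimal Python sketch using the standard definitions. It is not BoostSRL's evaluation code, and the function name is illustrative. Python raises an error on division by zero, so the sketch returns NaN explicitly, which is what floating-point 0.0/0.0 evaluates to in Java.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Standard definitions; returns NaN wherever a denominator is zero."""
    def safe_div(num, den):
        return num / den if den != 0 else math.nan

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


# No true positives: precision and recall are both 0, so F1 is 0/0.
p, r, f1 = precision_recall_f1(tp=0, fp=5, fn=7)
print(p, r, f1)
```

In other words, a NaN F1 next to zero precision and recall usually means the model predicted no positive example correctly, not that the scoring itself is broken.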

The lines with “precompute” in background.txt were an attempt to include the predicates “friendsCount” and “followersCount” only when they are above a certain threshold. Please let me know if you need any additional information.

Thanks & regards, Monica

On Feb 18, 2020, at 6:16 PM, Harsha <notifications@github.com> wrote:

Hello Monica,

Can you share a sample dataset for which you see this error. It will help us replicate the issue and debug.

Thanks, Harsha Kokel.


monicasenapati commented 4 years ago

tweets_sample_data.zip

I have attached the sample data here, in case it wasn't attached properly through email.

nandhiniramanan5 commented 4 years ago

I have fixed your background file and attached the new data set to this email. I ran training on it and can see that it learns now. Please let us know if you need anything else from our end.

The following were the problems with your data:

  1. Incorrect mode. You had declared an incorrect argument type: mode: containsLink(+tweetID, -tweetID).

  2. Precomputes incorrectly declared. The precomputes had a syntax error and were not being generated at run time. I have fixed them, and they should now appear in the RRT you learn.

  3. I have rewritten your background, facts, pos and neg files, standardized the notation to follow Prolog conventions, and set usePrologVariables: true.

I have set exhaustive modes. You can refine them, since you have better domain knowledge. Also, use the following command:

java -jar BoostSRL.jar -l -train train/ -target malicious -i -trees x

tweets_sample_data.zip

monicasenapati commented 4 years ago

Thank you so much for taking the time to look into this issue and provide a corrected background file. However, instead of trees, we would like to use MLNs for learning and inference, using the command below:

java -jar BoostSRL.jar -l -train train/ -target malicious -mln -mlnClause -numMLNClause 8 -mlnClauseLen 5

Please let me know if that will be okay to use.

Also, could you please provide a link to resources on the standardized Prolog notation? For example, the background file contains:

has_more_than_n_friends(A,N) :- friendscount(A,N2), member(N,[40000]), N2 > N.

Here the notation "member" has been used. I would like to get a better understanding of such notation so that I can format the background file for our dataset.
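
Read as standard Prolog, the rule says that account A has more than N friends if A's friend count N2 exceeds N, where N is drawn from the list [40000]. In standard Prolog, member(N, [40000]) is the usual list-membership predicate: it binds N to each listed threshold in turn. Below is a rough Python equivalent, assuming facts of the form friendscount(account, count). The account names and counts are made up for illustration, and how BoostSRL itself evaluates the rule is not shown here.

```python
def has_more_than_n_friends(friendscount_facts, thresholds=(40000,)):
    """Yield (account, n) pairs for which the rule body succeeds."""
    for account, n2 in friendscount_facts:   # friendscount(A, N2)
        for n in thresholds:                 # member(N, [40000])
            if n2 > n:                       # N2 > N
                yield account, n


# Hypothetical facts, not taken from the attached data set.
facts = [("user_a", 52000), ("user_b", 310)]
derived = list(has_more_than_n_friends(facts))
print(derived)  # [('user_a', 40000)]
```

Adding more values to the list, for example [1000, 40000], would make the rule derive one fact per threshold that an account exceeds.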

monicasenapati commented 4 years ago

Hi, I am now using the command:

java -jar /mydata/BoostSRL/BoostSRL.jar -l -combine -mln -mlnClause -numMLNClause 13 -mlnClauseLen 5 -train /mydata/BoostSRL/train/ -target malicious

Yet I still get the following error:

Exception in thread "main" java.lang.NullPointerException
    at edu.wisc.cs.will.ILP.ILPouterLoop.produceFinalTheory(ILPouterLoop.java:1616)
    at edu.wisc.cs.will.ILP.ILPouterLoop.executeOuterLoop(ILPouterLoop.java:1093)
    at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.getWILLTree(LearnBoostedRDN.java:396)
    at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.learnRDN(LearnBoostedRDN.java:234)
    at edu.wisc.cs.will.Boosting.RDN.LearnBoostedRDN.learnNextModel(LearnBoostedRDN.java:129)
    at edu.wisc.cs.will.Boosting.MLN.RunBoostedMLN.learn(RunBoostedMLN.java:147)
    at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.learnModel(RunBoostedModels.java:77)
    at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.runJob(RunBoostedModels.java:54)
    at edu.wisc.cs.will.Boosting.Common.RunBoostedModels.main(RunBoostedModels.java:220)

Could you please give me some insights on why this occurs?