matt-gardner / pra

Generate negative examples #15

Closed samehkamaleldin closed 8 years ago

samehkamaleldin commented 8 years ago

Using the repository as a jar file, how can one generate a sample of negative example triples for a specific relation using the personalized PageRank algorithm?

It seems that the class PprNegativeExampleSelector is the one that does the job, but I cannot figure out the proper way to use it to extract negative examples.

matt-gardner commented 8 years ago

You can do this with ExperimentRunner. You just need to create a split, either by using the known instances of the relation, or by sampling negatives for an already-existing train/test split. You can see examples of both of these here - look at the create_*_split.json files. If you need help after looking at those, let me know. You can also try to look at the SplitCreator code to find out what parameters are available and what they do.

samehkamaleldin commented 8 years ago

I'm currently working on the fb15k dataset, using PRA as a dependency jar file, and trying to extract negative samples for a specific relation using personalized PageRank.

matt-gardner commented 8 years ago

The fb15k dataset is not a good one to use for experiments with PRA / SFE. For a large majority of the training and the test instances, the inverse of the relation to be predicted is in the training data. This leads to poor parameter learning, and also to ridiculously high accuracies. I've actually been running this experiment myself the last few days, and I can confirm that SFE gets perfect or nearly perfect accuracy on most of the relations.
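To check the inverse-relation problem described above on your own copy of a dataset, a rough sketch like the following may help. It is not part of PRA; the function name and the toy triples are made up for illustration, and it uses "any training edge in the reverse direction" as a proxy for "the inverse relation is in the training data".

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)


def inverse_leak_fraction(train: Iterable[Triple], test: Iterable[Triple]) -> float:
    """Fraction of test triples (s, r, t) for which training holds some edge (t, r', s)."""
    # Directed entity pairs connected by any training relation.
    train_pairs: Set[Tuple[str, str]] = {(s, t) for s, _, t in train}
    test_triples = list(test)
    if not test_triples:
        return 0.0
    # A test triple counts as leaked if the reversed pair occurs in training.
    leaked = sum(1 for s, _, t in test_triples if (t, s) in train_pairs)
    return leaked / len(test_triples)


# Toy data, invented for this example.
train = [
    ("paris", "capital_of", "france"),
    ("obama", "born_in", "honolulu"),
]
test = [
    ("france", "has_capital", "paris"),    # reverse edge is in train: leaked
    ("germany", "has_capital", "berlin"),  # no reverse edge in train
]

print(inverse_leak_fraction(train, test))
```

A value close to 1.0 on real train/test files would be consistent with the near-perfect accuracies mentioned above.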

If you want to try running this yourself, this is the experiment spec that I used:

{
  "graph": {
    "name": "fb15k",
    "relation sets": [
      {
        "is kb": false,
        "relation file": "/home/mattg/data/freebase/fb15k_train.tsv"
      },
      {
        "is kb": false,
        "relation file": "/home/mattg/data/freebase/fb15k_valid.tsv"
      }
    ]
  },
  "split": {
    "type": "add negatives to split",
    "from split": "fb15k",
    "name": "fb15k_with_negatives",
    "relation metadata": "freebase",
    "graph": "fb15k",
    "negative instances": {
      "negative to positive ratio": 10
    }
  },
  "operation": {
    "type": "train and test",
    "learning": {
      "max training examples": 5000
    }
  }
}

You'll also need to generate the original fb15k split (with just the positive instances) from this script.
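For readers who want a feel for what sampling negatives with personalized PageRank and a negative-to-positive ratio means, here is a minimal standalone sketch of the general idea. It is not PRA's PprNegativeExampleSelector and makes no claim about how that class works: the function names, the Monte Carlo PageRank estimate, and the toy graph are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def build_adjacency(edges: Iterable[Edge]) -> Dict[str, List[str]]:
    """Undirected adjacency lists; relation labels are ignored for the walk."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for s, _, t in edges:
        adj[s].append(t)
        adj[t].append(s)
    return adj


def ppr_scores(adj, start, rng, walks=2000, restart=0.15, max_steps=50) -> Counter:
    """Monte Carlo estimate of personalized PageRank from `start`:
    count the nodes at which restart-terminated random walks stop."""
    ends: Counter = Counter()
    for _ in range(walks):
        node = start
        for _ in range(max_steps):
            neighbors = adj.get(node, [])
            if rng.random() < restart or not neighbors:
                break
            node = rng.choice(neighbors)
        ends[node] += 1
    return ends


def sample_negatives(edges, positives, allowed_targets: Set[str], ratio=10, seed=0):
    """For each source, keep up to ratio * (number of its positives) nodes with the
    highest estimated PPR that are allowed targets and not known positives."""
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    known: Dict[str, Set[str]] = defaultdict(set)
    for s, t in positives:
        known[s].add(t)
    negatives = []
    for s, targets in known.items():
        ranked = [n for n, _ in ppr_scores(adj, s, rng).most_common()]
        candidates = [n for n in ranked
                      if n in allowed_targets and n != s and n not in targets]
        negatives.extend((s, n) for n in candidates[: ratio * len(targets)])
    return negatives


# Toy graph, invented for this example.
edges = [
    ("alice", "works_at", "acme"), ("bob", "works_at", "acme"),
    ("carol", "works_at", "acme"), ("alice", "lives_in", "paris"),
    ("bob", "lives_in", "berlin"), ("carol", "lives_in", "rome"),
]
positives = [("alice", "paris")]          # known lives_in instances
allowed = {"paris", "berlin", "rome"}     # plausible range of lives_in
negs = sample_negatives(edges, positives, allowed, ratio=2)
print(negs)
```

The point of ranking by PageRank from the source is that the negatives are nodes near the source in the graph, which makes them harder to tell apart from positives than uniformly sampled entities.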

samehkamaleldin commented 8 years ago

Even though I'm working on FB15K, it's not currently my main goal. I'm trying to build dataset-independent code, and I don't want to use the JSON files as input; I'm doing this programmatically, with a fixed set of configurations. I'm trying to find out how to proceed if you have a GraphOnDisk object and a specific relation that you want to work on.

For example, the nell_with_negative splits that are available on the EMNLP 2015 paper website: if I want to regenerate them, how can that be done?

matt-gardner commented 8 years ago

If you want to run a knowledge base completion task programmatically, you'll basically be re-writing all of the functionality that I have already written. If you want to generate a split in your own code, look at where splits are created in the Driver.

samehkamaleldin commented 8 years ago

I did manage to do it by revisiting your code. Thanks a lot.