Closed samehkamaleldin closed 8 years ago
You can do this with ExperimentRunner. You just need to create a split, either by using the known instances of the relation or by sampling negatives for an already-existing train/test split. You can see examples of both of these here - look at the create_*_split.json files. If you need help after looking at those, let me know. You can also look at the SplitCreator code to find out what parameters are available and what they do.
I'm currently working on the fb15k dataset and using PRA as a dependency jar file, trying to extract negative samples for any specific relation using personalised PageRank.
The fb15k dataset is not a good one to use for experiments with PRA / SFE. For a large majority of the training and test instances, the inverse of the relation to be predicted is in the training data. This leads to poor parameter learning, and also to ridiculously high accuracies. I've actually been running this experiment myself the last few days, and I can confirm that SFE gets perfect or nearly perfect accuracy on most of the relations.
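To see how much of this leakage a dataset has, one rough check is to count the test triples whose node pair already appears reversed in the training data. The sketch below is a minimal illustration in plain Python, not part of PRA; the function name and the toy triples are made up, and "any reverse edge" is used as a proxy for "the inverse relation".

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)


def inverse_leak_fraction(train: Iterable[Triple], test: Iterable[Triple]) -> float:
    """Fraction of test triples (s, r, t) for which training has some edge (t, r', s)."""
    # Collect every ordered node pair connected by any training edge.
    train_pairs: Set[Tuple[str, str]] = {(s, t) for s, _, t in train}
    test_list = list(test)
    if not test_list:
        return 0.0
    leaked = sum(1 for s, _, t in test_list if (t, s) in train_pairs)
    return leaked / len(test_list)


# Toy data, invented for illustration only.
train = [
    ("film_a", "directed_by", "person_x"),
    ("person_y", "born_in", "city_z"),
]
test = [
    ("person_x", "director_of", "film_a"),  # reverse edge is in train
    ("city_z", "located_in", "country_w"),  # no reverse edge in train
]
leak = inverse_leak_fraction(train, test)
print(leak)
```

A value close to 1.0 on a real split would suggest that a model can score well simply by learning to follow the reverse edge.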
If you want to try running this yourself, this is the experiment spec that I used (you'll need to add quotes back in):
{
  graph: {
    name: fb15k,
    relation sets: [
      {
        is kb: false,
        relation file: /home/mattg/data/freebase/fb15k_train.tsv
      },
      {
        is kb: false,
        relation file: /home/mattg/data/freebase/fb15k_valid.tsv
      }
    ]
  },
  split: {
    type: add negatives to split,
    from split: fb15k,
    name: fb15k_with_negatives,
    relation metadata: freebase,
    graph: fb15k,
    negative instances: {
      negative to positive ratio: 10
    }
  },
  operation: {
    type: train and test,
    learning: {
      max training examples: 5000
    }
  }
}
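One way to put the quotes back without hand-editing is to write the same spec as a Python dict and let the json module serialise it. This is only a convenience sketch: the keys and values are copied from the spec above, the relation file paths are the ones from that machine and would need to point at your own copies, and nothing here calls into PRA itself.

```python
import json

# Same keys and values as the spec above, with quoting handled by json.dumps.
spec = {
    "graph": {
        "name": "fb15k",
        "relation sets": [
            {"is kb": False, "relation file": "/home/mattg/data/freebase/fb15k_train.tsv"},
            {"is kb": False, "relation file": "/home/mattg/data/freebase/fb15k_valid.tsv"},
        ],
    },
    "split": {
        "type": "add negatives to split",
        "from split": "fb15k",
        "name": "fb15k_with_negatives",
        "relation metadata": "freebase",
        "graph": "fb15k",
        "negative instances": {"negative to positive ratio": 10},
    },
    "operation": {
        "type": "train and test",
        "learning": {"max training examples": 5000},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```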
You'll also need to generate the original fb15k split (with just the positive instances) from this script.
Although I'm working on FB15K, it's not currently my main goal. I'm trying to build dataset-independent code, and I'd rather not use the json files as input; I'm doing this programmatically, with a fixed set of configurations. What I'm trying to find out is how to proceed when you have a GraphOnDisk object and a specific relation that you want to work on.
For example, take the nell_with_negative splits that are available on the EMNLP 2015 paper website: if I wanted to regenerate them, how could I do that?
If you want to run a knowledge base completion task programmatically, you'll basically be re-writing all of the functionality that I have already written. If you want to generate a split in your own code, look at where splits are created in the Driver.
I did manage to do it by revisiting your code. Thanks a lot.
Using the repository as a jar file, how can one generate a sample of negative example triples for a specific relation using the personalised PageRank algorithm? It seems like the class PprNegativeExampleSelector is the one that does the job, but I cannot figure out the proper way to use it to extract negative examples.
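For anyone who wants the general idea rather than the library call, here is a minimal sketch of personalised-PageRank negative sampling in plain Python. It is not the code of PprNegativeExampleSelector, whose constructor and methods are not shown in this thread; the function names, the restart value, and the toy triples are all assumptions made for illustration. The idea is to rank candidate targets by their PageRank score personalised to the source node, then keep the highest-scoring ones that are not known positives.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)


def build_adjacency(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Undirected adjacency over all edges, ignoring relation labels."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for s, _, t in triples:
        adj[s].add(t)
        adj[t].add(s)
    return adj


def personalized_pagerank(adj: Dict[str, Set[str]], start: str,
                          restart: float = 0.15, iterations: int = 50) -> Dict[str, float]:
    """Power iteration for PageRank with all restart mass placed on `start`."""
    scores: Dict[str, float] = {start: 1.0}
    for _ in range(iterations):
        nxt: Dict[str, float] = defaultdict(float)
        nxt[start] += restart  # total mass is 1, so the restart share is `restart`
        for node, mass in scores.items():
            neighbors = adj.get(node)
            if not neighbors:
                # Dangling node: send its walking mass back to the start node.
                nxt[start] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(neighbors)
            for n in neighbors:
                nxt[n] += share
        scores = dict(nxt)
    return scores


def select_negatives(triples: Iterable[Triple], relation: str,
                     source: str, k: int) -> List[Triple]:
    """Top-k candidate targets by PPR score from `source` that are not known positives."""
    triples = list(triples)
    adj = build_adjacency(triples)
    positives = {t for s, r, t in triples if s == source and r == relation}
    # Candidates: every node that appears as a target of this relation somewhere.
    candidates = {t for _, r, t in triples if r == relation}
    scores = personalized_pagerank(adj, source)
    ranked = sorted(
        (c for c in candidates if c not in positives and c != source),
        key=lambda c: (-scores.get(c, 0.0), c),  # high score first, name breaks ties
    )
    return [(source, relation, t) for t in ranked[:k]]


# Toy graph, invented for illustration only.
triples = [
    ("alice", "lives_in", "paris"),
    ("bob", "lives_in", "london"),
    ("carol", "lives_in", "tokyo"),
    ("alice", "knows", "bob"),
]
negatives = select_negatives(triples, "lives_in", "alice", k=1)
print(negatives)
```

Ranking by PPR score favours negatives that are close to the source in the graph, which tend to be harder to distinguish from true positives than uniformly sampled ones. Taking the top k keeps the sketch deterministic; sampling in proportion to the scores is an equally plausible variant.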