samehkamaleldin closed this issue 8 years ago.
Use the `create matrices` operation instead of `train and test`. Note that you'll also have to include `"output": { "output matrices": true }` at the same level as the `"operation": { ... }` parameters.
Alternatively, if you want to generate the features directly in Scala/Java code instead of going through a file, you can just instantiate the feature generators, using something like this:
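```scala
import org.json4s._ // JNothing and the `\` operator used on the parsed spec
// Assumed package for the utility classes; adjust to match your PRA version.
import com.mattg.util.{FileUtil, SpecFileReader}
// Outputter, RelationMetadata, Graph, NodeSubgraphFeatureGenerator and
// NodePairSubgraphFeatureGenerator must also be imported from the PRA codebase.

val fileUtil = new FileUtil
// `specFile` is the path to your SFE spec file (e.g. one based on sfe_bfs_pra_anyrel.json).
val params = new SpecFileReader("/dev/null").readSpecFile(specFile)
// Here we need to set up some stuff to use the SFE code. The first few things don't matter, but
// we'll use the graph and feature generators in the code below.
val praBase = "/dev/null"
val relation = "relation doesn't matter"
val outputter = Outputter.justLogger
val relationMetadata = new RelationMetadata(JNothing, praBase, outputter, fileUtil)
val graph = Graph.create(params \ "graph", praBase, outputter, fileUtil).get
val nodeFeatureGenerator = new NodeSubgraphFeatureGenerator(
  params \ "node features",
  relation,
  relationMetadata,
  outputter,
  fileUtil
)
val nodePairFeatureGenerator = new NodePairSubgraphFeatureGenerator(
  params \ "node pair features",
  relation,
  relationMetadata,
  outputter,
  fileUtil
)
```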
Then, to actually get features for a particular node pair, you do something like this:
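```scala
// mid1 and mid2 are the names of the two nodes, as they appear in the graph.
// NodePairInstance also needs to be imported from the PRA codebase.
println(s"Computing feature vector for entity pair ($mid1, $mid2)")
// The `true` argument is presumably the instance label (a positive example).
val instance = new NodePairInstance(graph.getNodeIndex(mid1), graph.getNodeIndex(mid2), true, graph)
// Find the local subgraph around the pair, then turn it into feature strings.
val subgraph = nodePairFeatureGenerator.getLocalSubgraph(instance)
val features = nodePairFeatureGenerator.extractFeaturesAsStrings(instance, subgraph)
```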
You'd probably have to change a few things in that code, but that's the general idea.
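If you want the result in a file rather than in memory, one option is to write each pair's features in the same row layout as `train_matrix.tsv`/`test_matrix.tsv`, which is discussed further down in this thread. The sketch below is plain Scala with no PRA dependency. It assumes that `features` is a sequence of feature strings like those returned by `extractFeaturesAsStrings`, that the columns are tab-separated, and that every SFE feature gets the value 1.0; `MatrixRowWriter`, `formatRow` and `writeMatrix` are hypothetical names, not part of PRA.

```scala
import java.io.PrintWriter

// Hypothetical helper, not part of PRA.
object MatrixRowWriter {
  // Formats one row as: "subject,object" TAB label TAB features joined by " -#- ",
  // where each feature is written as "path,value". SFE features are indicators,
  // so every value is written as 1.0 (assumed to match the matrix files' format).
  def formatRow(subject: String, obj: String, label: Int, features: Seq[String]): String = {
    val featureColumn = features.map(path => s"$path,1.0").mkString(" -#- ")
    s"$subject,$obj\t$label\t$featureColumn"
  }

  // Writes one formatted row per line to the given file, overwriting it.
  def writeMatrix(file: String, rows: Seq[String]): Unit = {
    val writer = new PrintWriter(file)
    try rows.foreach(row => writer.println(row)) finally writer.close()
  }
}
```

In practice you would call `formatRow(mid1, mid2, 1, features.toSeq)` for each pair inside a loop, collect the rows, and pass them to `writeMatrix` once at the end.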
I did manage to extract the features using the configuration-file approach, building on the `sfe_bfs_pra_anyrel.json` configuration file.
Assuming that we have a `GraphOnDisk` object `graph` and one relation in this graph, `relationName`, how can we programmatically extract the SFE feature set for this specific relation?
Would it be like the example above, only passing in the relation name?
Yes, it would be like the example above. The only reason to actually use the relation name is if the relation you're trying to predict is already in the graph. In that case, you need to hide it from the learner during training, or you will learn features that don't work at test time.
Hi! First of all, @matt-gardner, thanks for the great work. @samehkamaleldin, thanks for sharing your question.
The goal of this post is simply to clarify what the columns in the output files (`test_matrix.tsv` and `train_matrix.tsv`) mean. From what I understand:

- The first column is the `subject,object` pair for that respective triple. Subject and object are separated by a comma.
- The second column is the label (e.g. `1` for a triple that exists in the training set).
- The third column holds the features, separated by `-#-`. Each feature consists of a `path,feature_value` pair, where `path` represents a sequence of relations and `feature_value` is the feature value extracted by the algorithm (which should always be 1 if SFE is used, see reply below).

Here is an example of the output from running SFE on the NELL dataset (basically modifying `sfe_bfs_pra_anyrel.json` in the way that @samehkamaleldin suggested above), for the relation `concept:actorstarredinmovie`:
| subject,object | label | features (paths) |
|---|---|---|
| `concept:comedian:john_belushi,concept:movie:blues_brothers` | 1 | `ANYREL:-_@ALIAS@-@ANY_REL@-@ALIAS@-,1.0 -#- ANYREL:-_@ALIAS@-started-@ANY_REL@-,1.0 -#- -_@ALIAS@-started-@ALIAS@-,1.0 -#- ANYREL:-_@ANY_REL@-started-@ALIAS@-,1.0` |
Parsing the example above in the way that I think it should be, we would end up with the following values:

- subject: `concept:comedian:john_belushi`
- object: `concept:movie:blues_brothers`
- label: `1` (the triple exists in the training set)
- features:
  - `ANYREL:-_@ALIAS@-@ANY_REL@-@ALIAS@-` with feature value 1.0
  - `ANYREL:-_@ALIAS@-started-@ANY_REL@-` with feature value 1.0
  - `-_@ALIAS@-started-@ALIAS@-` with feature value 1.0
  - `ANYREL:-_@ANY_REL@-started-@ALIAS@-` with feature value 1.0
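The parsing described above can be sketched in plain Scala. This assumes the three columns are tab-separated (the files are `.tsv`), that subject and object are split at the first comma, and that each feature's value follows the last comma of its `path,feature_value` entry. `MatrixRowParser` and `MatrixRow` are hypothetical names, not part of PRA.

```scala
// Hypothetical helper, not part of PRA.
object MatrixRowParser {
  // One parsed row of train_matrix.tsv / test_matrix.tsv.
  final case class MatrixRow(subject: String, obj: String, label: Int, features: Seq[(String, Double)])

  def parseRow(line: String): MatrixRow = {
    // Assumption: the three columns are tab-separated; limit 3 keeps an empty feature column.
    val Array(pair, label, featureColumn) = line.split("\t", 3)
    // Subject and object are separated by the first comma in the first column.
    val comma = pair.indexOf(',')
    val subject = pair.substring(0, comma)
    val obj = pair.substring(comma + 1)
    // Features are separated by "-#-"; each one is "path,feature_value",
    // so the value is whatever follows the last comma.
    val features = featureColumn.split("-#-").toSeq.map(_.trim).filter(_.nonEmpty).map { feature =>
      val lastComma = feature.lastIndexOf(',')
      (feature.substring(0, lastComma), feature.substring(lastComma + 1).toDouble)
    }
    MatrixRow(subject, obj, label.trim.toInt, features)
  }
}
```

A whole file can then be read with something like `scala.io.Source.fromFile("train_matrix.tsv").getLines().map(MatrixRowParser.parseRow)`.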
Please let me know if everything is correct. I hope that this helps future people interested in extracting the features to understand the meaning of the output tables faster. Thanks for your attention and support.
Yep, that's all correct! The only thing I would change is instead of "confidence" I would say "feature value". In SFE, all features are indicator features, with value 1, while in PRA, the feature value is a random walk probability.
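Since SFE values are always 1 while PRA values are random walk probabilities, code that consumes these files can keep the value column unchanged when building feature vectors. A minimal sketch, assuming the features have already been parsed into `(path, value)` pairs; `FeatureVocabulary` is a hypothetical helper that assigns each distinct path a column index in order of first appearance.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of PRA: maps each distinct feature path to a column index.
class FeatureVocabulary {
  private val index = mutable.HashMap.empty[String, Int]

  // Returns the existing index for `path`, or assigns the next free one.
  def indexOf(path: String): Int = index.getOrElseUpdate(path, index.size)

  def size: Int = index.size

  // Sparse vector as column index -> feature value. The value is kept as-is:
  // 1.0 for SFE's indicator features, a random walk probability for PRA features.
  def toSparse(features: Seq[(String, Double)]): Map[Int, Double] =
    features.map { case (path, value) => indexOf(path) -> value }.toMap
}
```

Sharing one vocabulary across the training and test rows keeps the column indices consistent between the two matrices.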
Thanks @matt-gardner! I edited the post with your suggestion.
Is there any documentation on how to use SFE as a library to extract SFE features into a file, without doing any learning or prediction?