matt-gardner / pra


Extract SFE features to file #13

Closed samehkamaleldin closed 8 years ago

samehkamaleldin commented 8 years ago

Is there any documentation on how to use SFE as a library to extract SFE features into a file, without doing any learning or prediction?

matt-gardner commented 8 years ago

Use the create matrices operation, instead of train and test. Note that you'll also have to include "output": { "output matrices": true } at the same level as the "operation": { ... } parameters.
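As a rough sketch, the relevant part of the spec file might look like the fragment below. The "type" key inside "operation" is an assumption about how the operation is selected. Everything else the spec normally contains (graph, split, feature settings) is omitted here and would stay as it is in your existing file.

```json
{
  "operation": {
    "type": "create matrices"
  },
  "output": {
    "output matrices": true
  }
}
```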

Alternatively, if you want to generate the features directly in scala/java code, instead of going through a file, you can just instantiate the feature generators, using something like this:

// Assumes specFile (the path to the JSON spec file) and fileUtil are already defined in scope.
val params = new SpecFileReader("/dev/null").readSpecFile(specFile)

// Here we need to set up some stuff to use the SFE code.  The first few things don't matter, but
// we'll use the graph and feature generators in the code below.
val praBase = "/dev/null"
val relation = "relation doesn't matter"
val outputter = Outputter.justLogger
val relationMetadata = new RelationMetadata(JNothing, praBase, outputter, fileUtil)
val graph = Graph.create(params \ "graph", praBase, outputter, fileUtil).get
val nodeFeatureGenerator = new NodeSubgraphFeatureGenerator(
  params \ "node features",
  relation,
  relationMetadata,
  outputter,
  fileUtil
)
val nodePairFeatureGenerator = new NodePairSubgraphFeatureGenerator(
  params \ "node pair features",
  relation,
  relationMetadata,
  outputter,
  fileUtil
)

Then, to actually get features for a particular node pair, you do something like this:

// mid1 and mid2 are the names of the two nodes as they appear in the graph.
println(s"Computing feature vector for entity pair ($mid1, $mid2)")
val instance = new NodePairInstance(graph.getNodeIndex(mid1), graph.getNodeIndex(mid2), true, graph)
val subgraph = nodePairFeatureGenerator.getLocalSubgraph(instance)
val features = nodePairFeatureGenerator.extractFeaturesAsStrings(instance, subgraph)

You'd probably have to change a few things in that code, but that's the general idea.

samehkamaleldin commented 8 years ago

I managed to extract the features using the configuration-file approach, building on the sfe_bfs_pra_anyrel.json configuration file.

samehkamaleldin commented 8 years ago

Assuming that we have a GraphOnDisk object (graph) and one relation in this graph (relationName), how can we programmatically extract the SFE feature set for this specific relation?

Would it be like the example above, just with the relation name filled in?

matt-gardner commented 8 years ago

Yes, it would be like the example above. The only reason to actually use the relation name is if the relation you're trying to predict is already in the graph - in that case, you need to hide it from the learning during training, or you will learn features that don't work at test time.

arthurcgusmao commented 6 years ago

Hi! First of all, @matt-gardner, thanks for the great work. @samehkamaleldin, thanks for sharing your question.

The goal of this post is simply to clarify what the columns in the output files (test_matrix.tsv and train_matrix.tsv) mean. From what I understand:

Here is an example of the output from running SFE on the NELL dataset (basically modifying sfe_bfs_pra_anyrel.json in the way that @samehkamaleldin suggested above), for the relation concept:actorstarredinmovie:

subject,object label features (paths)
concept:comedian:john_belushi,concept:movie:blues_brothers 1 ANYREL:-_@ALIAS@-@ANY_REL@-@ALIAS@-,1.0 -#- ANYREL:-_@ALIAS@-started-@ANY_REL@-,1.0 -#- -_@ALIAS@-started-@ALIAS@-,1.0 -#- ANYREL:-_@ANY_REL@-started-@ALIAS@-,1.0

Parsing the example above in the way that I think it should be, we would end up with the following values:

Please let me know if everything is correct. I hope that this helps future people interested in extracting the features to understand the meaning of the output tables faster. Thanks for your attention and support.

matt-gardner commented 6 years ago

Yep, that's all correct! The only thing I would change is instead of "confidence" I would say "feature value". In SFE, all features are indicator features, with value 1, while in PRA, the feature value is a random walk probability.
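For anyone who wants to read these files programmatically, here is a minimal Scala sketch of parsing one row. It assumes that the three columns are tab-separated (the files are .tsv), that features are separated by " -#- ", and that each feature is written as name,value with the value after the last comma. MatrixRowParser, MatrixRow and parseRow are hypothetical names for illustration and are not part of the pra codebase.

```scala
// Sketch only: parses one row of train_matrix.tsv / test_matrix.tsv.
// Assumed layout: subject,object<TAB>label<TAB>name,value -#- name,value ...
object MatrixRowParser {
  // Hypothetical container for one parsed row (not a pra class).
  final case class MatrixRow(
    subject: String,
    obj: String,
    label: Int,
    features: List[(String, Double)]
  )

  def parseRow(line: String): MatrixRow = {
    val columns = line.split("\t", -1)
    require(columns.length == 3,
      s"expected 3 tab-separated columns, got ${columns.length}")
    val Array(pair, label, featureColumn) = columns

    // Assumes the subject node name itself contains no comma.
    val nodes = pair.split(",", 2)
    require(nodes.length == 2, s"expected 'subject,object', got '$pair'")

    val features = featureColumn
      .split(" -#- ")
      .filter(_.nonEmpty)
      .map { entry =>
        // The feature value follows the last comma of each entry.
        val idx = entry.lastIndexOf(',')
        require(idx >= 0, s"feature entry without a value: '$entry'")
        (entry.substring(0, idx), entry.substring(idx + 1).trim.toDouble)
      }
      .toList

    MatrixRow(nodes(0), nodes(1), label.trim.toInt, features)
  }
}
```

With SFE output, every parsed feature value should be 1.0, since the features are indicators. With PRA output, the same field would hold a random walk probability.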

arthurcgusmao commented 6 years ago

Thanks, @matt-gardner! I edited the post with your suggestion.