matt-gardner / pra

122 stars 42 forks source link

Build Status Coverage Status

PRA (Path Ranking Algorithm) and SFE (Subgraph Feature Extraction)

PRA and SFE are algorithms that extract feature matrices from graphs, and use those feature matrices to do link prediction in that graph. This repository contains implementations of PRA and SFE, as used in the following papers (among others):

To reproduce the experiments in those papers, see the corresponding website. Note that the EMNLP 2015 paper has the most detailed instructions, and the older papers use versions of the code that aren't compatible with the current repository; if you really want to reproduce the older experiments, talk to me.

See the github.io page for code documentation. Please feel free to file bugs, feature requests, or send pull requests.

If the Travis CI badge above says that the build is failing, just click on the badge, go to Build History, and find the most recent commit with a passing build, and check that one out to use the code. Or, if you're using this as a library, just specify the most recent released version in your library dependencies with sbt or mvn, and it should be based on a passing build (see the changelog below for the most recent version).

NOTE

This code generally takes quite a bit of memory. That's probably a byproduct of how it was developed; I typically use a machine that has 400GB of RAM, so I don't need to worry too much about how much memory the code is using. That means I probably do some things that are memory inefficient; on NELL graphs, the code can easily use upwards of 40GB. On larger graphs, and with various parameter settings, it can easily use much more than that. With small graphs, though, I can successfully run the code on a machine that only has 8GB of RAM. This needs some work to be made more memory efficient on larger graphs. It should be straightforward to implement a stochastic gradient training regime, for instance, that would allow for much more memory-efficient computation.

License

This code makes use of a number of other libraries that are distributed under various open source licenses (notably the Apache License and the Common Public License). You can see those dependencies listed in the build.sbt file. The code under the src/ directory is distributed under the terms of the GNU General Public License, version 3 (or, at your choosing, any later version of that license). You can find the text of that license here.

Changelog

Version 3.4 (released on 8/26/2016):

Version 3.3 (released on 3/22/2016):

Version 3.2 (released on 1/21/2016):

Version 3.1 (released on 11/9/2015):

Version 3.0 (released on 5/30/2015):

Version 2.0 (released on 3/4/2015):

Version 1.1 (released on 12/20/2014):