semanticize / semanticizer

Entity Linking for the masses
http://semanticize.uva.nl/
GNU General Public License v3.0

make underlying datasource for wpm configurable #9

Closed: evertlammerts closed this issue 11 years ago

evertlammerts commented 11 years ago

We need different solutions for accessing the underlying wikipediaminer data. Only using in-process, in-memory data storage has a number of downsides:

The upside of keeping everything in-process and in-memory is speed: remote storage can also keep everything in memory, but it always adds an extra layer of indirection (i.e. the request / response transport).

We should refactor the code to support multiple backends for the wpm dumps. The first refactoring to enable this has already been done in commit ce0d13f1c7ccdd21fab6ebcf12f9b52b7bfd8c25. We currently support:

All new storage drivers should inherit from wpm.base.Data and implement all of its functions (it's really an interface, but Python doesn't seem to support that in a nice enough way). The instance is created at runtime based on the configuration value wpmdatasource, which should be the fully qualified class name of the implementation (e.g. wpm.wpmdata_inproc.WpmDataInProc).
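A rough sketch of how such a driver interface and runtime loading could look. This is not the code from the commit: the `Data` class here is only a stand-in for wpm.base.Data, and `entity_exists`, `InProcData`, `load_class` and `create_backend` are hypothetical names made up for illustration.

```python
import importlib
from abc import ABC, abstractmethod


class Data(ABC):
    """Stand-in for wpm.base.Data; the method below is hypothetical."""

    @abstractmethod
    def entity_exists(self, title):
        """Return True if the backend knows this title."""


class InProcData(Data):
    """Toy in-process, in-memory backend (hypothetical)."""

    def __init__(self, titles=()):
        self._titles = set(titles)

    def entity_exists(self, title):
        return title in self._titles


def load_class(dotted_name):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def create_backend(dotted_name, *args, **kwargs):
    """Instantiate the driver named by a config value such as wpmdatasource."""
    cls = load_class(dotted_name)
    # Refuse classes that do not implement the driver interface.
    if not (isinstance(cls, type) and issubclass(cls, Data)):
        raise TypeError("%s does not inherit from Data" % dotted_name)
    return cls(*args, **kwargs)
```

With something like this, a subclass that forgets to implement one of the abstract methods fails at instantiation rather than at first use, which gets reasonably close to an interface check.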

The configuration module still needs to be adjusted to support more flexible loading of different parameters.

evertlammerts commented 11 years ago

The config module has been ported to YAML. The data can now be loaded from any implementation of wpm.base.Data, such as wpm.wpmdata_inproc.WpmDataInProc. Closing issue.