yago-naga / yago3

YAGO is a large semantic knowledge base, derived from Wikipedia, WordNet, WikiData, GeoNames, and other data sources
https://yago-knowledge.org/downloads/yago-3
GNU General Public License v3.0

Subgraph #9

Closed: hoffart closed this 6 years ago.

hoffart commented 6 years ago

I extended YAGO so that it can produce a subgraph. The output has the same format as the original YAGO, but contains only facts for a restricted set of entities.

Usage

  1. Specify restricted entities:

Add the following parameters to yago.ini (both take comma-separated strings as values): subgraphEntities=, subgraphClasses=

  2. Run YAGO. In the final YAGO files, it produces only facts for the given entities (the union of all subgraphEntities and all entities belonging to subgraphClasses).
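The selection rule in step 2 can be sketched as follows. This is a minimal illustration, not YAGO's actual code: it assumes the two comma-separated values have already been read from yago.ini and that each entity's transitive classes are available as a map. The class `SubgraphSelection`, its methods, and the example entity and class names are hypothetical.

```java
import java.util.*;

// Hypothetical helper illustrating how subgraphEntities and subgraphClasses combine.
public class SubgraphSelection {

    // Split a comma-separated parameter value into trimmed, non-empty items.
    static Set<String> parseList(String value) {
        Set<String> items = new LinkedHashSet<>();
        if (value == null) return items;
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) items.add(trimmed);
        }
        return items;
    }

    // Union of the explicitly listed entities and every entity that has at least
    // one of the listed classes among its (already transitively closed) types.
    static Set<String> selectEntities(Set<String> subgraphEntities,
                                      Set<String> subgraphClasses,
                                      Map<String, Set<String>> transitiveTypes) {
        Set<String> selected = new TreeSet<>(subgraphEntities);
        for (Map.Entry<String, Set<String>> e : transitiveTypes.entrySet()) {
            if (!Collections.disjoint(e.getValue(), subgraphClasses)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Illustrative values, as they might appear after "subgraphEntities=" and "subgraphClasses=".
        Set<String> entities = parseList("<Albert_Einstein>, <Ulm>");
        Set<String> classes = parseList("<person>");

        Map<String, Set<String>> types = new HashMap<>();
        types.put("<Elvis_Presley>", new HashSet<>(Arrays.asList("<singer>", "<person>")));
        types.put("<Paris>", new HashSet<>(Arrays.asList("<city>")));

        // Prints [<Albert_Einstein>, <Elvis_Presley>, <Ulm>]
        System.out.println(selectEntities(entities, classes, types));
    }
}
```

A `TreeSet` is used only to make the printed result deterministic; any set type would express the same union.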

What was changed

The main change is in the TypeChecker, which, in addition to checking domain and range, now also checks containedIn(subgraph).
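A rough sketch of what the extended check could look like. It assumes a fact is a subject/relation/object triple, that domain and range are each a single required class per relation (a missing entry means "unconstrained"), and that a fact counts as contained in the subgraph when its subject is a selected entity. That last rule is an assumption; the PR only states that the TypeChecker now also checks containedIn(subgraph). The `Fact` and `SubgraphTypeChecker` classes are hypothetical stand-ins, not YAGO's own types.

```java
import java.util.*;

// Hypothetical illustration of a type checker with an extra subgraph-membership test.
public class SubgraphTypeChecker {

    // Minimal stand-in for a YAGO fact (not YAGO's own Fact class).
    static final class Fact {
        final String subject, relation, object;
        Fact(String subject, String relation, String object) {
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }
    }

    private final Map<String, String> domain;      // relation -> required subject class
    private final Map<String, String> range;       // relation -> required object class
    private final Map<String, Set<String>> types;  // entity -> transitive classes
    private final Set<String> subgraph;            // entities selected for the subgraph

    SubgraphTypeChecker(Map<String, String> domain, Map<String, String> range,
                        Map<String, Set<String>> types, Set<String> subgraph) {
        this.domain = domain;
        this.range = range;
        this.types = types;
        this.subgraph = subgraph;
    }

    // True if no class is required (cls == null) or the entity has that class.
    private boolean hasType(String entity, String cls) {
        return cls == null || types.getOrDefault(entity, Collections.emptySet()).contains(cls);
    }

    // Existing checks (domain, range) plus the assumed subgraph check on the subject.
    boolean accept(Fact f) {
        return hasType(f.subject, domain.get(f.relation))
            && hasType(f.object, range.get(f.relation))
            && subgraph.contains(f.subject);
    }

    public static void main(String[] args) {
        Map<String, String> dom = new HashMap<>();
        dom.put("<livesIn>", "<person>");
        Map<String, String> rng = new HashMap<>();
        rng.put("<livesIn>", "<city>");
        Map<String, Set<String>> ty = new HashMap<>();
        ty.put("<Alice>", Collections.singleton("<person>"));
        ty.put("<Bob>", Collections.singleton("<person>"));
        ty.put("<Ulm>", Collections.singleton("<city>"));

        SubgraphTypeChecker checker =
            new SubgraphTypeChecker(dom, rng, ty, Collections.singleton("<Alice>"));
        System.out.println(checker.accept(new Fact("<Alice>", "<livesIn>", "<Ulm>"))); // true
        System.out.println(checker.accept(new Fact("<Bob>", "<livesIn>", "<Ulm>")));   // false: not in subgraph
    }
}
```

Placing the membership test next to the domain/range tests means a fact that fails any of them is dropped in the same pass.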

The crucial part was changing the Type extraction, because subgraphClasses can only be resolved after the initial transitive type hierarchy has been built.
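To illustrate why the transitive hierarchy has to exist first, here is a small sketch, assuming entities carry only their direct types and the hierarchy is given as subclassOf edges. An entity whose direct type is `<singer>` matches subgraphClasses=`<person>` only after its types are expanded along subclassOf. The class `TransitiveTypes` and the example class names are hypothetical, not YAGO's implementation.

```java
import java.util.*;

// Hypothetical sketch: expand direct types into transitive types via subclassOf.
public class TransitiveTypes {

    // All classes reachable from cls via subclassOf edges, including cls itself.
    // The "seen" set also protects against cycles in the hierarchy.
    static Set<String> superClasses(String cls, Map<String, Set<String>> subclassOf) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(cls);
        while (!stack.isEmpty()) {
            String c = stack.pop();
            if (seen.add(c)) {
                for (String parent : subclassOf.getOrDefault(c, Collections.emptySet())) {
                    stack.push(parent);
                }
            }
        }
        return seen;
    }

    // Expand each entity's direct types into its full set of transitive types.
    static Map<String, Set<String>> closeTypes(Map<String, Set<String>> directTypes,
                                               Map<String, Set<String>> subclassOf) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : directTypes.entrySet()) {
            Set<String> all = new TreeSet<>();
            for (String t : e.getValue()) {
                all.addAll(superClasses(t, subclassOf));
            }
            result.put(e.getKey(), all);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subclassOf = new HashMap<>();
        subclassOf.put("<singer>", Collections.singleton("<musician>"));
        subclassOf.put("<musician>", Collections.singleton("<person>"));

        Map<String, Set<String>> direct = new HashMap<>();
        direct.put("<Elvis_Presley>", Collections.singleton("<singer>"));

        // Prints {<Elvis_Presley>=[<musician>, <person>, <singer>]}
        System.out.println(closeTypes(direct, subclassOf));
    }
}
```

Only after this expansion does a membership test against subgraphClasses see `<person>` for `<Elvis_Presley>`.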

Please have a look if this makes sense :)

thomasrebele commented 6 years ago

I saw that you removed multilingual extraction from fromThemes.CategoryClassHierarchyExtractor. Have you checked whether this removes a lot of facts? Otherwise it looks good to me.

hoffart commented 6 years ago

The CategoryClassHierarchyExtractor was actually never multilingual; I think this was an oversight when we initially implemented it together with Felix Keller.

hoffart commented 6 years ago

Thanks for having a look, I will merge then :)