Open oneilsh opened 3 weeks ago
@bschilder - while going through this higher-level API adjustment, I am also wondering if it's worth doing some function renaming to help with clarity.
Not sure how I feel about these yet, but just some ideas: *_engine
could be *_kg
, seach_kg
could just be search
(I didn't want to override the default non-generic in utils
, but then I see other packages do so maybe not a big deal? Then again, if _engine
becomes _kg
then search_kg
fits well). Meanwhile, fetch_nodes
is only for engines (there's no query graph we are expanding outward from), while fetch_edges
requires a graph to expand on, but the naming doesn't make that super clear. Maybe like fetch_kg_nodes
and fetch_more
?
Awesome to see progress on this!
i like the idea of renaming to the suffix *_kg
, seems more intuitive to me and less characters to write.
Regarding search
, people do often overwrite these but I find this confusing (there's ambiguity as to which one you're calling when reading a script) and the messages that come up every time you load the package are annoying.
Another option might be to include an argument within fetch_nodes
to expand the fetch to additional nodes that are a certain graph distance away: eg. 2-hop, 3-hop etc.
In the meantime, I'm going to merge your changes into my fork to test it out and give more specific feedback.
It would be nice to support both Neo4j-backed KGs and .tar.gz files hosted at https://kghub.io/ using a similar interface, but with different "engines" providing common features. For example:
However, functions such as
fetch_nodes()
should return graphs, not engines, meaning subsequent queries would need to be provided an engine to use:For a data-first approach, graphs can additionally carry with them the engine that generated them (or the most recent engine, in a multi-engine setting which should be feasible):
By abstracting engines, we can also define a
monarch_engine
as a special case ofneo4j_engine
, and provide different implementations of features likesearch
via the Monarch API.I've made progress on this over in the
engines
branch, and am working through an MVP set of functions for the different engines:constructors:
search_kg(engine)
Keyword/string searchfetch_nodes(engine)
Fetch nodes by ID list or property combinationfetch_edges(graph)
Given a query graph, fetch more edges from the KG neighborhood (expand)summarize_neighborhood(graph)
Summarize the surrounding neighborhood in the KG without fetching itEdit: I'll make a separate item for documenting as it's currently lagging the code:
inherits
,rdname
, etc.