marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0

Add SQL database to graph conversion tool (Db2Graph) #99

Closed by ryansun117 2 years ago

ryansun117 commented 2 years ago

This pull request introduces a new Marius feature: Db2Graph, a tool for converting SQL databases to graphs. Db2Graph converts a relational database into a graph represented as sets of triples, which can be used as input datasets for Marius. This streamlines preprocessing from database to Marius.

Db2Graph is contained in Marius but can be used as a standalone tool. Db2Graph currently supports graph conversion from three relational database management systems: MySQL, MariaDB, and PostgreSQL. Conversion with Db2Graph is achieved in the following steps:

  1. Users import/create the database locally
  2. Users define the configuration file and entity/edge SQL SELECT queries
  3. Db2Graph executes the SQL SELECT queries
  4. Db2Graph transforms the result set of queries into sets of triples
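
Steps 3 and 4 can be illustrated with a minimal sketch. This is not the Db2Graph implementation or its configuration format: the function names `rows_to_triples` and `convert`, the `edge_queries` mapping, and the example tables are all hypothetical. The standard-library `sqlite3` module stands in for MySQL, MariaDB, or PostgreSQL so that the example runs without a database server, and only edge queries are shown.

```python
import sqlite3


def rows_to_triples(rows, relation):
    """Turn (source, destination) rows into (source, relation, destination) triples."""
    return [(str(src), relation, str(dst)) for src, dst in rows]


def convert(conn, edge_queries):
    """Run each edge SELECT query and collect the resulting triples.

    edge_queries maps a relation name to a SELECT query that returns
    exactly two columns: source entity and destination entity.
    (Hypothetical format, assumed for this sketch only.)
    """
    triples = []
    for relation, query in edge_queries.items():
        rows = conn.execute(query).fetchall()
        triples.extend(rows_to_triples(rows, relation))
    return triples


# Step 1 (stand-in): create a tiny in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 2, 'phone');
    """
)

# Step 2 (stand-in): one edge query per relation.
edge_queries = {
    "purchased": (
        "SELECT c.name, o.product "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "ORDER BY o.id"
    ),
}

# Steps 3 and 4: execute the queries and build the triples.
triples = convert(conn, edge_queries)
for src, rel, dst in triples:
    print(f"{src}\t{rel}\t{dst}")
```

Each row of a query result becomes one triple, with the relation name supplied alongside the query. The documentation page added in this pull request describes how Db2Graph itself defines queries and output.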

This pull request adds the source file src/python/tools/db2graph/db2graph.py and a documentation page, docs/db2graph/db2graph.rst, which describes the requirements, definitions, and steps for using Db2Graph and walks through a real example use case.

Tests are written with pytest and run through GitHub Actions to validate the correctness of the db2graph functions.