memgraph / mage

MAGE - Memgraph Advanced Graph Extensions :crystal_ball:
Apache License 2.0

[main < T225] Add LLM util #225

Closed katarinasupe closed 1 year ago

katarinasupe commented 1 year ago

Description

A module that contains procedures describing graphs in a format best suited for large language models (LLMs).

Example usage:
Get raw graph schema: CALL llm_util.schema('raw') YIELD schema RETURN schema;
Get prompt-ready graph schema: CALL llm_util.schema() YIELD schema RETURN schema; or CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema;
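To illustrate the difference between the two output modes, here is a minimal sketch in plain Python rather than the MAGE `mgp` API. The function name `render_schema`, the dictionary keys (`node_props`, `rel_props`, `relationships`), and the exact wording of the text output are assumptions made for illustration, not the module's actual implementation.

```python
from typing import Any, Dict, List

# Hypothetical raw schema layout; the real module's structure may differ.
RawSchema = Dict[str, Any]


def render_schema(raw: RawSchema, output_type: str = "prompt_ready") -> Any:
    """Return the raw dict, or a text description meant for an LLM prompt."""
    kind = output_type.lower()
    if kind == "raw":
        return raw
    if kind != "prompt_ready":
        raise ValueError(f"Unsupported output type: {output_type!r}")

    lines: List[str] = ["Node properties are the following:"]
    for label, props in raw["node_props"].items():
        lines.append(f"Node name: '{label}', Node properties: {props}")

    lines.append("Relationship properties are the following:")
    for rel_type, props in raw["rel_props"].items():
        lines.append(
            f"Relationship Name: '{rel_type}', Relationship Properties: {props}"
        )

    lines.append("The relationships are the following:")
    for start, rel_type, end in raw["relationships"]:
        lines.append(f"(:{start})-[:{rel_type}]->(:{end})")
    return "\n".join(lines)
```

Defaulting to the prompt-ready form mirrors the usage above, where calling the procedure without an argument behaves like passing 'prompt_ready'.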

Changelog message: Now you can generate graph schema in a format best suited for large language models (LLMs).

Josipmrden commented 1 year ago

@katarinasupe check https://www.notion.so/memgraph/Workflow-e99f310ca5ec463a95a3b5594c04aac6 for the workflow on naming branches.

In your case: [main < T225] Add LLM util

Josipmrden commented 1 year ago

Can we get any information by reusing the results from meta_util.schema, or do we need to extract the schema again manually?

katarinasupe commented 1 year ago

@Josipmrden I could get some info from meta_util. My first approach was to edit meta_util and see what I could do, but that took more time than implementing a new module. Meta_util also counts properties and how many nodes have them, which is overkill for this module. Calling meta_util from this module is, in my opinion, also overkill: to get the properties that exist, the counts would have to be computed too.
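As a rough sketch of what collecting only the existing properties (without any counts) could look like, the plain-Python function below walks over nodes once and records each property name with the type of the first value seen. The function name `collect_node_properties` and the `(labels, properties)` input shape are hypothetical; the sorted `:`-joined label key is also an assumption, chosen because the label combinations in the output posted later in this thread appear in alphabetical order.

```python
from typing import Any, Dict, Iterable, List, Tuple


def collect_node_properties(
    nodes: Iterable[Tuple[Iterable[str], Dict[str, Any]]],
) -> Dict[str, List[Dict[str, str]]]:
    """Map each label combination to its distinct properties and value types."""
    seen: Dict[str, Dict[str, str]] = {}
    for labels, properties in nodes:
        # Assumed key format: labels sorted and joined with ':'.
        key = ":".join(sorted(labels))
        known = seen.setdefault(key, {})
        for name, value in properties.items():
            # Keep the first type observed; no per-property counting is done.
            known.setdefault(name, type(value).__name__)
    return {
        key: [{"property": name, "type": type_name} for name, type_name in props.items()]
        for key, props in seen.items()
    }
```

Because only the first observed type is kept, a property that holds different types on different nodes would be reported with a single type; a real implementation may need to handle that case differently.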

katarinasupe commented 1 year ago

I updated the module, restructured the code a bit, and applied @Josipmrden's and Brett's suggestions. Here are the screenshots of the new usage: Screenshot 2023-06-19 at 13 22 09, Screenshot 2023-06-19 at 13 22 24, Screenshot 2023-06-19 at 13 22 40

Here are my comments and I also refer to Brett's review here:

katarinasupe commented 1 year ago

I tested it on the Europe gas pipelines dataset from Memgraph Lab (which does not have a pretty schema). europe-gas-pipelines-scigrid-model

Here is the output:

Node properties are the following:
Node name: 'NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}]
Node name: 'NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}]
Node name: 'COMPRESSOR:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'NODE:PRODUCTION', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_production_M_m3_per_d', 'type': 'float'}]
Node name: 'ENTRY_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}]
Node name: 'BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'int'}]
Node name: 'BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'COMPRESSOR:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'COMPRESSOR:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}]
Node name: 'BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}]
Node name: 'LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'ENTRY_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}]
Node name: 'ENTRY_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]

Relationship properties are the following:
Relationship Name: 'PIPE', Relationship Properties: [{'property': 'name', 'type': 'str'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'int'}, {'property': 'pipe_id', 'type': 'str'}, {'property': 'diameter_mm', 'type': 'int'}, {'property': 'length_km', 'type': 'float'}, {'property': 'num_compressor', 'type': 'int'}]

The relationships are the following:
['(:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE:PRODUCTION)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:NODE:PRODUCTION)']
['(:NODE:PRODUCTION)-[:PIPE]->(:NODE)']
['(:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:ENTRY_POINT:LNG:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:COMPRESSOR:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:LNG:NODE)-[:PIPE]->(:NODE)']
['(:LNG:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:LNG:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:LNG:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:NODE)-[:PIPE]->(:LNG:NODE)']
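
A pattern list like the one above could be produced by scanning the relationships once and keeping each distinct pattern. The sketch below is plain Python under stated assumptions: `relationship_patterns` and its `(start_labels, rel_type, end_labels)` input are hypothetical names and shapes, not the module's actual code.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[Iterable[str], str, Iterable[str]]


def relationship_patterns(edges: Iterable[Edge]) -> List[str]:
    """Return distinct '(:A)-[:REL]->(:B)' patterns in first-seen order."""
    patterns: List[str] = []
    seen: Set[str] = set()
    for start_labels, rel_type, end_labels in edges:
        # Assumed convention: multiple labels are sorted and joined with ':'.
        start = ":".join(sorted(start_labels))
        end = ":".join(sorted(end_labels))
        pattern = f"(:{start})-[:{rel_type}]->(:{end})"
        if pattern not in seen:
            seen.add(pattern)
            patterns.append(pattern)
    return patterns
```

With many label combinations, the number of distinct patterns grows quickly, which is what makes the output for this dataset so long.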

brettdbrewer commented 1 year ago

katarinasupe commented 1 year ago

Thank you @brettdbrewer for the comments. I will check the ones on Discord too and copy anything useful here in a reply.

katarinasupe commented 1 year ago

@brettdbrewer raised concerns regarding the capitalization of properties. Once we discuss this, tolower() might be added to all string properties.

brettdbrewer commented 1 year ago

I don't think tolower() will be needed, as the issue is with property values, not property/node/relationship names. Said another way, the issue isn't with the schema definition; it is in the Cypher query generated by the LLM. E.g. for the question "Who was killed by magic?" in the GoT dataset, the LLM would need to create the following Cypher query to get the right answer: "MATCH (killer:Character)-[k:KILLED]->(victim:Character) WHERE toLower(k.method) = 'magic' RETURN victim.name". This is because "Magic" is capitalized in the dataset for that relationship property. I think we can divorce this issue from the schema definition and treat it as a problem that any developer may have to solve in their own solution.
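
The effect of the toLower() comparison can be shown with a small plain-Python analogue. The records below are made-up placeholders for illustration only (they are not taken from the GoT dataset), and `victims_by_method` is a hypothetical helper, not part of the module.

```python
from typing import Dict, List

# Made-up sample records for illustration only; not real dataset contents.
KILLED: List[Dict[str, str]] = [
    {"victim": "victim_a", "method": "Magic"},
    {"victim": "victim_b", "method": "Sword"},
]


def victims_by_method(
    records: List[Dict[str, str]], method: str, ignore_case: bool = True
) -> List[str]:
    """Return victims whose method equals `method`, optionally ignoring case."""
    if ignore_case:
        # Lowercase both sides, as toLower(k.method) = 'magic' does in the query.
        wanted = method.lower()
        return [r["victim"] for r in records if r["method"].lower() == wanted]
    return [r["victim"] for r in records if r["method"] == method]
```

An exact comparison against 'magic' finds nothing here because the stored value is capitalized, while the case-insensitive comparison finds the matching record. The schema is the same in both cases, which is why the issue can be handled separately from the schema definition.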

katarinasupe commented 1 year ago

I added a 'fix' for multiple labels according to @brettdbrewer's suggestion. Here is what it looks like for the above example now: Screenshot 2023-06-27 at 13 59 56 Screenshot 2023-06-27 at 14 00 10 Screenshot 2023-06-27 at 14 00 16

Besides that, I restructured the code a bit to follow the MAGE codebase and improved the docstring according to the documentation. I will update the documentation based on these changes.