vesoft-inc / nebula-studio

NebulaGraph Web GUI Tools
Apache License 2.0

kg build with unstructured data, an attempt! #746

Open wey-gu opened 8 months ago

wey-gu commented 8 months ago

Input Data: paul_graham_essay.txt

Schema: (RDF Style, will try PropGraph Style later)

CREATE TAG `entity` (`name` string);
CREATE EDGE `relationship` (`name` string);

Log: all.log

Model: Azure OpenAI: 3.5-turbo 2023-07-01-preview


It's smooth! Thanks! Something we may consider changing:

...
the knowledge graph has following schema and node name must be a real :
...
NodeType "entity" ("name":string )
EdgeType "relationship" ("name":string )
...
{userPrompt}
...

It seems that with an RDF-style schema like the one below, which has only one edge type and stores the actual relation in edge.name, our current implementation does not guide the LLM to generate edge props. I guess we are addressing typical Property Graph style modeling first?

"""graph-schema
NodeType "entity" ("name":string )
EdgeType "relationship" ("name":string ) #<---------- name

"""
{userPrompt}
Return the results directly, without explain and comment. The results should be in the following JSON format:
{
  "nodes":[{ "name":string,"type":string,"props":object }],
  "edges":[{ "src":string,"dst":string,"edgeType":string,"props":object }] #<----------- there is no name here.
}

Thus, the extracted edges come without the edge type (the name prop):

INSERT EDGE `relationship` () VALUES "software as a service"->"Aspra":();
INSERT EDGE `relationship` () VALUES "still life"->"visual cues":();
INSERT EDGE `relationship` () VALUES "color changes"->"visual cues":();
INSERT EDGE `relationship` () VALUES "software"->"documents":();
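
To make the effect concrete, here is a minimal Python sketch of how an edge from the JSON result might be turned into an `INSERT EDGE` statement. This is not Studio's actual implementation: `quote` and `edge_to_ngql` are hypothetical helpers, and the `"produces"` value is made up for illustration. It shows that an empty `props` object yields the empty `()` statements above, while a populated `name` prop yields a usable edge.

```python
import json


def quote(value):
    """Render a Python value as an nGQL literal (assumption: JSON-style escaping is acceptable)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "NULL"
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)  # double-quoted and escaped
    return str(value)


def edge_to_ngql(edge, prop_names):
    """Build one INSERT EDGE statement; prop_names are the props declared in the schema."""
    props = edge.get("props") or {}
    # Only emit the properties the LLM actually returned.
    present = [p for p in prop_names if p in props]
    columns = ", ".join(f"`{p}`" for p in present)
    values = ", ".join(quote(props[p]) for p in present)
    return (
        f'INSERT EDGE `{edge["edgeType"]}` ({columns}) '
        f'VALUES {quote(edge["src"])}->{quote(edge["dst"])}:({values});'
    )


without_props = {"src": "software", "dst": "documents", "edgeType": "relationship", "props": {}}
with_props = {**without_props, "props": {"name": "produces"}}  # "produces" is invented

print(edge_to_ngql(without_props, ["name"]))
print(edge_to_ngql(with_props, ["name"]))
```

The first call reproduces the empty statement seen in the log; the second is what we would get if the LLM filled in `props.name`.
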
mizy commented 8 months ago

feat: support parsing edge prop

{
  "nodes":[{ "name":string,"type":string,"props":object }],
  "edges":[{ "src":string,"dst":string,"edgeType":string,"props":object }] #<----------- there is no name here.
}

This part of the prompt is just an example output to ensure the result format. The graph schema is already injected into the prompt, like NodeType "entity" ("name":string ), and some spaces may have many edge types and edge props, so spelling them all out here would lead to a very long context. In my case the edge props are generated successfully:

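{
  "nodes": [
    { "name": "I", "type": "person", "props": {"role": "test engineer"} },
    { "name": "Explorer project", "type": "item", "props": {"version": ["v3.6.0", "v3.7.0"], "work content": ["testing", "filing issues", "designing and implementing test cases", "API automation testing", "drafting test plans", "writing test reports"] } },
    { "name": "Analytics project", "type": "item", "props": {"version": ["v3.6.0"], "work content": ["iteration testing", "functional testing", "performance testing", "filing issues", "writing test cases and test reports"] } },
    { "name": "bank project", "type": "item", "props": {"work content": ["deploying an internal graph-computing test cluster", "data development", "building risk-control business scenarios", "writing and executing test cases", "tracking issues"] } },
    { "name": "confluence", "type": "item", "props": {"purpose": "recording test cases"} },
    { "name": "cloud code repository", "type": "item", "props": {"purpose": "storing API automation test code"} }
  ],
  "edges": [
    { "src": "I", "dst": "Explorer project", "edgeType": "relationship", "props": {"relationship type": "responsible for testing"} },
    { "src": "I", "dst": "Analytics project", "edgeType": "relationship", "props": {"relationship type": "responsible for testing"} },
    { "src": "I", "dst": "bank project", "edgeType": "relationship", "props": {"relationship type": "responsible for support and testing"} },
    { "src": "I", "dst": "confluence", "edgeType": "relationship", "props": {"relationship type": "records test cases in it"} },
    { "src": "I", "dst": "cloud code repository", "edgeType": "relationship", "props": {"relationship type": "commits API automation test code to it"} }
  ]
}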
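
A quick way to tell whether a given model run behaves like this one or drops the edge props is to compare the returned edges against the schema. Below is a minimal Python sketch, assuming the schema is available as a mapping from edge type to declared prop names. `edges_missing_props` is a hypothetical helper, and the sample data (including the `"used in"` value) is invented for illustration.

```python
def edges_missing_props(result, edge_schema):
    """Return (src, dst, missing_prop_names) for every edge lacking a declared prop.

    edge_schema maps an edge type to the list of prop names declared for it.
    """
    problems = []
    for edge in result.get("edges", []):
        declared = edge_schema.get(edge.get("edgeType"), [])
        props = edge.get("props") or {}
        # A prop counts as missing when it is absent, null, or an empty string.
        missing = [p for p in declared if props.get(p) in (None, "")]
        if missing:
            problems.append((edge["src"], edge["dst"], missing))
    return problems


schema = {"relationship": ["name"]}
result = {
    "nodes": [],
    "edges": [
        {"src": "Lisp", "dst": "John McCarthy", "edgeType": "relationship", "props": {}},
        {"src": "Lisp", "dst": "AI", "edgeType": "relationship", "props": {"name": "used in"}},
    ],
}

print(edges_missing_props(result, schema))
```

Only the first edge is reported, because its `props` object lacks the declared `name`.
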
wey-gu commented 8 months ago

This part of the prompt is just an example output to ensure the result format. The graph schema is already injected into the prompt, like NodeType "entity" ("name":string )

I see. We could tune the prompt to better address this: in my case (where I think the schema causes no obvious confusion) it failed in most cases:

> match ()-[e]->() RETURN e
+------------------------------------------------------------------------------------------------------+
| e                                                                                                    |
+------------------------------------------------------------------------------------------------------+
| [:relationship "teacher"->"grade" @0 {name: __NULL__}]                                               |
| [:relationship "essay question"->"Cezanne" @0 {name: __NULL__}]                                      |
| [:relationship "roommate"->"Robert" @0 {name: __NULL__}]                                             |
| [:relationship "Florence"->"Duomo" @0 {name: __NULL__}]                                              |
| [:relationship "Florence"->"Orsanmichele" @0 {name: __NULL__}]                                       |
| [:relationship "Florence"->"Pitti" @0 {name: __NULL__}]                                              |
| [:relationship "Florence"->"Via Ricasoli" @0 {name: __NULL__}]                                       |
| [:relationship "Florence"->"budget" @0 {name: __NULL__}]                                             |
| [:relationship "bedroom"->"night" @0 {name: __NULL__}]                                               |
| [:relationship "Lisp"->"AI" @0 {name: __NULL__}]                                                     |
| [:relationship "Lisp"->"C++" @0 {name: __NULL__}]                                                    |
| [:relationship "Lisp"->"John McCarthy" @0 {name: __NULL__}]                                          |
| [:relationship "Lisp"->"Lisp hacker" @0 {name: __NULL__}]                                            |
| [:relationship "Lisp"->"McCarthy" @0 {name: __NULL__}]                                               |
| [:relationship "Lisp"->"On Lisp" @0 {name: __NULL__}]                                                |
| [:relationship "Lisp"->"Platonic form of Lisp" @0 {name: __NULL__}]                                  |
| [:relationship "Lisp"->"Turing machine" @0 {name: __NULL__}]                                         |
| [:relationship "Lisp"->"book" @0 {name: __NULL__}]                                                   |
| [:relationship "Lisp"->"expression" @0 {name: __NULL__}]                                             |
| [:relationship "Lisp"->"shot" @0 {name: __NULL__}]                                                   |
| [:relationship "comment"->"angry people" @0 {name: __NULL__}]                                        |
| [:relationship "RISD"->"HTML" @0 {name: __NULL__}]                                                   |
| [:relationship "RISD"->"Providence" @0 {name: __NULL__}]                                             |
| [:relationship "RISD"->"art" @0 {name: __NULL__}]                                                    |
| [:relationship "RISD"->"job" @0 {name: __NULL__}]                                                    |
| [:relationship "RISD"->"money" @0 {name: __NULL__}]                                                  |
| [:relationship "experts"->"talks" @0 {name: __NULL__}]                                               |
| [:relationship "run"->"air conditioners" @0 {name: __NULL__}]                                        |
| [:relationship "Arc"->"interpreter" @0 {name: __NULL__}]                                             |
| [:relationship "brain"->"feature" @0 {name: __NULL__}]                                               |
wey-gu commented 8 months ago

I will look into the prompt and tune it to improve this in a PR then.
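
One possible direction, sketched in Python under assumptions (this is not the content of any actual PR): derive the `edges` part of the JSON format example from the schema, so the declared edge props show up in the expected output. `build_format_hint` and its `limit` parameter are hypothetical. The limit caps how many edge types are rendered, so a space with many edge types does not blow up the context.

```python
import json


def build_format_hint(edge_schema, limit=3):
    """Render a JSON format example whose edge props mirror the schema.

    edge_schema maps an edge type to {prop_name: prop_type}.
    Only the first `limit` edge types are rendered to keep the prompt short.
    """
    edges = [
        {"src": "string", "dst": "string", "edgeType": edge_type, "props": dict(props)}
        for edge_type, props in list(edge_schema.items())[:limit]
    ]
    hint = {
        "nodes": [{"name": "string", "type": "string", "props": "object"}],
        "edges": edges,
    }
    return json.dumps(hint, indent=2, ensure_ascii=False)


print(build_format_hint({"relationship": {"name": "string"}}))
```

For the RDF-style schema in this issue, the rendered example now contains `"props": {"name": "string"}` on the edge, which makes the expected `name` prop explicit to the model.
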

mizy commented 8 months ago

I will look into the prompt and tune it to improve this in a PR then.

OK, maybe this prompt does not work well with some spaces.

wey-gu commented 8 months ago

I proposed a changed default prompt in https://github.com/vesoft-inc/nebula-studio/pull/751. Kindly help review it :D.