テスト用SQLファイル群の用意

KagenoMoheji commented 4 months ago

概要

タイトル通り

実装方法（案があれば）

テーブルCREATE，プロシージャ，ビューの全て用意する
まずはFROMだけのシンプルなもの(level1)から用意する
後からレベルアップしてJOIN・サブクエリ・WITH句等のテスト用SQLを用意していく

備考

KagenoMoheji commented 4 months ago

PostgreSQLだけでなく，MySQL・SQLDBの構文でも用意した方がいいかな． node-sql-parserは下記対応してるっぽいけど． https://github.com/taozhi8833998/node-sql-parser

・Athena
・BigQuery
・DB2
・Hive
・MariaDB
・MySQL
・PostgresQL
・Redshift
・Sqlite
・TransactSQL
・[FlinkSQL](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sql/)
・Snowflake(alpha)
・[Noql](https://noql.synatic.dev/)

KagenoMoheji commented 4 months ago

想定するネットワーク図も，drawioか何かで作ってコミットしこう．

kappakun17 commented 4 months ago

@KagenoMoheji

・BigQuery ・MariaDB ・MySQL ・PostgresQL ・Sqlite

とりあえずまずは上記が使えればいいんちゃうかな

KagenoMoheji commented 3 months ago

@yuta0193 下記の感じかな

ddl-diagram-visualizer(これはレポジトリ)
　/ddl-diagram-visualizer
　　/<NodeJSアプリ>
　/test
　　/sqlfiles
　　　/mysql
　　　　/A.sql
　　　　/...
　　　/postgresql
　　　/...

yuta0193 commented 3 months ago

生成AIに複雑なSQL文を作成してもらい、パーサーでAST（抽象構文木）として出力させてみると、依存関係も解析できているみたいです！なので、ASTから依存関係など上手く抽出すれば、実現できそうです！

生成AIによるSQL文作成（解説あり） 以下は、PostgreSQLでの複雑なSQL文の例です。このクエリでは、複数のテーブルを結合し、サブクエリを使用して集計し、最終的にGROUP BY句を用いてデータをグループ化しています。

SELECT
    e.employee_name,
    d.department_name,
    SUM(s.sales_amount) AS total_sales
FROM
    employees e
INNER JOIN
    departments d ON e.department_id = d.department_id
INNER JOIN
    sales s ON e.employee_id = s.employee_id
WHERE
    s.sale_date BETWEEN '2023-01-01' AND '2023-12-31'
    AND e.employee_id IN (
        SELECT
            employee_id
        FROM
            sales
        GROUP BY
            employee_id
        HAVING
            SUM(sales_amount) > 100000
    )
GROUP BY
    e.employee_name,
    d.department_name
ORDER BY
    total_sales DESC;

このクエリは以下の処理を行います：

結合: employees, departments, sales テーブルを結合します。
サブクエリ: sales テーブルから年間の売上が100,000以上の従業員のIDを選択します。
集計: 従業員と部門ごとに売上総額を集計します。
グループ化: 従業員名と部門名でグループ化し、売上総額で降順に並べ替えます。

このSQL文は、特定の期間内での従業員と部門別の売上総額を求めるために使用することができます。

AST文（出力）

[
    {
        "with": null,
        "type": "select",
        "options": null,
        "distinct": {
            "type": null
        },
        "columns": [
            {
                "type": "expr",
                "expr": {
                    "type": "column_ref",
                    "table": "e",
                    "column": {
                        "expr": {
                            "type": "default",
                            "value": "employee_name"
                        }
                    }
                },
                "as": null
            },
            {
                "type": "expr",
                "expr": {
                    "type": "column_ref",
                    "table": "d",
                    "column": {
                        "expr": {
                            "type": "default",
                            "value": "department_name"
                        }
                    }
                },
                "as": null
            },
            {
                "type": "expr",
                "expr": {
                    "type": "aggr_func",
                    "name": "SUM",
                    "args": {
                        "expr": {
                            "type": "column_ref",
                            "table": "s",
                            "column": {
                                "expr": {
                                    "type": "default",
                                    "value": "sales_amount"
                                }
                            }
                        }
                    },
                    "over": null
                },
                "as": "total_sales"
            }
        ],
        "into": {
            "position": null
        },
        "from": [
            {
                "db": null,
                "table": "employees",
                "as": "e"
            },
            {
                "db": null,
                "table": "departments",
                "as": "d",
                "join": "INNER JOIN",
                "on": {
                    "type": "binary_expr",
                    "operator": "=",
                    "left": {
                        "type": "column_ref",
                        "table": "e",
                        "column": {
                            "expr": {
                                "type": "default",
                                "value": "department_id"
                            }
                        }
                    },
                    "right": {
                        "type": "column_ref",
                        "table": "d",
                        "column": {
                            "expr": {
                                "type": "default",
                                "value": "department_id"
                            }
                        }
                    }
                }
            },
            {
                "db": null,
                "table": "sales",
                "as": "s",
                "join": "INNER JOIN",
                "on": {
                    "type": "binary_expr",
                    "operator": "=",
                    "left": {
                        "type": "column_ref",
                        "table": "e",
                        "column": {
                            "expr": {
                                "type": "default",
                                "value": "employee_id"
                            }
                        }
                    },
                    "right": {
                        "type": "column_ref",
                        "table": "s",
                        "column": {
                            "expr": {
                                "type": "default",
                                "value": "employee_id"
                            }
                        }
                    }
                }
            }
        ],
        "where": {
            "type": "binary_expr",
            "operator": "AND",
            "left": {
                "type": "binary_expr",
                "operator": "BETWEEN",
                "left": {
                    "type": "column_ref",
                    "table": "s",
                    "column": {
                        "expr": {
                            "type": "default",
                            "value": "sale_date"
                        }
                    }
                },
                "right": {
                    "type": "expr_list",
                    "value": [
                        {
                            "type": "single_quote_string",
                            "value": "2023-01-01"
                        },
                        {
                            "type": "single_quote_string",
                            "value": "2023-12-31"
                        }
                    ]
                }
            },
            "right": {
                "type": "binary_expr",
                "operator": "IN",
                "left": {
                    "type": "column_ref",
                    "table": "e",
                    "column": {
                        "expr": {
                            "type": "default",
                            "value": "employee_id"
                        }
                    }
                },
                "right": {
                    "type": "expr_list",
                    "value": [
                        {
                            "tableList": [
                                "select::null::sales"
                            ],
                            "columnList": [
                                "select::employees::employee_name",
                                "select::departments::department_name",
                                "select::sales::sales_amount",
                                "select::employees::department_id",
                                "select::departments::department_id",
                                "select::employees::employee_id",
                                "select::sales::employee_id",
                                "select::sales::sale_date",
                                "select::null::employee_id",
                                "select::null::sales_amount"
                            ],
                            "ast": {
                                "with": null,
                                "type": "select",
                                "options": null,
                                "distinct": {
                                    "type": null
                                },
                                "columns": [
                                    {
                                        "type": "expr",
                                        "expr": {
                                            "type": "column_ref",
                                            "table": null,
                                            "column": {
                                                "expr": {
                                                    "type": "default",
                                                    "value": "employee_id"
                                                }
                                            }
                                        },
                                        "as": null
                                    }
                                ],
                                "into": {
                                    "position": null
                                },
                                "from": [
                                    {
                                        "db": null,
                                        "table": "sales",
                                        "as": null
                                    }
                                ],
                                "where": null,
                                "groupby": [
                                    {
                                        "type": "column_ref",
                                        "table": null,
                                        "column": {
                                            "expr": {
                                                "type": "default",
                                                "value": "employee_id"
                                            }
                                        }
                                    }
                                ],
                                "having": {
                                    "type": "binary_expr",
                                    "operator": ">",
                                    "left": {
                                        "type": "aggr_func",
                                        "name": "SUM",
                                        "args": {
                                            "expr": {
                                                "type": "column_ref",
                                                "table": null,
                                                "column": {
                                                    "expr": {
                                                        "type": "default",
                                                        "value": "sales_amount"
                                                    }
                                                }
                                            }
                                        },
                                        "over": null
                                    },
                                    "right": {
                                        "type": "number",
                                        "value": 100000
                                    }
                                },
                                "orderby": null,
                                "limit": {
                                    "seperator": "",
                                    "value": []
                                },
                                "window": null
                            }
                        }
                    ]
                }
            }
        },
        "groupby": [
            {
                "type": "column_ref",
                "table": "e",
                "column": {
                    "expr": {
                        "type": "default",
                        "value": "employee_name"
                    }
                }
            },
            {
                "type": "column_ref",
                "table": "d",
                "column": {
                    "expr": {
                        "type": "default",
                        "value": "department_name"
                    }
                }
            }
        ],
        "having": null,
        "orderby": [
            {
                "expr": {
                    "type": "column_ref",
                    "table": null,
                    "column": {
                        "expr": {
                            "type": "default",
                            "value": "total_sales"
                        }
                    }
                },
                "type": "DESC",
                "nulls": null
            }
        ],
        "limit": {
            "seperator": "",
            "value": []
        },
        "window": null
    }
]

kappakun17 commented 3 months ago

@yuta0193 早速ありがとう！しかしすごいなこれ、ここからデータを摘出しマッピングがかなり肝になってきそう。

ただ気になったのが、検証データがSQL文だけど、実際読み込んでもらうのは、DDLファイル群だから検証データがそもそも違うように思ったけど、どう？

geminiに聞いたところ、ゆうたのSQL文から想定されるDDLを用意してもらったよ。

CREATE TABLE employees (
  employee_id INT PRIMARY KEY,
  employee_name VARCHAR(255) NOT NULL,
  department_id INT
);

CREATE TABLE departments (
  department_id INT PRIMARY KEY,
  department_name VARCHAR(255) NOT NULL
);

CREATE TABLE sales (
  sale_id INT PRIMARY KEY,
  employee_id INT,
  sale_date DATE NOT NULL,
  sales_amount DECIMAL(10, 2) NOT NULL
);

-- 外部キー制約（省略可能）
ALTER TABLE employees ADD FOREIGN KEY (department_id) REFERENCES departments (department_id);
ALTER TABLE sales ADD FOREIGN KEY (employee_id) REFERENCES employees (employee_id);

後、複数の CREATE TABLEが1つのファイルにあった場合、データはすべて読み込んでくれるのかも知りたい！

以上を踏まえ、追加でテストしてほしいデータが

DDLデータ（複数のCreate table）
プロシージャ (複数のプロシージャ)
ビュー (複数のビュー)

パーサーでAST（抽象構文木）として出力させてみると

他にも、出力する方法があるってことかぬ？

kappakun17 commented 3 months ago

あ、ごめん、DDLは前に調べてくれてたね https://github.com/hello-deaf-world/ddl-diagram-visualizer/issues/6#issuecomment-2117927916

リストで出力になっているけど、それをAST出力することもできる感じかな？

yuta0193 commented 3 months ago

あ、ごめん、DDLは前に調べてくれてたね #6 (comment)

リストで出力になっているけど、それをAST出力することもできる感じかな？

イエス！ただ、上手く解析できない文（PostgreSQLのcreate）もあって、どこからどこまで対応できているかの調査が必要です…

KagenoMoheji commented 3 months ago

@kappakun17 @yuta0193 とりあえず1つのSQLファイルに1つのDDL(テーブル，ビュー，プロシージャ)が定義されている前提でファイル読み込みさせる方針でやってみない？

yuta0193 commented 3 months ago

@kappakun17 @yuta0193 とりあえず1つのSQLファイルに1つのDDL(テーブル，ビュー，プロシージャ)が定義されている前提でファイル読み込みさせる方針でやってみない？

そうですね！こちらのほうでファイル読み込みも実装していたのでちょうど良かったです！

ファイル群もありがとうございます！一応、こちらもネットで公開しているサンプルデータベースも見つけたのですが、最終テストとして使ってみてもいいかなと思ってます！

kappakun17 commented 3 months ago

@yuta0193

一応、こちらもネットで公開しているサンプルデータベースも見つけたのですが、最終テストとして使ってみてもいいかなと思ってます！

テストデータは多いに越したことないから、ぜひ使えそうなやつはどんどん入れて！

KagenoMoheji commented 3 months ago

@yuta0193 クローズしちゃったけどテストデータの用意は続行でOKす～

hello-deaf-world / ddl-diagram-visualizer

テスト用SQLファイル群の用意 #9

概要

実装方法（案があれば）

備考