Closed by lostmygithubaccount 7 months ago
pre-filter, post-filter?
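A rough sketch of the distinction, to frame the question. Pre-filtering narrows the candidate docs before ranking; post-filtering ranks everything and drops non-matching hits afterwards, which can leave fewer than `k` results. The `DOCS` list reuses paths from the transcripts in this thread, but `score`, `top_k`, and `is_tutorial` are made up for illustration and are not the bot's actual retrieval logic.

```python
from typing import Callable, List

# Paths copied from the transcripts in this thread; the selection is arbitrary.
DOCS = [
    "data/docs/ibis_docs/how-to/visualization/altair.qmd",
    "data/docs/ibis_docs/tutorials/getting_started.qmd",
    "data/docs/ibis_docs/tutorials/ibis-for-sql-users.qmd",
    "data/docs/ibis_docs/concepts/versioning.qmd",
    "data/docs/ibis_tpc/h01.py",
    "data/docs/birdbrain_docs/why.qmd",
]


def score(query: str, path: str) -> int:
    """Toy relevance (assumption): number of query words found in the path."""
    return sum(word in path.lower() for word in query.lower().split())


def top_k(query: str, candidates: List[str], k: int) -> List[str]:
    """Rank candidates by score, highest first; ties keep their input order."""
    return sorted(candidates, key=lambda p: score(query, p), reverse=True)[:k]


def pre_filter(query: str, keep: Callable[[str], bool], k: int) -> List[str]:
    """Filter first, then rank: always returns up to k matching docs."""
    return top_k(query, [p for p in DOCS if keep(p)], k)


def post_filter(query: str, keep: Callable[[str], bool], k: int) -> List[str]:
    """Rank first, then filter: may return fewer than k docs."""
    return [p for p in top_k(query, DOCS, k) if keep(p)]


def is_tutorial(path: str) -> bool:
    """Hypothetical metadata filter based on the directory name."""
    return "/tutorials/" in path


pre = pre_filter("ibis docs", is_tutorial, k=2)
post = post_filter("ibis docs", is_tutorial, k=2)
print(len(pre), len(post))
```

With this toy data the pre-filtered call returns two tutorial paths, while the post-filtered call returns one, because a non-tutorial doc took one of the two top slots before the filter ran.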
[ins] In [1]: bot.query("what's the ibis how-to guide for working with altiar?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1802 request_id=3334f23cf945cde67b160318a1245525 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1457 request_id=10f99cefccf8191d2802bb6d79e8c5fe response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=2050 request_id=7eeb7a437caa44e16a95170b0fe08ae8 response_code=200
Out[1]: {'docs': ['data/docs/ibis_docs/how-to/visualization/altair.qmd']}
[ins] In [2]: bot.query("tutorial docs on ibis")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1630 request_id=dd01fdec420b29428d554c54ce3035da response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=884 request_id=74fa746be871e757d51c7be3ca529481 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=6589 request_id=c4c05799f07f131ac702645bcbfee75e response_code=200
Out[2]:
{'docs': ['data/docs/ibis_docs/tutorials/ibis-for-dplyr-users.qmd',
          'data/docs/ibis_docs/tutorials/getting_started.qmd',
          'data/docs/ibis_docs/tutorials/ibis-for-pandas-users.qmd',
          'data/docs/ibis_docs/tutorials/ibis-for-sql-users.qmd',
          'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/1_basics.qmd',
          'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/0_setup.qmd']}
same w/ GPT-3.5:
[ins] In [1]: bot.query("what's the ibis how-to guide for working with altiar?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1769 request_id=c7b56b76b46c0097474007ac9bf77d1c response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=287 request_id=65cc3ad7f1d2105374e26f939eca0949 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=867 request_id=7b6bf533e21dbedf97a7445ac29d63ce response_code=200
Out[1]: {'docs': ['data/docs/ibis_docs/how-to/visualization/altair.qmd']}
[ins] In [2]: bot.query("tutorial docs on ibis")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=800 request_id=2404a1ca22b1a4fd952af4d847c6c8ba response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=388 request_id=b09c00c70d58f1440fd6ce55ad3f3084 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=2273 request_id=bd527044a7853ed50a67c0ce61388c89 response_code=200
Out[2]:
{'docs': ['data/docs/ibis_docs/tutorials/ibis-for-dplyr-users.qmd',
          'data/docs/ibis_docs/tutorials/getting_started.qmd',
          'data/docs/ibis_docs/tutorials/ibis-for-pandas-users.qmd',
          'data/docs/ibis_docs/tutorials/ibis-for-sql-users.qmd',
          'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/1_basics.qmd',
          'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/0_setup.qmd']}
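Both runs return the same docs, so the visible difference is latency. A small sketch for totalling the `processing_ms` values from the log lines above; the log text is copied from the altair query in each run, and the helper name `total_processing_ms` is made up here. The thread does not name the model used in the first run, so it is only labelled "first run". As far as I know `processing_ms` is the server-side time reported by the API, not the full round trip.

```python
import re

# Log lines for the altair query, copied from the two transcripts above.
FIRST_RUN = """\
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1802 request_id=3334f23cf945cde67b160318a1245525 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1457 request_id=10f99cefccf8191d2802bb6d79e8c5fe response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=2050 request_id=7eeb7a437caa44e16a95170b0fe08ae8 response_code=200
"""

GPT35_RUN = """\
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1769 request_id=c7b56b76b46c0097474007ac9bf77d1c response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=287 request_id=65cc3ad7f1d2105374e26f939eca0949 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=867 request_id=7b6bf533e21dbedf97a7445ac29d63ce response_code=200
"""

PROCESSING_MS = re.compile(r"processing_ms=(\d+)")


def total_processing_ms(log_text: str) -> int:
    """Sum every processing_ms=<n> value found in the log text."""
    return sum(int(ms) for ms in PROCESSING_MS.findall(log_text))


print("first run:", total_processing_ms(FIRST_RUN), "ms")  # 5309 ms
print("GPT-3.5:  ", total_processing_ms(GPT35_RUN), "ms")  # 2923 ms
```

This is a single sample per model, so it only hints at the difference.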
[ins] In [1]: bot.query("what's the ibis_tpc doc on query 1?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1633 request_id=50bf4f4952f88b521274953e096027b6 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1049 request_id=a2de1894e5ad591d42560d9cb3170dd4 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1232 request_id=b9abf4430eae5a9108f7b982e8c6d80c response_code=200
Out[1]: {'docs': ['data/docs/ibis_tpc/h01.py']}
[ins] In [2]: bot.query("what's the birdbrain doc that explains why it exists?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1490 request_id=97f23ceaad77ef19f239cf2414e1daff response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1299 request_id=9227d4af01eb8e129d9dae53ac27d0c9 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1132 request_id=109fdd6ed4e0d288f6a6adb9d0e3e948 response_code=200
Out[2]: {'docs': ['data/docs/birdbrain_docs/why.qmd']}
[ins] In [1]: bot.query("the first few rows in the lineitems table of the tpch data")
...:
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1375 request_id=1b05c603732a2f9fb1c3e9af2ffd26db response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=218 request_id=e8ad11db91a557eaf3fb538c82490282 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=450 request_id=0c70ffeb67fdc144ae33250ce87c6ca2 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1261 request_id=303a88ea472087b98d46db018f63a158 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=621 request_id=d7a9ec403b03272c4f581fdbe4ea1356 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1398 request_id=45d78af4a86c2c6b41c8ba80d9a51e7e response_code=200
INFO:root:tables: {'lineitem': ibis.Schema {
  l_orderkey       !int32
  l_partkey        !int32
  l_suppkey        !int32
  l_linenumber     !int32
  l_quantity       !decimal(15, 2)
  l_extendedprice  !decimal(15, 2)
  l_discount       !decimal(15, 2)
  l_tax            !decimal(15, 2)
  l_returnflag     !string
  l_linestatus     !string
  l_shipdate       !date
  l_commitdate     !date
  l_receiptdate    !date
  l_shipinstruct   !string
  l_shipmode       !string
  l_comment        !string
}}
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1890 request_id=4f8513f21a0331755371d0be10e60053 response_code=200
INFO:root:sql: SELECT * FROM lineitem LIMIT 10
Out[1]:
{'tpch.main': ┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━┓
┃ l_orderkey ┃ l_partkey ┃ l_suppkey ┃ l_linenumber ┃ l_quantity     ┃ … ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━┩
│ int32      │ int32     │ int32     │ int32        │ decimal(15, 2) │ … │
├────────────┼───────────┼───────────┼──────────────┼────────────────┼───┤
│          1 │    155190 │      7706 │            1 │          17.00 │ … │
│          1 │     67310 │      7311 │            2 │          36.00 │ … │
│          1 │     63700 │      3701 │            3 │           8.00 │ … │
│          1 │      2132 │      4633 │            4 │          28.00 │ … │
│          1 │     24027 │      1534 │            5 │          24.00 │ … │
│          1 │     15635 │       638 │            6 │          32.00 │ … │
│          2 │    106170 │      1191 │            1 │          38.00 │ … │
│          3 │      4297 │      1798 │            1 │          45.00 │ … │
│          3 │     19036 │      6540 │            2 │          49.00 │ … │
│          3 │    128449 │      3474 │            3 │          27.00 │ … │
└────────────┴───────────┴───────────┴──────────────┴────────────────┴───┘}
[ins] In [1]: r = bot("all the rows in the lineitems table in the tpch data")
WARNING:root:sql: SELECT * FROM lineitem
[ins] In [2]: t = r["tpch.main"]
[ins] In [3]: t
Out[3]:
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━┓
┃ l_orderkey ┃ l_partkey ┃ l_suppkey ┃ l_linenumber ┃ l_quantity     ┃ … ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━┩
│ int32      │ int32     │ int32     │ int32        │ decimal(15, 2) │ … │
├────────────┼───────────┼───────────┼──────────────┼────────────────┼───┤
│          1 │    155190 │      7706 │            1 │          17.00 │ … │
│          1 │     67310 │      7311 │            2 │          36.00 │ … │
│          1 │     63700 │      3701 │            3 │           8.00 │ … │
│          1 │      2132 │      4633 │            4 │          28.00 │ … │
│          1 │     24027 │      1534 │            5 │          24.00 │ … │
│          1 │     15635 │       638 │            6 │          32.00 │ … │
│          2 │    106170 │      1191 │            1 │          38.00 │ … │
│          3 │      4297 │      1798 │            1 │          45.00 │ … │
│          3 │     19036 │      6540 │            2 │          49.00 │ … │
│          3 │    128449 │      3474 │            3 │          27.00 │ … │
│          3 │     29380 │      1883 │            4 │           2.00 │ … │
│          3 │    183095 │       650 │            5 │          28.00 │ … │
│          3 │     62143 │      9662 │            6 │          26.00 │ … │
│          4 │     88035 │      5560 │            1 │          30.00 │ … │
│          5 │    108570 │      8571 │            1 │          15.00 │ … │
│          5 │    123927 │      3928 │            2 │          26.00 │ … │
│          5 │     37531 │        35 │            3 │          50.00 │ … │
│          6 │    139636 │      2150 │            1 │          37.00 │ … │
│          7 │    182052 │      9607 │            1 │          12.00 │ … │
│          7 │    145243 │      7758 │            2 │           9.00 │ … │
│          … │         … │         … │            … │              … │ … │
└────────────┴───────────┴───────────┴──────────────┴────────────────┴───┘
[ins] In [4]: t.count()
Out[4]: 71988482
[ins] In [5]: r = bot("the tpch query 01 on the lineitems table in the tpch data")
WARNING:root:sql: SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS sum_qty, SUM(l_extendedprice) AS sum_base_price, SUM(l_extendedprice*(1-l_discount)) AS sum_disc_price, SUM(l_extendedprice*(1-l_discount)*(1+l_tax)) AS sum_charge, AVG(l_quantity) AS avg_qty, AVG(l_extendedprice) AS avg_price, AVG(l_discount) AS avg_disc, COUNT(*) AS count_order FROM lineitem WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus
[ins] In [6]: t = r["tpch.main"]
[ins] In [7]: t
Out[7]:
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━┓
┃ l_returnflag ┃ l_linestatus ┃ sum_qty        ┃ sum_base_price   ┃ … ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━┩
│ string       │ string       │ decimal(38, 2) │ decimal(38, 2)   │ … │
├──────────────┼──────────────┼────────────────┼──────────────────┼───┤
│ A            │ F            │   452986613.00 │  679238836598.71 │ … │
│ N            │ F            │    11834448.00 │   17742447819.93 │ … │
│ N            │ O            │   892076953.00 │ 1337705746297.36 │ … │
│ R            │ F            │   453172336.00 │  679567137737.80 │ … │
└──────────────┴──────────────┴────────────────┴──────────────────┴───┘
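For reading the generated SQL: the `WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY` clause works out to a cutoff of 1998-09-02, and the `GROUP BY l_returnflag, l_linestatus` is what produces one row per flag/status pair. A minimal pure-Python sketch of that filter plus the `sum_qty` aggregate follows. The four rows and the helper name `sum_qty_by_group` are hypothetical, chosen only to show the boundary behaviour; they are not taken from the TPC-H data.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal

# DATE '1998-12-01' - INTERVAL '90' DAY
CUTOFF = date(1998, 12, 1) - timedelta(days=90)

# Hypothetical rows: (l_returnflag, l_linestatus, l_quantity, l_shipdate)
ROWS = [
    ("A", "F", Decimal("17.00"), date(1994, 3, 1)),
    ("N", "O", Decimal("36.00"), date(1998, 9, 2)),  # on the cutoff: kept
    ("N", "O", Decimal("8.00"), date(1998, 9, 3)),   # after the cutoff: dropped
    ("A", "F", Decimal("28.00"), date(1995, 6, 15)),
]


def sum_qty_by_group(rows):
    """SUM(l_quantity) per (l_returnflag, l_linestatus) for rows up to CUTOFF."""
    totals = defaultdict(Decimal)  # Decimal() is 0, so sums start at zero
    for flag, status, qty, shipdate in rows:
        if shipdate <= CUTOFF:
            totals[(flag, status)] += qty
    # Sorting the keys mirrors ORDER BY l_returnflag, l_linestatus
    return dict(sorted(totals.items()))


print(CUTOFF)                  # 1998-09-02
print(sum_qty_by_group(ROWS))
```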
[ins] In [1]: bot("the ibis versioning policy")
Out[1]: {'docs': ['data/docs/ibis_docs/concepts/versioning.qmd']}
https://github.com/Aryn-AI/sycamore
LlamaIndex, other stuff
probably just do something custom for now