ibis-project / ibis-birdbrain

portable Python ML-powered data bot
https://ibis-project.github.io/ibis-birdbrain/
Apache License 2.0

feat: [Ibis] documentation search #19

Closed: lostmygithubaccount closed this 7 months ago

lostmygithubaccount commented 12 months ago

Prior art to look at: https://github.com/Aryn-AI/sycamore, LlamaIndex, and others.

Probably just do something custom for now.
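
Here is a minimal sketch of what "something custom" could look like. It assumes a plain keyword match against doc file paths is good enough as a first baseline. `tokenize` and `search_docs` are hypothetical helpers, not part of birdbrain. This is also not how the bot in this thread ranks docs, because those runs go through OpenAI chat completions.

```python
import re
from pathlib import PurePosixPath


def tokenize(text: str) -> set[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_docs(query: str, paths: list[str], top_k: int = 5) -> list[str]:
    """Rank doc paths by how many query tokens appear in the path.

    Hypothetical helper: a naive keyword baseline, not birdbrain's method.
    """
    query_tokens = tokenize(query)
    scored = []
    for path in paths:
        # Directory names plus file stem, e.g. ibis_docs/how-to/visualization/altair
        words = tokenize(" ".join(PurePosixPath(path).with_suffix("").parts))
        score = len(query_tokens & words)
        if score:
            scored.append((score, path))
    # Highest score first; ties fall back to alphabetical order for stability
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:top_k]]
```

For example, `search_docs("the ibis versioning policy", paths)` ranks `concepts/versioning.qmd` first. A lone misspelled term like `"altiar"` matches nothing, though, so a keyword baseline alone is brittle.
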

lostmygithubaccount commented 12 months ago

Pre-filter or post-filter?
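
The sketch below follows how the terms are usually used in retrieval. Pre-filtering narrows the candidates by metadata (here, the doc collection) before ranking. Post-filtering ranks everything first and drops non-matching results afterwards, which can leave fewer than `top_k` hits. `Doc` and the scores are hypothetical. The collection names are taken from the doc paths that show up later in this thread.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    path: str
    collection: str  # e.g. "ibis_docs", "ibis_tpc", "birdbrain_docs"
    score: float     # hypothetical relevance score from any ranker


def pre_filter(docs: list[Doc], collection: str, top_k: int) -> list[Doc]:
    # Restrict the candidate set first, then keep the best top_k of what's left
    candidates = [d for d in docs if d.collection == collection]
    return sorted(candidates, key=lambda d: d.score, reverse=True)[:top_k]


def post_filter(docs: list[Doc], collection: str, top_k: int) -> list[Doc]:
    # Rank everything, keep top_k, then drop docs from other collections
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)[:top_k]
    return [d for d in ranked if d.collection == collection]
```

Suppose the two highest-scoring docs are both in `ibis_docs`. Asking for the top 2 `ibis_tpc` docs then returns one hit with `pre_filter` and none with `post_filter`.
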

lostmygithubaccount commented 12 months ago

[ins] In [1]: bot.query("what's the ibis how-to guide for working with altiar?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1802 request_id=3334f23cf945cde67b160318a1245525 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1457 request_id=10f99cefccf8191d2802bb6d79e8c5fe response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=2050 request_id=7eeb7a437caa44e16a95170b0fe08ae8 response_code=200
Out[1]: {'docs': ['data/docs/ibis_docs/how-to/visualization/altair.qmd']}

[ins] In [2]: bot.query("tutorial docs on ibis")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1630 request_id=dd01fdec420b29428d554c54ce3035da response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=884 request_id=74fa746be871e757d51c7be3ca529481 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=6589 request_id=c4c05799f07f131ac702645bcbfee75e response_code=200
Out[2]:
{'docs': ['data/docs/ibis_docs/tutorials/ibis-for-dplyr-users.qmd',
  'data/docs/ibis_docs/tutorials/getting_started.qmd',
  'data/docs/ibis_docs/tutorials/ibis-for-pandas-users.qmd',
  'data/docs/ibis_docs/tutorials/ibis-for-sql-users.qmd',
  'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/1_basics.qmd',
  'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/0_setup.qmd']}
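
The misspelled "altiar" still resolved to `altair.qmd`. The logs only show chat completion calls, so how that happens isn't visible here. If a non-LLM path is wanted, the standard library's `difflib.get_close_matches` applied to file stems would tolerate the same typo. `fuzzy_find` below is a hypothetical helper, not birdbrain code.

```python
import difflib
from pathlib import PurePosixPath


def fuzzy_find(term: str, paths: list[str], cutoff: float = 0.6) -> list[str]:
    """Return up to 3 paths whose file stem is close to `term` (typo-tolerant)."""
    # Map stem -> path; if two paths share a stem, the later one wins
    stems = {PurePosixPath(p).stem: p for p in paths}
    matches = difflib.get_close_matches(term.lower(), list(stems), n=3, cutoff=cutoff)
    return [stems[m] for m in matches]
```

With the default cutoff of 0.6, "altiar" vs. "altair" scores about 0.83, so the misspelling is matched.

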

lostmygithubaccount commented 12 months ago

Same results with GPT-3.5:

[ins] In [1]: bot.query("what's the ibis how-to guide for working with altiar?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1769 request_id=c7b56b76b46c0097474007ac9bf77d1c response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=287 request_id=65cc3ad7f1d2105374e26f939eca0949 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=867 request_id=7b6bf533e21dbedf97a7445ac29d63ce response_code=200
Out[1]: {'docs': ['data/docs/ibis_docs/how-to/visualization/altair.qmd']}

[ins] In [2]: bot.query("tutorial docs on ibis")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=800 request_id=2404a1ca22b1a4fd952af4d847c6c8ba response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=388 request_id=b09c00c70d58f1440fd6ce55ad3f3084 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=2273 request_id=bd527044a7853ed50a67c0ce61388c89 response_code=200
Out[2]:
{'docs': ['data/docs/ibis_docs/tutorials/ibis-for-dplyr-users.qmd',
  'data/docs/ibis_docs/tutorials/getting_started.qmd',
  'data/docs/ibis_docs/tutorials/ibis-for-pandas-users.qmd',
  'data/docs/ibis_docs/tutorials/ibis-for-sql-users.qmd',
  'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/1_basics.qmd',
  'data/docs/ibis_docs/tutorials/data-platforms/starburst-galaxy/0_setup.qmd']}

lostmygithubaccount commented 12 months ago

[ins] In [1]: bot.query("what's the ibis_tpc doc on query 1?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1633 request_id=50bf4f4952f88b521274953e096027b6 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1049 request_id=a2de1894e5ad591d42560d9cb3170dd4 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1232 request_id=b9abf4430eae5a9108f7b982e8c6d80c response_code=200
Out[1]: {'docs': ['data/docs/ibis_tpc/h01.py']}

[ins] In [2]: bot.query("what's the birdbrain doc that explains why it exists?")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1490 request_id=97f23ceaad77ef19f239cf2414e1daff response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1299 request_id=9227d4af01eb8e129d9dae53ac27d0c9 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1132 request_id=109fdd6ed4e0d288f6a6adb9d0e3e948 response_code=200
Out[2]: {'docs': ['data/docs/birdbrain_docs/why.qmd']}

lostmygithubaccount commented 12 months ago

[ins] In [1]: bot.query("the first few rows in the lineitems table of the tpch data")
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1375 request_id=1b05c603732a2f9fb1c3e9af2ffd26db response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=218 request_id=e8ad11db91a557eaf3fb538c82490282 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=450 request_id=0c70ffeb67fdc144ae33250ce87c6ca2 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1261 request_id=303a88ea472087b98d46db018f63a158 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=621 request_id=d7a9ec403b03272c4f581fdbe4ea1356 response_code=200
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1398 request_id=45d78af4a86c2c6b41c8ba80d9a51e7e response_code=200
INFO:root:tables: {'lineitem': ibis.Schema {
  l_orderkey       !int32
  l_partkey        !int32
  l_suppkey        !int32
  l_linenumber     !int32
  l_quantity       !decimal(15, 2)
  l_extendedprice  !decimal(15, 2)
  l_discount       !decimal(15, 2)
  l_tax            !decimal(15, 2)
  l_returnflag     !string
  l_linestatus     !string
  l_shipdate       !date
  l_commitdate     !date
  l_receiptdate    !date
  l_shipinstruct   !string
  l_shipmode       !string
  l_comment        !string
}}
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1890 request_id=4f8513f21a0331755371d0be10e60053 response_code=200
INFO:root:sql: SELECT * FROM lineitem LIMIT 10
Out[1]:
{'tpch.main': ┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━┓
 ┃ l_orderkey ┃ l_partkey ┃ l_suppkey ┃ l_linenumber ┃ l_quantity     ┃ … ┃
 ┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━┩
 │ int32      │ int32     │ int32     │ int32        │ decimal(15, 2) │ … │
 ├────────────┼───────────┼───────────┼──────────────┼────────────────┼───┤
 │          1 │    155190 │      7706 │            1 │          17.00 │ … │
 │          1 │     67310 │      7311 │            2 │          36.00 │ … │
 │          1 │     63700 │      3701 │            3 │           8.00 │ … │
 │          1 │      2132 │      4633 │            4 │          28.00 │ … │
 │          1 │     24027 │      1534 │            5 │          24.00 │ … │
 │          1 │     15635 │       638 │            6 │          32.00 │ … │
 │          2 │    106170 │      1191 │            1 │          38.00 │ … │
 │          3 │      4297 │      1798 │            1 │          45.00 │ … │
 │          3 │     19036 │      6540 │            2 │          49.00 │ … │
 │          3 │    128449 │      3474 │            3 │          27.00 │ … │
 └────────────┴───────────┴───────────┴──────────────┴────────────────┴───┘}
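
The logs suggest a two-step flow. The bot first finds the relevant table schema (`tables: {'lineitem': ...}`), then makes one more completion that returns the SQL. Below is a minimal sketch of rendering a schema into a text-to-SQL prompt. It assumes a plain `{column: type}` dict stands in for the `ibis.Schema`. `schema_prompt` and its wording are hypothetical, not birdbrain's actual prompt.

```python
def schema_prompt(tables: dict[str, dict[str, str]], question: str) -> str:
    """Render table schemas plus a question into a text-to-SQL prompt.

    Hypothetical helper: birdbrain's real prompt is not shown in this thread.
    """
    parts = []
    for name, columns in tables.items():
        # One CREATE TABLE-style block per table so the model sees column types
        cols = ",\n".join(f"  {col} {dtype}" for col, dtype in columns.items())
        parts.append(f"CREATE TABLE {name} (\n{cols}\n);")
    parts.append(f"-- Question: {question}")
    parts.append("-- Answer with a single SQL SELECT statement.")
    return "\n".join(parts)
```

Rendering the schema as DDL is one common choice because models have seen a lot of it. A short column list would work too.

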

lostmygithubaccount commented 12 months ago

[ins] In [1]: r = bot("all the rows in the lineitems table in the tpch data")
WARNING:root:sql: SELECT * FROM lineitem

[ins] In [2]: t = r["tpch.main"]

[ins] In [3]: t
Out[3]:
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━┓
┃ l_orderkey ┃ l_partkey ┃ l_suppkey ┃ l_linenumber ┃ l_quantity     ┃ … ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━┩
│ int32      │ int32     │ int32     │ int32        │ decimal(15, 2) │ … │
├────────────┼───────────┼───────────┼──────────────┼────────────────┼───┤
│          1 │    155190 │      7706 │            1 │          17.00 │ … │
│          1 │     67310 │      7311 │            2 │          36.00 │ … │
│          1 │     63700 │      3701 │            3 │           8.00 │ … │
│          1 │      2132 │      4633 │            4 │          28.00 │ … │
│          1 │     24027 │      1534 │            5 │          24.00 │ … │
│          1 │     15635 │       638 │            6 │          32.00 │ … │
│          2 │    106170 │      1191 │            1 │          38.00 │ … │
│          3 │      4297 │      1798 │            1 │          45.00 │ … │
│          3 │     19036 │      6540 │            2 │          49.00 │ … │
│          3 │    128449 │      3474 │            3 │          27.00 │ … │
│          3 │     29380 │      1883 │            4 │           2.00 │ … │
│          3 │    183095 │       650 │            5 │          28.00 │ … │
│          3 │     62143 │      9662 │            6 │          26.00 │ … │
│          4 │     88035 │      5560 │            1 │          30.00 │ … │
│          5 │    108570 │      8571 │            1 │          15.00 │ … │
│          5 │    123927 │      3928 │            2 │          26.00 │ … │
│          5 │     37531 │        35 │            3 │          50.00 │ … │
│          6 │    139636 │      2150 │            1 │          37.00 │ … │
│          7 │    182052 │      9607 │            1 │          12.00 │ … │
│          7 │    145243 │      7758 │            2 │           9.00 │ … │
│          … │         … │         … │            … │              … │ … │
└────────────┴───────────┴───────────┴──────────────┴────────────────┴───┘

[ins] In [4]: t.count()
Out[4]: 71988482

[ins] In [5]: r = bot("the tpch query 01 on the lineitems table in the tpch data")
WARNING:root:sql: SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS sum_qty, SUM(l_extendedprice) AS sum_base_price, SUM(l_extendedprice*(1-l_discount)) AS sum_disc_price, SUM(l_extendedprice*(1-l_discount)*(1+l_tax)) AS sum_charge, AVG(l_quantity) AS avg_qty, AVG(l_extendedprice) AS avg_price, AVG(l_discount) AS avg_disc, COUNT(*) AS count_order FROM lineitem WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus

[ins] In [6]: t = r["tpch.main"]

[ins] In [7]: t
Out[7]:
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━┓
┃ l_returnflag ┃ l_linestatus ┃ sum_qty        ┃ sum_base_price   ┃ … ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━┩
│ string       │ string       │ decimal(38, 2) │ decimal(38, 2)   │ … │
├──────────────┼──────────────┼────────────────┼──────────────────┼───┤
│ A            │ F            │   452986613.00 │  679238836598.71 │ … │
│ N            │ F            │    11834448.00 │   17742447819.93 │ … │
│ N            │ O            │   892076953.00 │ 1337705746297.36 │ … │
│ R            │ F            │   453172336.00 │  679567137737.80 │ … │
└──────────────┴──────────────┴────────────────┴──────────────────┴───┘
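
The generated SQL above has the shape of TPC-H Query 1, the pricing summary report over `lineitem`. The sketch below runs the same query with the standard library's `sqlite3` on a few made-up rows, to show what each aggregate computes. SQLite has no `INTERVAL` literal, so the cutoff is rewritten as `date('1998-12-01', '-90 days')`. The rows are toy values, not TPC-H data.

```python
import sqlite3

# Toy rows, NOT real TPC-H data:
# (returnflag, linestatus, quantity, extendedprice, discount, tax, shipdate)
ROWS = [
    ("A", "F", 10, 100.0, 0.10, 0.05, "1995-01-01"),
    ("A", "F", 20, 200.0, 0.00, 0.05, "1995-06-01"),
    ("N", "O", 5, 50.0, 0.05, 0.00, "1998-08-01"),
    ("N", "O", 7, 70.0, 0.00, 0.00, "1998-11-30"),  # after the cutoff, excluded
]

Q1 = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity) AS sum_qty,
       SUM(l_extendedprice) AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity) AS avg_qty,
       AVG(l_extendedprice) AS avg_price,
       AVG(l_discount) AS avg_disc,
       COUNT(*) AS count_order
FROM lineitem
-- SQLite spelling of DATE '1998-12-01' - INTERVAL '90' DAY
WHERE l_shipdate <= date('1998-12-01', '-90 days')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""


def run_q1(rows):
    """Load rows into an in-memory lineitem table and run Query 1 on it."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE lineitem (l_returnflag TEXT, l_linestatus TEXT, "
            "l_quantity REAL, l_extendedprice REAL, l_discount REAL, "
            "l_tax REAL, l_shipdate TEXT)"
        )
        con.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return con.execute(Q1).fetchall()
    finally:
        con.close()
```

The cutoff works out to 1998-09-02, so the last toy row (shipped 1998-11-30) is dropped and `('N', 'O')` counts a single order.

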

lostmygithubaccount commented 12 months ago

[ins] In [1]: bot("the ibis versioning policy")
Out[1]: {'docs': ['data/docs/ibis_docs/concepts/versioning.qmd']}