michaelthwan / searchGPT

Grounded search engine (i.e., with source references) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search, etc.
MIT License

De-FAISS openai and use native OpenAI embedding #59

Closed: michaelthwan closed this issue 1 year ago

michaelthwan commented 1 year ago

pro1: We don't use much of the FAISS API anyway.
pro2: Avoids a stupid duplicated call.
pro3: Exposes the distance, so the footnote can be tuned.

Difficulty: how to batch the embedding calls efficiently.
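pro3 implies that, once FAISS is gone, the similarity of every hit is directly available. Below is a minimal sketch of a brute-force replacement for an index search, assuming embeddings are plain lists of floats. The names `cosine_similarity`, `search_similar_sketch` and the `min_similarity` cutoff are hypothetical, not searchGPT's actual code; the cutoff only illustrates how a score threshold could decide which sources are kept for footnotes.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar_sketch(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
    min_similarity: float = 0.0,  # hypothetical tunable threshold
) -> List[Tuple[int, float]]:
    """Exhaustive top-k search without FAISS.

    Returns (doc index, similarity) pairs, best first, dropping any doc
    whose similarity is below min_similarity.
    """
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    kept = [(i, s) for i, s in scored if s >= min_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
    print(search_similar_sketch([1.0, 0.0], docs, top_k=2))
    print(search_similar_sketch([1.0, 0.0], docs, min_similarity=0.9))
```

For a few dozen search results per query, a linear scan like this is cheap, which is part of why a dedicated vector index adds little here; raising `min_similarity` trades recall for fewer, more relevant references.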

michaelthwan commented 1 year ago

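Tested searching 10 texts with one search_text:
- Sequential call mode: 2.879 sec
- Batch call mode: 0.609 sec (about 4.7x faster)
- Results: the same top-3 texts (rows 0, 1, 3) in the same order; the similarity scores differ by at most about 4e-5 (e.g. 0.87353 vs 0.873494)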
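The speedup comes from the number of requests: in the profiles below, sequential mode calls Embedding.create once per text (via Series.apply), while compute_embeddings_2 makes a single Embedding.create call for all 10 texts. A minimal sketch of batching with order preserved, assuming a hypothetical `embed_batch` callable that takes a list of strings and returns one vector per string in the same order; `embed_in_batches` and the default `batch_size=100` are illustrative assumptions, not values taken from searchGPT or the API limits.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],  # hypothetical embedder
    batch_size: int = 100,  # illustrative chunk size, not an API limit
) -> List[Vector]:
    """Embed texts with one embed_batch call per chunk instead of one per text.

    The output list is aligned with the input: embeddings[i] belongs to texts[i].
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    embeddings: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        vectors = embed_batch(chunk)  # a single request covers the whole chunk
        if len(vectors) != len(chunk):
            raise RuntimeError("embedder returned a different number of vectors")
        embeddings.extend(vectors)
    return embeddings
```

With 10 texts and the default chunk size this issues one call, matching the batch run; a smaller `batch_size` only matters when the input is large enough to hit per-request limits.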

search_text = 'delicious beans'
texts = [
    # 5 bean-related texts (expected matches)
    "Discover the world of delicious beans with our premium selection.",
    "Try our savory bean soup recipe for a delicious and nutritious meal.",
    "Our roasted coffee beans are carefully selected for their rich and delicious flavor.",
    "Beans are not only delicious, but also a great source of protein and dietary fiber.",
    "Looking for a delicious vegan meal? Try our spicy black bean burger recipe.",

    # 5 unrelated texts (distractors)
    "The sky is blue and the sun is shining today.",
    "I need to go grocery shopping after work to pick up some milk and bread.",
    "Did you hear about the new movie that just came out? It's supposed to be really good.",
    "I'm planning a trip to Europe next summer and I'm so excited.",
    "My cat keeps meowing at me for no reason and it's driving me crazy.",
]
C:\Users\MW\Anaconda3\envs\searchgpt\python.exe C:/github/!searchGPT/searchGPT/playground/test_OpenAI_Embedding.py
(10, 2)
Sequential call mode:
compute_embeddings() text: Discover the world of delicious beans with our premium selection.
compute_embeddings() text: Try our savory bean soup recipe for a delicious and nutritious meal.
compute_embeddings() text: Our roasted coffee beans are carefully selected for their rich and delicious flavor.
compute_embeddings() text: Beans are not only delicious, but also a great source of protein and dietary fiber.
compute_embeddings() text: Looking for a delicious vegan meal? Try our spicy black bean burger recipe.
compute_embeddings() text: The sky is blue and the sun is shining today.
compute_embeddings() text: I need to go grocery shopping after work to pick up some milk and bread.
compute_embeddings() text: Did you hear about the new movie that just came out? It's supposed to be really good.
compute_embeddings() text: I'm planning a trip to Europe next summer and I'm so excited.
compute_embeddings() text: My cat keeps meowing at me for no reason and it's driving me crazy.
search_similar() text: delicious beans
compute_embeddings() text: delicious beans
                                                text  ...  similarities
0  Discover the world of delicious beans with our...  ...       0.89247
1  Try our savory bean soup recipe for a deliciou...  ...       0.88657
3  Beans are not only delicious, but also a great...  ...       0.87353

[3 rows x 4 columns]

  _     ._   __/__   _ _  _  _ _/_   Recorded: 20:39:21  Samples:  60
 /_//_/// /_\ / //_// / //_'/ //     Duration: 2.880     CPU time: 0.062
/   _/                      v4.4.0

Program: C:/github/!searchGPT/searchGPT/playground/test_OpenAI_Embedding.py

2.879 <module>  test_OpenAI_Embedding.py:1
├─ 2.660 Series.apply  pandas\core\series.py:4661
│     [3 frames hidden]  pandas
│        2.660 SeriesApply.apply_standard  pandas\core\apply.py:1159
│        └─ 2.660 <lambda>  test_OpenAI_Embedding.py:89
│           └─ 2.660 compute_embeddings  test_OpenAI_Embedding.py:29
│              └─ 2.660 Embedding.create  openai\api_resources\embedding.py:14
│                    [130 frames hidden]  openai, requests, urllib3, http, sock...
└─ 0.210 search_similar  test_OpenAI_Embedding.py:35
   └─ 0.206 compute_embeddings  test_OpenAI_Embedding.py:29
      └─ 0.206 Embedding.create  openai\api_resources\embedding.py:14
            [25 frames hidden]  openai, requests, urllib3, http, sock...

Batch call mode:
compute_embeddings_2() len(texts): 10
search_similar() text: delicious beans
compute_embeddings() text: delicious beans
                                                text  ...  similarities
0  Discover the world of delicious beans with our...  ...      0.892470
1  Try our savory bean soup recipe for a deliciou...  ...      0.886590
3  Beans are not only delicious, but also a great...  ...      0.873494

[3 rows x 4 columns]

  _     ._   __/__   _ _  _  _ _/_   Recorded: 20:39:24  Samples:  21
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.609     CPU time: 0.000
/   _/                      v4.4.0

Program: C:/github/!searchGPT/searchGPT/playground/test_OpenAI_Embedding.py

0.609 <module>  test_OpenAI_Embedding.py:1
├─ 0.400 compute_embeddings_2  test_OpenAI_Embedding.py:43
│  └─ 0.400 Embedding.create  openai\api_resources\embedding.py:14
│        [41 frames hidden]  openai, requests, urllib3, http, sock...
├─ 0.199 search_similar  test_OpenAI_Embedding.py:35
│  └─ 0.196 compute_embeddings  test_OpenAI_Embedding.py:29
│     └─ 0.196 Embedding.create  openai\api_resources\embedding.py:14
│           [26 frames hidden]  openai, requests, urllib3, http, sock...
└─ 0.008 DataFrame.__repr__  pandas\core\frame.py:1054
      [70 frames hidden]  pandas

Process finished with exit code 0
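Both runs return the same rows, but the printed scores are not bit-identical. A small check using only the (row index, similarity) values shown in the two result tables of the log; `compare_results` is a hypothetical helper, and the values are limited to the precision pandas displayed.

```python
from typing import List, Tuple

# (row index, similarity) pairs copied from the two result tables in the log
sequential: List[Tuple[int, float]] = [(0, 0.89247), (1, 0.88657), (3, 0.87353)]
batch: List[Tuple[int, float]] = [(0, 0.892470), (1, 0.886590), (3, 0.873494)]


def compare_results(a: List[Tuple[int, float]], b: List[Tuple[int, float]]) -> Tuple[bool, float]:
    """Return (same ranking?, largest absolute score difference)."""
    same_ranking = [i for i, _ in a] == [i for i, _ in b]
    max_diff = max(abs(sa - sb) for (_, sa), (_, sb) in zip(a, b))
    return same_ranking, max_diff


same_ranking, max_diff = compare_results(sequential, batch)
print(same_ranking, f"{max_diff:.6f}")  # prints: True 0.000036
```

A difference this small does not change the ranking here, so for top-k retrieval the two modes behave the same, even though a strict equality check on the scores would fail.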