stanford-oval / storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
http://storm.genie.stanford.edu
MIT License
13.36k stars, 1.22k forks

Alternatives to You.com search API #8

Closed: songkq closed this issue 4 months ago

songkq commented 7 months ago

@shaoyijia @Yucheng-Jiang Hi, I'm wondering whether there are other search APIs that can be used with STORM, since the You.com API requires a credit card. https://github.com/stanford-oval/storm/blob/42f4d5bbbaca67bc2e4e8ea5814e0975fef971fc/src/modules/topic_expert.py#L79

For example, could the search APIs provided by LangChain be used as alternatives to You.com?

shaoyijia commented 7 months ago

Hi, thanks for your interest!

We view this project as an example of a knowledge curation engine that serves as an intermediate layer between vast unstructured information and humans. So, supporting different information sources is in our plan.

Would you be willing to open a PR to integrate the APIs you pointed to? We're happy to help merge it.

songkq commented 7 months ago

Yeah, I'll try to integrate more search APIs into storm.

shaoyijia commented 7 months ago

Great, thank you!

LronDC commented 7 months ago

If there is a plan and a to-do list, I'd like to claim some tasks to help.

Yucheng-Jiang commented 7 months ago

@LronDC Thank you for your interest in our project! We're currently working on an upcoming code release that will enhance the scalability of the project. We will keep you updated and soon share some potential tasks where the community can contribute. Stay tuned!

songkq commented 7 months ago

@shaoyijia Please review this pull request https://github.com/stanford-oval/storm/pull/20.

  1. Support DuckDuckGoSearchAPI and TavilySearchAPI as alternatives to You.com.
  2. When TopicExpert is configured to use DuckDuckGoSearchAPI or TavilySearchAPI, these APIs return complete page content instead of snippets by default.
  3. The search API can be set up by editing these variables in secrets.toml:
     - Set WEB_SEARCH_API to one of ['DuckDuckGoSearchAPI', 'TavilySearchAPI', 'YouSearchAPI']; YouSearchAPI is the default.
     - Set the You.com search API key with YDC_API_KEY=
     - Set the api.tavily.com search API key with TAVILY_API_KEY=
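A minimal Python sketch of how the selection described in the PR could work, assuming secrets.toml has already been parsed into a dict. Only WEB_SEARCH_API, YDC_API_KEY, TAVILY_API_KEY, the three API names and the YouSearchAPI default come from the PR; the helpers select_search_api and load_secrets are hypothetical and are not STORM's actual code. Treating DuckDuckGo as keyless is also an assumption, based on the PR listing no key for it.

```python
from typing import Dict, Optional, Tuple

# API names and key variables taken from the PR description; the helpers are hypothetical.
# Value = secrets.toml entry holding that API's key (assumed: DuckDuckGo needs none).
SUPPORTED_APIS: Dict[str, Optional[str]] = {
    "DuckDuckGoSearchAPI": None,
    "TavilySearchAPI": "TAVILY_API_KEY",
    "YouSearchAPI": "YDC_API_KEY",
}
DEFAULT_API = "YouSearchAPI"


def load_secrets(path: str) -> dict:
    """Parse secrets.toml (tomllib is in the standard library from Python 3.11)."""
    import tomllib
    with open(path, "rb") as f:
        return tomllib.load(f)


def select_search_api(secrets: dict) -> Tuple[str, Optional[str]]:
    """Return (api_name, api_key), falling back to YouSearchAPI when unset."""
    api_name = secrets.get("WEB_SEARCH_API", DEFAULT_API)
    if api_name not in SUPPORTED_APIS:
        raise ValueError(f"Unsupported WEB_SEARCH_API: {api_name!r}")
    key_name = SUPPORTED_APIS[api_name]
    if key_name is None:
        return api_name, None
    api_key = secrets.get(key_name)
    if not api_key:
        raise ValueError(f"{api_name} requires {key_name} in secrets.toml")
    return api_name, api_key
```

Usage would look like select_search_api(load_secrets("secrets.toml")). Failing fast on a missing key surfaces a misconfigured secrets.toml before any search runs.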
songkq commented 7 months ago

@shaoyijia Since you plan to support different information sources, I'd recommend our open-source project QAnything, a local knowledge base question-answering system designed to support a wide range of file formats and databases.

dl942702882 commented 7 months ago

Sadly, we cannot access You.com.

Yucheng-Jiang commented 7 months ago

@dl942702882 You.com offers a free tier of API quota, which is sufficient to generate more than 25 articles locally.

shaoyijia commented 7 months ago

@dl942702882, to switch to customized sources before we support this officially, you could check out what this PR (#20) does.

shaoyijia commented 6 months ago

An update in this thread:

We just released refactored code that makes it easier to run, customize, and develop the STORM engine. Search API and retrieval model integrations now live in src/rm.py, and the knowledge curation engine directly consumes the Information output by the Retriever.
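The decoupling described above can be sketched as follows: the curation side only sees Information records, so any source that produces them can be swapped in. This is a minimal illustration under stated assumptions; the Information fields, the Retriever protocol, StaticRetriever and curate are all hypothetical names, not the actual classes or signatures in src/rm.py.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Information:
    """Hypothetical stand-in for the Information records a retriever returns."""
    url: str
    title: str
    snippets: List[str] = field(default_factory=list)


class Retriever(Protocol):
    """Any source (search API, local corpus, ...) only has to implement this."""
    def retrieve(self, query: str, k: int) -> List[Information]: ...


class StaticRetriever:
    """Toy retriever over a fixed list of records, matching query words against titles."""
    def __init__(self, records: List[Information]):
        self.records = records

    def retrieve(self, query: str, k: int) -> List[Information]:
        terms = set(query.lower().split())
        hits = [r for r in self.records if terms & set(r.title.lower().split())]
        return hits[:k]


def curate(retriever: Retriever, questions: List[str], k: int = 3) -> List[Information]:
    """The curation step sees only Information objects, never the underlying API."""
    seen, collected = set(), []
    for q in questions:
        for info in retriever.retrieve(q, k):
            if info.url not in seen:  # de-duplicate results across questions
                seen.add(info.url)
                collected.append(info)
    return collected
```

Replacing StaticRetriever with a class that wraps a search API leaves curate unchanged, which is the point of the separation.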

shaoyijia commented 6 months ago

@LronDC @songkq , we are now specifically interested in supporting:

  1. Retrieval models that can retrieve information from customized sources.
  2. Search APIs that return academic sources, e.g., the Semantic Scholar API.
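For item 2, a rough sketch of querying the public Semantic Scholar Graph API paper-search endpoint with only the standard library. The endpoint URL and the x-api-key header belong to that public API; the function names and the url/title/snippet result shape are assumptions chosen for illustration, not STORM's retriever interface. Request building and response parsing are separated so they can be checked offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

# Semantic Scholar Graph API paper-search endpoint.
S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_request(query: str, limit: int = 5, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request; the API key header is optional (unauthenticated calls are rate-limited)."""
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit, "fields": "title,url,abstract"}
    )
    headers = {"x-api-key": api_key} if api_key else {}
    return urllib.request.Request(f"{S2_SEARCH_URL}?{params}", headers=headers)


def parse_results(payload: dict) -> List[Dict[str, str]]:
    """Map each returned paper to a hypothetical url/title/snippet record."""
    results = []
    for paper in payload.get("data", []):
        results.append({
            "url": paper.get("url") or "",
            "title": paper.get("title") or "",
            "snippet": paper.get("abstract") or "",  # abstract can be null
        })
    return results


def search_papers(query: str, limit: int = 5, api_key: Optional[str] = None) -> List[Dict[str, str]]:
    """Perform the network call and return parsed results."""
    with urllib.request.urlopen(build_request(query, limit, api_key), timeout=30) as resp:
        return parse_results(json.load(resp))
```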

Contribution is highly appreciated if you are interested!

songkq commented 6 months ago

@shaoyijia Hi, I'll add support for the Semantic Scholar API soon, once I obtain an API key.

shaoyijia commented 6 months ago

Hi @songkq, thank you so much! I have a Semantic Scholar API key, so I can also test it.

Yucheng-Jiang commented 4 months ago

@songkq STORM now supports more retrieval methods; see the documentation here: https://github.com/stanford-oval/storm?tab=readme-ov-file#api.

In addition to You.com, we now support Bing Search and retrieval over a customized corpus using a vector database.
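To illustrate what customized corpus retrieval with a vector database involves, here is a self-contained toy: documents are embedded once, stored, and ranked against the query by cosine similarity. As a loudly flagged simplification, the "embedding" is a term-frequency vector and the "database" is an in-memory list; a real setup would use an embedding model and an actual vector store, and none of these names come from STORM.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': term-frequency vector over lowercase word tokens.
    A real setup would call an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: rows of (doc_id, vector, text)."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Counter, str]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._rows.append((doc_id, embed(text), text))

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float, str]]:
        """Return up to k (doc_id, score, text) hits with non-zero similarity, best first."""
        q = embed(query)
        scored = [(doc_id, cosine(q, vec), text) for doc_id, vec, text in self._rows]
        scored.sort(key=lambda row: row[1], reverse=True)
        return [row for row in scored[:k] if row[1] > 0]
```

Returning the stored text alongside each score lets the caller use it directly as a snippet, much as a search API result would be used.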