michaelthwan / searchGPT

Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.
MIT License
621 stars, 65 forks

Improve Bing result extraction #14

Closed michaelthwan closed 1 year ago

michaelthwan commented 1 year ago

Extraction currently pulls text from p tags, which leaves many useless sentences in text_df. We need to keep only the useful ones. BTW, if the text is still too long, get_prompt will trim it using the config's prompt_length_limit.
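To make the problem concrete, here is a minimal sketch (not the project's code) of p-tag extraction followed by a simple filter and a length cap. The names ParagraphCollector, keep_useful and trim_to_limit are hypothetical, the word-count heuristic is only one possible notion of "useful", and treating prompt_length_limit as a character count is an assumption about how get_prompt trims.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def keep_useful(paragraphs, min_words=5):
    """Drop very short paragraphs such as 'Home' or 'Share' (assumed heuristic)."""
    return [p for p in paragraphs if len(p.split()) >= min_words]


def trim_to_limit(paragraphs, prompt_length_limit):
    """Keep whole paragraphs while the total stays within the limit.

    Assumption: the limit is measured in characters.
    """
    kept, used = [], 0
    for p in paragraphs:
        if used + len(p) > prompt_length_limit:
            break
        kept.append(p)
        used += len(p)
    return kept


html = (
    "<p>Home</p>"
    "<p>Trafilatura extracts the main text of a web page.</p>"
    "<p>Share</p>"
    "<p>It drops navigation menus and other page clutter.</p>"
)
collector = ParagraphCollector()
collector.feed(html)
useful = keep_useful(collector.paragraphs)
print(trim_to_limit(useful, prompt_length_limit=60))
```

Filtering before trimming matters here: if the length cap were applied first, boilerplate such as menu labels would use up part of the budget that real sentences need.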

Checklist

michaelthwan commented 1 year ago

@peterhcyuen used trafilatura to extract. Merged.
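For readers unfamiliar with the library, a minimal sketch of main-text extraction with trafilatura might look like the following. This is not the merged code: extract_main_text is a hypothetical wrapper, and the guarded import is only there so the snippet still runs where trafilatura is not installed.

```python
try:
    import trafilatura  # third-party: pip install trafilatura
except ImportError:  # assumption: degrade gracefully if it is missing
    trafilatura = None


def extract_main_text(html):
    """Return the main text of an HTML document, or None.

    None is returned when trafilatura is unavailable or when it finds
    no usable main content in the page.
    """
    if trafilatura is None:
        return None
    return trafilatura.extract(html)


page = "<html><body><nav>Home | About</nav><p>Some article text.</p></body></html>"
text = extract_main_text(page)
print(text if text is not None else "no main text extracted")
```

Very short pages like the one above may yield None, so callers should handle that case before adding the result to text_df.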

michaelthwan commented 1 year ago

Related tuning #35

michaelthwan commented 1 year ago

This can possibly be closed. The results are not bad now, @peterhcyuen.