run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License
3.44k stars, 732 forks

Remove links from telegram text messages #943

Closed by diicellman 7 months ago

diicellman commented 7 months ago

Description

I encountered an issue while loading data from Telegram posts/messages: when the text contained many links, it failed OpenAI moderation, and the response was: "Sorry, I can't answer, there are too many links." To work around this, I used regular expressions (`re`) to strip the links, keeping only the domain (since it could still be useful), before converting the text into a Document object.
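For illustration, here is a minimal sketch of this kind of link stripping. It is not the exact code from this PR: the pattern `URL_PATTERN` and the helper name `replace_links_with_domains` are hypothetical, and the sketch assumes links appear as plain `http://` or `https://` URLs in the message text.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern (an assumption, not taken from the PR):
# matches an http(s) URL up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def replace_links_with_domains(text: str) -> str:
    """Replace every URL in `text` with its domain only."""

    def _to_domain(match: re.Match) -> str:
        url = match.group(0)
        # Keep sentence punctuation that happens to touch the URL.
        stripped = url.rstrip(".,;:!?)")
        trailing = url[len(stripped):]
        domain = urlparse(stripped).netloc
        # Fall back to the original text if no domain could be parsed.
        return (domain or stripped) + trailing

    return URL_PATTERN.sub(_to_domain, text)


message = "See https://example.com/path?x=1 and http://t.me/channel/42."
cleaned = replace_links_with_domains(message)
print(cleaned)
```

The cleaned string would then be the text passed on when building the Document object. Parsing with `urlparse` rather than a second regular expression keeps the domain extraction simple, at the cost of ignoring links written without a scheme (for example `t.me/channel`).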

Fixes # (issue)

Type of Change

Please delete options that are not relevant.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Suggested Checklist: