mishushakov / llm-scraper

Turn any webpage into structured data using LLMs
MIT License
2.42k stars 147 forks

break page into chunks html mode #14

Open EcomGraduates opened 7 months ago

EcomGraduates commented 7 months ago

Long pages tend to cause a token error. It would be useful if it could calculate the number of tokens in a page and break it up into chunks, or maybe strip out some of the HTML that we don't technically need (script tags, etc.).

mishushakov commented 7 months ago

Really good idea 👍

mishushakov commented 7 months ago

Feel free to open a pull request if you have even a rough idea of what the code for this could look like.
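
For illustration, here is a minimal sketch of the idea raised in this issue: strip markup that is unlikely to matter, then split what remains into chunks under a token budget. None of this is llm-scraper's actual API. The names `stripHtml`, `estimateTokens`, and `chunkByTokens` are hypothetical, and the "about 4 characters per token" figure is a rough assumption standing in for a real tokenizer.

```typescript
// Hypothetical helpers for illustration only; not part of llm-scraper.

// Assumed average characters per token. A real tokenizer would be more accurate.
const CHARS_PER_TOKEN = 4

// Remove comments and script/style/noscript elements, then collapse whitespace.
// Regex-based stripping is a simplification; a DOM parser would be more robust.
function stripHtml(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    .replace(/\s+/g, ' ')
    .trim()
}

// Estimate the token count from the character length (heuristic, see above).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Split text into chunks whose estimated size stays within maxTokens.
// Where possible, cut just before a '<' so that tags are not split in half.
function chunkByTokens(text: string, maxTokens: number): string[] {
  if (maxTokens <= 0) throw new Error('maxTokens must be positive')
  const maxChars = Math.max(1, Math.floor(maxTokens * CHARS_PER_TOKEN))
  const chunks: string[] = []
  let start = 0
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length)
    if (end < text.length) {
      const cut = text.lastIndexOf('<', end)
      if (cut > start) end = cut
    }
    chunks.push(text.slice(start, end))
    start = end
  }
  return chunks
}
```

Each chunk could then be sent to the model separately and the partial results merged afterwards. How to merge them, and how to handle elements that span a chunk boundary, is left open here.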