tarasglek / chatcraft.org

Developer-oriented ChatGPT clone
https://chatcraft.org/
MIT License
155 stars 35 forks

Integrate scrape2md into CloudFlare proxy function #377

Open humphd opened 9 months ago

humphd commented 9 months ago

From Discord:

taras: Team, I wrote a little lib to help scrape stuff better: https://github.com/tarasglek/scrape2md I would appreciate it if someone would be my coauthor. First order of business would be to package it as a lib and figure out how to deploy it on cloudflare instead of our current scraper.

Let's get this into a usable state, which means doing a few things across various PRs:

  1. Package and ship to the npm registry. The work @WangGithub0 did on the typescript2openai repo serves as a useful starting point: https://github.com/tarasglek/typescript2openai/pull/1
  2. Finish and merge #370, which alters the flow of our proxy.ts to be more flexible for plugging in various transformers. This library is essentially a transformer for a few content types, and would be pretty easy to wrap with the new API.
  3. Integrate this scraper via npm into the new proxy transformer structure. I suspect we'll have some fixes to do to make this work in the CloudFlare context.
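
To make steps 2 and 3 concrete, here is a minimal sketch of how a content-type transformer could be wrapped. Everything in it is an assumption for illustration: the `Transformer` interface, `applyTransformers`, and `placeholderHtmlToMarkdown` are hypothetical names, not the API from #370 and not the scrape2md API. The placeholder only stands in for wherever the real library would be called.

```typescript
// Hypothetical transformer shape; the real API from #370 may look different.
interface Transformer {
  /** True if this transformer handles the given Content-Type header value. */
  matches(contentType: string): boolean;
  /** Converts the upstream body to the text sent back to the client. */
  transform(body: string): string;
  /** Content-Type of the transformed output. */
  outputType: string;
}

interface ProxyResult {
  contentType: string;
  body: string;
}

// Stand-in for the scraper: NOT the scrape2md API, just a naive placeholder
// that turns <h1> into a Markdown heading and strips all other tags.
function placeholderHtmlToMarkdown(html: string): string {
  return html
    .replace(/<h1[^>]*>(.*?)<\/h1>/gi, "# $1\n\n")
    .replace(/<[^>]+>/g, "")
    .trim();
}

// Hypothetical wrapper that plugs the placeholder scraper in for HTML pages.
const htmlTransformer: Transformer = {
  matches: (contentType) => contentType.toLowerCase().startsWith("text/html"),
  transform: placeholderHtmlToMarkdown,
  outputType: "text/markdown; charset=utf-8",
};

// Runs the first matching transformer, or passes the body through unchanged.
function applyTransformers(
  contentType: string,
  body: string,
  transformers: Transformer[]
): ProxyResult {
  const match = transformers.find((t) => t.matches(contentType));
  if (!match) {
    return { contentType, body };
  }
  return { contentType: match.outputType, body: match.transform(body) };
}
```

In a proxy function, the caller would presumably read the upstream `Content-Type` header and body text, pass both through something like `applyTransformers`, and build the outgoing response from the result. Keeping the transformer itself free of request and response objects should make it easier to test outside the CloudFlare runtime.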

Rachit1313 commented 8 months ago

@humphd, could you please review https://github.com/tarasglek/scrape2md/pull/1 ?