Closed markNZed closed 1 year ago
great idea 💯
here are some related notes I had on this.
COSS / hosted proxy for interacting with the OpenAI API and/or other main AI APIs
- problems it tries to solve
- caching (seriously POST caching for embeddings & completions alone is huge)
- observability
- monitoring
- privacy (e.g., see https://github.com/cado-security/masked-ai)
- general engineering productionization concerns like latency that you guys have already graphed
- the ability to have more than 5 keys and to put limits on each individual key
- OSS examples
- https://github.com/egoist/openai-proxy
- https://github.com/6/openai-caching-proxy-worker
- https://github.com/cado-security/masked-ai
- https://github.com/easychen/openai-api-proxy
- companies
- [Helicone (YC W23)](https://www.helicone.ai/)
- notes
- pro: could quickly build an MVP of this using CF workers and something like [Reflare](https://github.com/xiaoyang-sde/reflare)
- con: imho this is def a feature, not a platform, and openai will build better first-party support for these use cases over time
- same idea but substitute Cohere/Anthropic/etc or building a normalization layer across all of these
- if you could manage to get traction w/ this normalization layer, you could eventually replace the third-party APIs with your own first-party APIs over time, but disintermediation would be really tough w/ this sort of thing
- related to this idea, there have been a lot of viral Bring-Your-Own-Key demos built on top of OpenAI. I’d love to build a lightweight solution that addresses this use case in a more trustworthy, secure manner, since pasting your OpenAI secret key into random webapps is a security nightmare
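To make the POST-caching point above concrete: since identical request bodies to the embeddings/completions endpoints produce (for deterministic settings) interchangeable responses, the request body itself can serve as the cache key. Here's a minimal, hypothetical sketch of that idea — `cached_post`, `cache_key`, and the in-memory `_cache` are illustrative names, not part of any of the linked projects:

```python
import hashlib
import json

# Hypothetical sketch of POST-response caching for embeddings/completions.
# A real proxy would use a shared store (e.g. Redis or a CDN cache) with TTLs;
# a plain dict keeps the idea visible.
_cache: dict[str, dict] = {}

def cache_key(path: str, body: dict) -> str:
    """Derive a deterministic key from the endpoint and a canonicalized body.

    Sorting keys and stripping whitespace means two requests that differ only
    in JSON key order or formatting hit the same cache entry.
    """
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{path}:{canonical}".encode()).hexdigest()

def cached_post(path: str, body: dict, send) -> dict:
    """Return a cached response if present; otherwise call `send` and cache it."""
    key = cache_key(path, body)
    if key not in _cache:
        _cache[key] = send(path, body)  # `send` stands in for the real upstream call
    return _cache[key]
```

Note that this only makes sense for requests where repeated calls should return the same answer (embeddings, or completions with `temperature=0`); caching sampled completions changes observable behavior.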
I'm going to close this issue as out of scope for this repo, but I hope my notes above are useful for anyone looking to add this into their workflow – or anyone who wants to build this type of caching abstraction 🔥
thanks @markNZed 🙏
Describe the feature
During development it could be useful to cache OpenAI API responses while keeping behaviors like the incremental returning of results. This might be a proxy in front of the OpenAI API, possibly as a separate project, like https://github.com/easychen/openai-api-proxy
This can be done at the level of the application using chatgpt-api but things like streaming add some complications.
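One way to handle the streaming complication is to record the sequence of chunks the first time a request streams through, then replay them one by one on later cache hits, so callers still receive results incrementally. A hypothetical sketch (the names `stream_completion` and `_stream_cache` are illustrative, and `upstream` stands in for the real SSE chunk iterator):

```python
from typing import Iterator

# Hypothetical in-memory cache mapping a request key to its recorded chunks.
_stream_cache: dict[str, list[str]] = {}

def stream_completion(key: str, upstream: Iterator[str]) -> Iterator[str]:
    """Yield chunks incrementally; record them so later calls replay the stream."""
    if key in _stream_cache:
        # Cache hit: replay the recorded chunks, preserving incremental delivery.
        yield from _stream_cache[key]
        return
    chunks: list[str] = []
    for chunk in upstream:
        chunks.append(chunk)
        yield chunk
    # Only populate the cache once the stream completed, so an abandoned or
    # failed stream never poisons the cache with a partial response.
    _stream_cache[key] = chunks
```

A replay could also insert small delays between chunks to mimic upstream pacing, but delivering them immediately is usually what you want during development.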