mustafa-tariqk / mindscape

Experience the truth of the trip
https://research.cs.queensu.ca/home/cisc498/index.html

Chatbot Revision #23

Closed · mustafa-tariqk closed this 6 months ago

mustafa-tariqk commented 6 months ago

As it stands, the chatbot is very simple. The goal is to add:

This ticket should get us 90% of the way to an effective language model for the task.

BasicallyOk commented 6 months ago

Moving to Quality of Life milestone. I don't think we need this for MVP.

mustafa-tariqk commented 6 months ago

Talk to me before moving stuff around. I had this set and completed for this sprint. QoL stuff will be expanding on CI/CD and integration of the frontend and backend.

mustafa-tariqk commented 6 months ago

Rant

Gonna rant a bit about an architectural decision to go with OpenAI instead of my own models.

When this project started, I made a decision about how we'd pick technologies:

1. Use what we know.
2. Use what is popular.
3. Use what is simple.

Phi-2 violated all three for this project.

Why I went in this direction

The initial idea was that we were constrained by resources. I didn't want to hand Neuma something that would cost something like $50 a month just to get going (for the worst version, no less).

Why it was a bad idea

By going with an open-source model, I had neglected all three. My past work has been running 7B to 34B parameter models on local hardware, so I have a good feel for their performance. The choice of a small language model came down to resource constraints: anything bigger and we'd require GPU inferencing.

Due to Phi-2's small size, it also hasn't been researched and experimented with as much as larger or closed-source models. Almost everyone has played with the OpenAI API at this point, but far fewer people have downloaded Phi-2 and gotten it running on their own system.

Running local models is not simple. Getting up and running is easy; getting up and running well is difficult. You have to deal with quantization to make the model smaller, GPU inferencing to make the speed worthwhile, and a massive rabbit hole beyond that. Compare all of this to an API you can just call, and the decision is pretty simple.
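To make that concrete, here's roughly what "getting up and running" with a quantized local model involves. This is only a sketch: it assumes the llama-cpp-python bindings and a pre-quantized GGUF build of Phi-2, and the file name and settings are illustrative rather than anything from our repo.

```python
# Hypothetical local-model setup via llama-cpp-python; the model file and
# tuning values below are illustrative, not from this project.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-2.Q4_K_M.gguf",  # 4-bit quantized build to shrink the model
    n_gpu_layers=20,                 # offload layers to the GPU for usable speed
    n_ctx=2048,                      # context window size
)

out = llm("Summarize this trip report in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

And this is the easy part: picking a quantization level, tuning layer offload to your VRAM, and keeping the model process alive are all on you.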

What's nice about using OpenAI

We get a much higher quality of generations. Pricing should be cheaper (usage-based) than keeping a local model running at all times. The hassle is gone, and it more or less ticks all three boxes from the intro.

Also, LangChain is pretty open in terms of interoperability with other language models; I think we can get away with changing one variable to change providers.
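As a sketch of what that swap could look like (assuming LangChain's chat-model integrations; exact package and class names depend on the LangChain version):

```python
# Sketch: the provider choice lives in one constructor call.
from langchain_openai import ChatOpenAI
# from langchain_community.chat_models import ChatOllama  # a local alternative

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
# llm = ChatOllama(model="phi")  # e.g. to swap back to a local Phi-2

response = llm.invoke("Summarize this trip report in one sentence.")
print(response.content)
```

Everything downstream (prompts, chains) keeps working because the chat models share the same interface.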

Some risks associated with this

Another point of payment: Neuma now has to deal with DigitalOcean + OpenAI. Keeping it all in one place would have been nice. The thought had crossed my mind to drop DigitalOcean and use Google Sheets as a backend, but I think that would be too cursed.

How this is addressed: can't do much here, really. It sucks, but as laid out above, the benefits greatly outweigh the negatives.

There's a risk of spam becoming a big problem: if security isn't handled properly, someone can set up an attack that queries the OpenAI API a bunch of times.

How this is addressed: setting a spend limit on the OpenAI dashboard, finding ways the system can be spammed, and setting up deterrents like limiting how many requests can be made per day.
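A minimal sketch of the per-day cap, assuming an in-memory counter keyed by client (the names and the limit are made up; a real deployment would back this with Redis or the database):

```python
# Hypothetical daily request cap, checked before any call to the OpenAI API.
import datetime
from collections import defaultdict

DAILY_LIMIT = 50  # illustrative cap on chat requests per client per day

_counts: defaultdict = defaultdict(int)  # (client_id, date) -> request count

def allow_request(client_id: str) -> bool:
    """Return True if this client is still under today's cap."""
    key = (client_id, datetime.date.today())
    if _counts[key] >= DAILY_LIMIT:
        return False
    _counts[key] += 1
    return True

# Usage: reject the request before it ever reaches OpenAI.
if not allow_request("user-123"):
    raise RuntimeError("Daily request limit reached, try again tomorrow.")
```

Combined with the hard spend limit on the dashboard, this bounds the worst case of a spam attack.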

Hope this describes my decisions well.