jncraton / languagemodels

Explore large language models in 512MB of RAM
https://jncraton.github.io/languagemodels/
MIT License

Replacing LaMini for commercial use? #21

Closed ChadDa3mon closed 1 year ago

ChadDa3mon commented 1 year ago

First off, a big thanks for this. Your project has inspired some great ideas and I love how you've simplified this.

I'd like to try and use this in a corporate environment, so the use of LaMini seems to be a deal breaker. I'm still struggling to understand all of this, but is there a way I could use something else? It seems like they've taken flan-t5-large and trained it on a dataset they created and released under Creative Commons.

So if I'm understanding things, I think I can look for other models based on flan-t5-large, or maybe just use flan-t5-large itself? I've already compared a similar prompt using your setup vs. native flan-t5-large, and the results from your project are much better, so I assume it really is the additional training done by LaMini that makes this shine?

Either way, thanks for the help and thanks for this project :)

jncraton commented 1 year ago

That's all correct. The LaMini series of models is not licensed for commercial use, but my understanding is that the FLAN-T5 models can be used this way. As you noted, the FLAN-T5 models didn't seem to work quite as well for chat-based user interactions, which is why we currently use the LaMini models.

jncraton commented 1 year ago

I'm working on a possible resolution to this in the commercial branch. The basic idea is to add a require_commercial_license function that requires all models used to be licensed for commercial use. It's described briefly in the updated README:

https://github.com/jncraton/languagemodels/tree/commercial#commercial-use

This currently uses FLAN-T5 instead of LaMini-FLAN-T5, but the underlying models will likely change in future versions as improved models or inference techniques become available.

Would something along those lines be useful to you?
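The guard described above could be sketched roughly like this. This is a minimal illustration only: the model catalog, license tags, and `load_model` helper are assumptions for the sake of the example, not the library's actual internals.

```python
# Hypothetical sketch of a require_commercial_license guard.
# The model catalog and license tags below are illustrative,
# not the library's real model list.

_MODELS = {
    "LaMini-Flan-T5-248M": {"license": "cc-by-nc-4.0", "commercial": False},
    "flan-t5-base": {"license": "apache-2.0", "commercial": True},
}

_require_commercial = False


def require_commercial_license():
    """After calling this, only commercially licensed models may load."""
    global _require_commercial
    _require_commercial = True


def load_model(name):
    """Refuse to load models that fail the commercial-license check."""
    info = _MODELS[name]
    if _require_commercial and not info["commercial"]:
        raise ValueError(
            f"{name} ({info['license']}) is not licensed for commercial use"
        )
    return name  # placeholder for the actual model-loading step
```

With the guard enabled, loading flan-t5-base succeeds while loading a LaMini model raises an error, which matches the behavior the README describes.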

ChadDa3mon commented 1 year ago

Yeah, I think it's a great idea. Ultimately I'd love to use this in a project where I'm trying to find a decent way for my team (mostly MacBook and Linux) to run a small model locally. We've got a lot of sensitive data, and compliance is a nightmare. Since our laptops are already hardened, being able to run a local LLM would be awesome.

ChadDa3mon commented 1 year ago

In the meantime, I'm trying to figure out how to replicate what LaMini did, only with my own dataset :)

jncraton commented 1 year ago

That makes sense. As far as I understand, the fine-tuning isn't the difficult or expensive part. The challenge currently is curating a large open dataset without using proprietary models. The largest currently available datasets that I'm aware of used OpenAI models in their pipeline, so they may have legal issues if used commercially.

jncraton commented 1 year ago

I merged a fix for this. I made it a little more flexible so folks can apply a regex against acceptable model licenses. It's described in the updated README:

https://github.com/jncraton/languagemodels#commercial-use
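A regex-based license filter along those lines might look like the following. This is a sketch under stated assumptions: the `MODELS` catalog, license strings, and `allowed_models` helper are invented for illustration and do not reflect the library's exact API or configuration keys.

```python
import re

# Illustrative model catalog; license strings mimic common SPDX-style tags.
# This is NOT the library's real model list.
MODELS = [
    {"name": "LaMini-Flan-T5-248M", "license": "cc-by-nc-4.0"},
    {"name": "flan-t5-base", "license": "apache-2.0"},
    {"name": "flan-t5-large", "license": "apache-2.0"},
]


def allowed_models(license_pattern):
    """Return names of models whose license matches the acceptable-license regex."""
    pattern = re.compile(license_pattern, re.IGNORECASE)
    return [m["name"] for m in MODELS if pattern.search(m["license"])]


# e.g. accept only permissive licenses suitable for commercial use
commercial_ok = allowed_models("apache|mit|bsd")
```

The advantage of a regex over a boolean flag is that users can decide for themselves which license families are acceptable for their situation, rather than relying on a single hard-coded notion of "commercial."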