sindresorhus / emoj

Find relevant emoji from text on the command-line :open_mouth: :sparkles: :raised_hands: :horse: :boom: :see_no_evil:

Use machine learning when searching `emojilib` #11

Open · Jolg42 opened this issue 8 years ago

Jolg42 commented 8 years ago

Hi!

I think that an offline version would be great! 😃

sindresorhus commented 8 years ago

That would be very cool, but not easy to do, as you would have to gather a lot of data and train a model for a neural network. It's not something I'm going to work on, but I'm happy to receive a pull request if anyone wants to take on the challenge.

Jolg42 commented 8 years ago

@sindresorhus You're right! Not super easy…!

sotojuan commented 8 years ago

Perhaps some basic caching for repeated entries? Though it'd get memory-heavy.

matchai commented 8 years ago

The Dango website seems to reference an "Offline Predictions SDK". http://getdango.com/api/#offline-predictions-sdk

Richienb commented 3 years ago

We could alternatively use https://github.com/bfelbo/DeepMoji

Richienb commented 1 year ago

@sindresorhus I'm much more concerned about distribution. That is, the pre-trained model is probably going to be a bit too large for npm (likely many megabytes at least). Perhaps we could upload it to GitHub Releases?

sindresorhus commented 1 year ago

I think a better solution is to let users opt into using the ChatGPT API by providing their own OpenAI API key. It will return much better results than whatever we can do here.

Example prompt:

Give me the 10 most relevant emojis for this text as a newline list: hungry
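For illustration, a minimal sketch of what that opt-in could look like, assuming Node 18+ with built-in `fetch`, a user-supplied key in an `OPENAI_API_KEY` environment variable, and the standard Chat Completions endpoint (the model name and response parsing are assumptions, not an actual emoj implementation):

```js
// Minimal sketch, not emoj's implementation. Assumes an ESM context with
// top-level await, Node 18+ (built-in fetch), and a user-supplied key.
const apiKey = process.env.OPENAI_API_KEY;

async function fetchEmojis(text, limit = 10) {
	const response = await fetch('https://api.openai.com/v1/chat/completions', {
		method: 'POST',
		headers: {
			'Content-Type': 'application/json',
			Authorization: `Bearer ${apiKey}`,
		},
		body: JSON.stringify({
			model: 'gpt-3.5-turbo', // Assumed model; any chat model would do.
			messages: [{
				role: 'user',
				content: `Give me the ${limit} most relevant emojis for this text as a newline list: ${text}`,
			}],
		}),
	});

	const {choices} = await response.json();
	// The prompt asks for a newline list, so split the reply on newlines.
	return choices[0].message.content.trim().split('\n');
}

console.log(await fetchEmojis('hungry'));
```

Since the key is supplied by the user, the feature stays opt-in and emoj itself ships no model or credentials.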

Richienb commented 1 year ago

Oh, I see. Tragically though, the ChatGPT API has no free plan. Is this a problem? Locally running a large language model remains on the table, though storage size is still a concern. I'm also thinking about using the large language model's predictions as the dataset for training a smaller one.

sindresorhus commented 1 year ago

> Tragically though, the ChatGPT API has no free plan. Is this a problem?

No. It would be opt-in anyway. The paid plan is not expensive, and simply fetching a few emojis would cost almost nothing.

> Locally running a large language model remains on the table, though storage size is still a concern. I'm also thinking about using the large language model's predictions as the dataset for training a smaller one.

I don't see it being worth the effort.

Richienb commented 1 year ago

Note to self: https://github.com/microsoft/guidance and https://github.com/microsoft/guidance/blob/main/notebooks/chat.ipynb

Richienb commented 2 weeks ago

> I don't see it being worth the effort.

Needing to use an API might have been a deal-breaker for me, but now I believe the technology has advanced to the point where locally running a model will soon be worth the effort.

Take a look at: https://xenova.github.io/transformers.js/

At first glance, it seems emoji prediction might not be supported out of the box, but there will likely be something similar that works.
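For context, a rough sketch of what local inference with transformers.js might look like, repurposing its zero-shot-classification pipeline to rank emoji keywords (the model name and keyword list are assumptions; there is no emoji-specific pipeline out of the box):

```js
// Rough sketch only: repurpose transformers.js zero-shot classification to
// rank candidate emoji keywords for the input text. The model name and the
// keyword list are assumptions, not a tested setup.
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-classification', 'Xenova/mobilebert-uncased-mnli');

// The candidate labels could come from emojilib's keyword list.
const candidateKeywords = ['food', 'hungry', 'happy', 'sad', 'animal', 'party'];

const { labels, scores } = await classifier('I could really go for a pizza', candidateKeywords);

// Highest-scoring keywords first; map them back to emojis via emojilib.
console.log(labels.slice(0, 3), scores.slice(0, 3));
```

The model files would be downloaded on first run and cached rather than bundled, which would also sidestep the npm package-size concern raised earlier.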

Exact same use case:

https://github.com/explainers-by-googlers/prompt-api#n-shot-prompting