karpathy / makemore

An autoregressive character-level language model for making more things
MIT License

Can these models also be used for classification? #11

Open: hoosierEE opened this issue 1 year ago

hoosierEE commented 1 year ago

If we had labels for these names, such as:

| name   | is_palindrome | h_index | scrabble_score |
|--------+---------------+---------+----------------|
| anna   |             1 |       4 |              4 |
| jake   |             0 |       1 |             15 |
| bob    |             1 |       7 |              7 |
| karen  |             0 |       8 |              9 |
| andrej |             0 |      11 |             14 |
| ...    |               |         |                |
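Two of these columns can be computed directly from the name; h_index cannot and would have to come from outside data. A minimal sketch that derives the is_palindrome and scrabble_score labels, assuming lowercase a-z names and the standard English-edition Scrabble tile values (the helper names are hypothetical):

```python
# Standard English-edition Scrabble tile values (assumption: this is the
# letter set behind the scrabble_score column).
SCRABBLE_VALUES = {
    **dict.fromkeys("aeilnorstu", 1),
    **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3),
    **dict.fromkeys("fhvwy", 4),
    "k": 5,
    **dict.fromkeys("jx", 8),
    **dict.fromkeys("qz", 10),
}

def is_palindrome(name: str) -> int:
    """1 if the name reads the same backwards, else 0."""
    return int(name == name[::-1])

def scrabble_score(name: str) -> int:
    """Sum of tile values; assumes lowercase a-z characters only."""
    return sum(SCRABBLE_VALUES[ch] for ch in name)

for name in ["anna", "jake", "bob", "karen", "andrej"]:
    print(f"{name:8s} {is_palindrome(name)} {scrabble_score(name):3d}")
```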

Can makemore-style generative models be modified to perform classification so I can feed in a new name like asdf and get a prediction for its h_index?

While a suggestion like "add this layer here" would absolutely be helpful, I'm secretly hoping someone will share a general, intuitive way to think about repurposing machine learning models for new tasks...
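One general way to think about using a generative model as a classifier (a standard technique, not something proposed in this thread) is Bayes' rule: train one model per label, then pick the label whose model assigns the name the highest probability, weighted by how common that label is, i.e. p(label | name) ∝ p(name | label) · p(label). The sketch below assumes smoothed character-bigram count models with a makemore-style `.` boundary marker; the class and function names are hypothetical and the toy data is made up purely to exercise the code.

```python
import math
from collections import Counter, defaultdict

BOUNDARY = "."  # start/end marker (assumption, makemore-style notation)
VOCAB = list("abcdefghijklmnopqrstuvwxyz") + [BOUNDARY]

class BigramModel:
    """Add-alpha smoothed character bigram model, fitted by counting."""

    def __init__(self, names, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # counts[prev][next]
        for name in names:
            chars = [BOUNDARY, *name, BOUNDARY]
            for prev, nxt in zip(chars, chars[1:]):
                self.counts[prev][nxt] += 1

    def log_prob(self, name):
        """log p(name) = sum of log p(next char | previous char)."""
        chars = [BOUNDARY, *name, BOUNDARY]
        total = 0.0
        for prev, nxt in zip(chars, chars[1:]):
            row = self.counts.get(prev, Counter())
            p = (row[nxt] + self.alpha) / (sum(row.values()) + self.alpha * len(VOCAB))
            total += math.log(p)
        return total

def train_generative_classifier(labelled_names, alpha=1.0):
    """Fit log p(label) and one model p(name | label) per label."""
    by_label = defaultdict(list)
    for name, label in labelled_names:
        by_label[label].append(name)
    n = len(labelled_names)
    return {
        label: (math.log(len(names) / n), BigramModel(names, alpha))
        for label, names in by_label.items()
    }

def classify(models, name):
    """Bayes' rule: argmax over labels of log p(label) + log p(name | label)."""
    scores = {label: log_prior + model.log_prob(name)
              for label, (log_prior, model) in models.items()}
    return max(scores, key=scores.get), scores

# Toy, made-up data purely to exercise the code.
toy = [("aaa", "mostly_a"), ("aa", "mostly_a"), ("aaaa", "mostly_a"),
       ("bbb", "mostly_b"), ("bb", "mostly_b"), ("bbbb", "mostly_b")]
models = train_generative_classifier(toy)
print(classify(models, "aaa")[0])  # mostly_a
```

A bigram model only sees adjacent character pairs, so it cannot capture a global property like is_palindrome; for that, each per-label model would need longer context (an MLP or Transformer instead of counts). A numeric target such as h_index would also have to be binned into a finite set of labels before this approach applies.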

hoosierEE commented 1 year ago

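Normally our training examples are tokenized like this:

  • <S> b o b <E>
  • <S> j a k e <E>

But I was thinking you could append special "label" tokens:

  • <S> b o b <E> <is_palindrome=1>
  • <S> j a k e <E> <is_palindrome=0>
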
Maybe this is a silly idea, but I'm going to give it a try and see if it works. At least it won't require changing the model architecture very much.
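The label-token idea (training on sequences like `<S> b o b <E> <is_palindrome=1>`) can be sketched as follows. This is a minimal illustration, not makemore's code: the vocabulary layout and all helper names are assumptions, and the trained model itself is left abstract. At inference you feed the name up to `<E>`, take the model's next-token distribution, and keep only the probability mass on the label tokens.

```python
# Vocabulary layout is an assumption for this sketch: boundary markers,
# lowercase letters, then one token per label value.
SPECIAL = ["<S>", "<E>"]
CHARS = list("abcdefghijklmnopqrstuvwxyz")
LABEL_TOKENS = ["<is_palindrome=0>", "<is_palindrome=1>"]

itos = SPECIAL + CHARS + LABEL_TOKENS          # index -> token
stoi = {tok: i for i, tok in enumerate(itos)}  # token -> index

def encode_example(name, label):
    """Training sequence: <S> n a m e <E> <is_palindrome=label>."""
    return [stoi[t] for t in ["<S>", *name, "<E>", LABEL_TOKENS[label]]]

def encode_prompt(name):
    """Inference prompt: stop after <E>; the model predicts the next token."""
    return [stoi[t] for t in ["<S>", *name, "<E>"]]

def predict_label(next_token_probs):
    """Keep only the label tokens' probabilities, renormalise, pick the largest.

    next_token_probs: the model's distribution over all len(itos) tokens at
    the position right after <E>. Returns (label, p(label=1)).
    """
    masses = [next_token_probs[stoi[t]] for t in LABEL_TOKENS]
    total = sum(masses)
    probs = [m / total for m in masses]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[1]

# Made-up next-token distribution standing in for a trained model's output.
dummy = [0.5 / (len(itos) - 2)] * len(itos)
dummy[stoi["<is_palindrome=0>"]] = 0.125
dummy[stoi["<is_palindrome=1>"]] = 0.375
print(encode_example("bob", 1))  # [0, 3, 16, 3, 1, 29]
print(predict_label(dummy))      # (1, 0.75)
```

Restricting to the label tokens guarantees a valid label even if the model also puts some mass on ordinary characters. Note that the prediction after `<E>` is only as good as the context the model can see: a bigram model conditions on `<E>` alone and would just learn the label frequencies, so this needs a model whose context window covers the whole name. For h_index, one label token per value (or per bin) would play the same role.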

Kotrotsos commented 11 months ago

Normally our training examples are tokenized like this:

  • <S> b o b <E>
  • <S> j a k e <E>

But I was thinking you could append special "label" tokens:

  • <S> b o b <E> <is_palindrome=1>
  • <S> j a k e <E> <is_palindrome=0>

Maybe this is a silly idea, but I'm going to give it a try and see if it works. At least it won't require changing the model architecture very much.

Did you have any luck with this?

hoosierEE commented 11 months ago

Haven't tried it yet, but this is a good reminder that I should.