rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167

Add mean pooling experiment to classifier bonus experiments #406

Closed: rasbt closed this issue 1 month ago

rasbt commented 1 month ago

Setting --average_embeddings mean-pools the output embeddings over all tokens. If this option is not used (the default), only the output embedding at the token position specified by --trainable_token_pos is considered, which is the last token by default. As we can see, enabling --average_embeddings improves the performance from 95.00% to 96.33% with only a minimal increase in runtime (0.28 min to 0.32 min) and might be worth considering in practice.
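The difference between the two options can be sketched as follows. This is a minimal illustration, not the repo's exact code; the tensor `outputs` stands in for the model's final-layer token embeddings, and the shapes are made up for the example.

```python
import torch

# Dummy stand-in for the model's output token embeddings,
# shape (batch_size, num_tokens, emb_dim)
torch.manual_seed(123)
outputs = torch.randn(2, 6, 768)

trainable_token_pos = -1  # last token (the default)

# Default: use only the embedding at the chosen token position
last_token = outputs[:, trainable_token_pos, :]  # shape (2, 768)

# With --average_embeddings: mean-pool over all token positions
mean_pooled = outputs.mean(dim=1)  # shape (2, 768)

print(last_token.shape)   # torch.Size([2, 768])
print(mean_pooled.shape)  # torch.Size([2, 768])
```

Either way, the classifier head receives one embedding vector per example; mean pooling simply lets information from every token contribute to it rather than relying on the last token alone.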