openai / sparse_autoencoder

MIT License

Understanding table output #5

Closed: lastonehome closed this issue 1 month ago

lastonehome commented 2 months ago

Hi,

Thanks for publishing this. Can I check that I understand the output in the tables, please? I'm assuming the docid relates to a specific output from the model for a given prompt, and that the token ID relates to the prompt itself. Is that right?

Apologies if these are simple questions. Interested in explainable AI more than the technical ins and outs.

WuTheFWasThat commented 1 month ago

docid is just the ID of a document from a dataset. It is not a model output; the document is most likely human-written text from the internet.

The token ID is just the index of the activated token. For example, if the token ID were 0, it would mean that the first token had a positive activation.
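
As a rough illustration of how the two columns fit together, here is a toy Python sketch. It is not the repository's code or data format: the `documents` dictionary, the `rows` list, the activation values, and the `lookup_token` helper are all made up for the example, and the token strings are hand-written stand-ins for real tokenizer output.

```python
# Hypothetical toy dataset: docid -> list of tokens.
# Real documents would be split by the model's tokenizer; these are hand-written.
documents = {
    101: ["The", " cat", " sat", " on", " the", " mat"],
    202: ["Sparse", " auto", "encoders", " find", " features"],
}

# Hypothetical table rows: (docid, token index, activation value).
rows = [
    (101, 0, 0.8),  # token index 0 -> the first token of document 101
    (202, 2, 1.5),  # token index 2 -> the third token of document 202
]


def lookup_token(docid, token_index):
    """Return the token at position token_index (0-based) in document docid."""
    return documents[docid][token_index]


# Resolve each row to the actual token it points at.
resolved = [(docid, idx, lookup_token(docid, idx), act) for docid, idx, act in rows]

for docid, idx, token, act in resolved:
    print(f"doc {docid}, token {idx}: {token!r} (activation {act})")
```

So the docid picks out which dataset document a row refers to, and the token index picks out which position in that document had the positive activation.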