Sampling conditional token distribution

salesforce / progen

Official release of the ProGen models

BSD 3-Clause "New" or "Revised" License


It would be super valuable to have an example script to sample conditional token probabilities for a target index given sequence context.

There seem to be some technical details that are important, but not easy to figure out:

e.g. not all tokens in the vocabulary actually being used,
or whether this LM always has to work in a causal left-to-right manner, or whether it can also be used to do "inpainting" of residues in the middle of a sequence.
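To make the first point concrete: one way to ignore unused tokens is to renormalize the distribution over only the ids of interest. This is a minimal sketch in plain Python, assuming the caller already knows which vocabulary ids correspond to amino acids. The function name `restricted_softmax` and the toy ids are hypothetical and not taken from the ProGen code.

```python
import math


def restricted_softmax(logits, allowed_ids):
    """Softmax over only the ids in allowed_ids; every other id gets probability 0."""
    # Subtract the maximum allowed logit for numerical stability.
    peak = max(logits[i] for i in allowed_ids)
    exps = {i: math.exp(logits[i] - peak) for i in allowed_ids}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


# Toy logits for a 4-token vocabulary; id 3 stands in for an unused token.
toy_logits = [0.5, 2.0, 1.0, 9.0]
amino_acid_ids = [0, 1, 2]  # hypothetical ids, for illustration only
probs = restricted_softmax(toy_logits, amino_acid_ids)
```

The large logit on the excluded id has no effect on the result, which is the reason to renormalize rather than read the full softmax directly.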

Finally, I'm currently evaluating mutations by computing the likelihood of each possible mutated sequence in turn, which takes 20 forward passes (one per amino acid) for each single point mutation. I think this is very inefficient: since the model produces logits for every position, can the logits for the target index simply be used as a proxy for the token probabilities?
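To illustrate what the single-pass idea would compute, here is a sketch in plain Python. It assumes the model behaves like a standard causal LM, where the logits emitted at position t are the prediction for the token at position t + 1; that convention is an assumption here and should be checked against the ProGen code. The names `log_softmax` and `left_context_log_probs` are hypothetical, and the nested list stands in for the logits of one real forward pass.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
    return [v - log_z for v in row]


def left_context_log_probs(logits, target_index):
    """log p(token at target_index | tokens before it) for every vocabulary id.

    Assumes logits[t] is the prediction for position t + 1, so the row that
    scores target_index is logits[target_index - 1].
    """
    if target_index < 1 or target_index > len(logits):
        raise ValueError("target_index needs at least one token of left context")
    return log_softmax(logits[target_index - 1])


# Toy stand-in for one forward pass: 3 positions, vocabulary of 4 ids.
toy_logits = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
    [3.0, 0.0, 0.0, 0.0],
]
log_probs = left_context_log_probs(toy_logits, target_index=2)
```

Under that assumption, these values condition only on the residues to the left of the target. Comparing full-sequence likelihoods also accounts for how a substitution changes the probability of the residues to its right, so the two scores need not rank mutations the same way.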

Sampling conditional token distribution #7