Hello there,
I am trying to extract sequence representations from the Nucleotide Transformer, specifically the 250M multi-species model. Is there a recommended way to derive a sequence-level representation from the per-token embeddings, or would it be better to use the CLS token as the representation for each sequence?
For context, I intend to use these representations as initial embeddings for downstream tasks. My sequence lengths vary widely, from 10 base pairs to several thousand base pairs.
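To make the two options concrete, here is a minimal sketch of what I mean. It runs on a made-up NumPy array standing in for the model's per-token embeddings, not on real model output. The `pool_embeddings` helper, the array shapes, and the assumption that the CLS token sits at position 0 are mine for illustration and are not taken from the model's documentation.

```python
import numpy as np


def pool_embeddings(token_embeddings, attention_mask):
    """Return (cls_vectors, mean_vectors) for a batch of sequences.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding

    Assumption: the CLS token is the first token of every sequence.
    """
    token_embeddings = np.asarray(token_embeddings, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]  # (batch, seq_len, 1)

    # Option 1: take the embedding at the (assumed) CLS position.
    cls_vectors = token_embeddings[:, 0, :]

    # Option 2: average over real tokens only, so padding does not
    # dilute the representation of short sequences.
    # Note: the CLS position is included in this average as written.
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against all-padding rows
    mean_vectors = summed / counts

    return cls_vectors, mean_vectors


# Toy batch: 2 sequences, 5 token positions, hidden size 4.
# The second sequence has only 2 real tokens; the rest is padding.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 5, 4))
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 0, 0, 0]])

cls_vecs, mean_vecs = pool_embeddings(emb, mask)
print(cls_vecs.shape, mean_vecs.shape)
```

Either option gives one fixed-size vector per sequence regardless of length, which is what I need given how much my sequence lengths vary. What I can't tell is which of the two is the better choice for this model.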
Thanks in advance!