Closed: BioLaoXu closed this issue 3 years ago
Hi, the language models we ship have a fixed embedding dimensionality that is set during training. If you want lower-dimensional embeddings, you need a separate dimensionality-reduction method such as PCA or an autoencoder.
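For instance, PCA as suggested above can be applied directly to a matrix of precomputed embeddings. A minimal sketch (the random matrix here is stand-in data; in practice it would be the per-protein embeddings produced by the library):

```python
# Reduce fixed-size 1024-d embeddings to 30 dims with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for 100 per-protein embeddings of dimension 1024.
embeddings = rng.standard_normal((100, 1024))

pca = PCA(n_components=30)
reduced = pca.fit_transform(embeddings)
print(reduced.shape)  # (100, 30)
```

Note that PCA is fit on your dataset, so the projection should be fit on training data and then reused (via `pca.transform`) for new sequences.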
Hi @xf78 , thanks for using bio-embeddings.
As @konstin pointed out, the embedding dimension is set during training of the models (whichever one you pick), so there's no out-of-the-box way to change that dimension without retraining the models.
What @konstin proposes is a good approach if you really need smaller embeddings. Be careful about data-driven vs. "self-supervised" approaches: if you use an autoencoder (e.g. a VAE) with a smaller latent representation and compute the loss on reconstruction, that's fine. It's less fine if you use e.g. t-SNE, because the lower-dimensional embedding of any sequence then depends on its neighbours and the background data.
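To illustrate the autoencoder route: a plain autoencoder trained purely on reconstruction loss compresses each embedding independently, unlike t-SNE. The sketch below uses PyTorch; the hidden size of 256 and the 30-d latent size are illustrative assumptions, not values from bio-embeddings:

```python
# Minimal autoencoder: 1024-d embedding -> 30-d latent code -> reconstruction.
import torch
from torch import nn

class EmbeddingAutoencoder(nn.Module):
    def __init__(self, in_dim=1024, latent_dim=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim)
        )

    def forward(self, x):
        z = self.encoder(x)           # the 30-d reduced embedding
        return self.decoder(z), z

model = EmbeddingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 1024)              # stand-in batch of embeddings
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # loss depends only on this sample
loss.backward()
opt.step()
print(z.shape)  # torch.Size([8, 30])
```

After training, `model.encoder(x)` alone gives the 30-d representation for any new sequence embedding.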
Finally: do you really need lower-dimensional representations? If so, why?
If your goal is to train a machine learning model on a particular task, I would much rather suggest that you take the full 1024-dimensional embedding and feed it to a fully connected layer that projects the embedding down to a smaller dimension, e.g. 32. Examples of that can be found here and here. This way the dimensionality reduction is learned together with your supervised task: a much better approach in low-data scenarios!
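A sketch of that setup in PyTorch: a learned linear projection from 1024 to 32 dims, followed by a task head, trained end to end. The two-class head is an assumption for illustration; the point is that the projection's weights are optimized for your task rather than for generic reconstruction:

```python
# Learned projection 1024 -> 32 feeding a supervised task head.
import torch
from torch import nn

class ProjectedClassifier(nn.Module):
    def __init__(self, embed_dim=1024, proj_dim=32, n_classes=2):
        super().__init__()
        self.project = nn.Linear(embed_dim, proj_dim)  # learned reduction
        self.head = nn.Linear(proj_dim, n_classes)     # supervised task

    def forward(self, x):
        return self.head(torch.relu(self.project(x)))

model = ProjectedClassifier()
logits = model(torch.randn(4, 1024))   # stand-in batch of 4 embeddings
print(logits.shape)  # torch.Size([4, 2])
```

Training this with a task loss (e.g. cross-entropy) updates both layers jointly, so the 32-d intermediate representation is shaped by the labels you care about.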
I have tested two embedders, ProtTransBertBFDEmbedder and SeqVecEmbedder. They are pretty nice and quickly embed my amino acid sequences. My sequences are between 10 AA and 20 AA long, and I need to embed each of them to a fixed dimension, for example 30. However, the default output embedding dimension of ProtTransBertBFDEmbedder and SeqVecEmbedder is 1024. I created a subclass of `bio_embeddings.embed.prottrans_base_embedder.ProtTransBertBaseEmbedder` and changed `embedding_dimension` to 30, but the final output embedding dimension is always 1024. How do I change the final output embedding dimension? Looking forward to your reply, thanks.