timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

Embeddings object creates embedding dimensions incorrectly #846

[Open] bankeiyotaku opened this issue 9 months ago

bankeiyotaku commented 9 months ago

This line of code in the `__init__` function overrides the embedding dimensions so that no dimensionality reduction takes place: because it iterates over `n_embeddings`, whose entries are never `None`, it replaces `embedding_dims` with the dictionary sizes themselves. If you have 1000 categories, it will create embeddings of dimension 1000 and discard any dimensions specified in `embedding_dims`.

    embedding_dims = [emb_sz_rule(s) if s is None else s for s in n_embeddings]
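
Here is a minimal standalone sketch of the failure mode. It reimplements just the relevant logic rather than importing tsai, and inlines fastai's `emb_sz_rule` heuristic for self-containment; the variable names mirror the `__init__` above:

    def emb_sz_rule(n_cat):
        "fastai's rule of thumb for embedding sizes"
        return min(600, round(1.6 * n_cat ** 0.56))

    n_embeddings = [1000]   # one categorical variable with 1000 categories
    embedding_dims = [8]    # requested (reduced) embedding dimension

    # The line from __init__: it loops over n_embeddings, whose entries are
    # ints and never None, so the requested [8] is silently replaced.
    embedding_dims = [emb_sz_rule(s) if s is None else s for s in n_embeddings]
    print(embedding_dims)   # -> [1000], not [8]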

Code context below:

    class Embeddings(nn.Module):
        "Embedding layers for each categorical variable in a 2D or 3D tensor"
        def __init__(self,
            n_embeddings:list,        # List of num_embeddings for each categorical variable
            embedding_dims:list=None, # List of embedding dimensions for each categorical variable
            padding_idx:int=0,        # Embedding padding_idx
            embed_dropout:float=0.,   # Dropout probability for Embedding layer
            **kwargs
            ):
            super().__init__()
            if not isinstance(n_embeddings, list): n_embeddings = [n_embeddings]
            if embedding_dims is None: embedding_dims = [emb_sz_rule(s) for s in n_embeddings]
            if not isinstance(embedding_dims, list): embedding_dims = [embedding_dims]
            embedding_dims = [emb_sz_rule(s) if s is None else s for s in n_embeddings]  # <-- problematic line
            assert len(n_embeddings) == len(embedding_dims)
            self.embedding_dims = sum(embedding_dims)
            self.embedding_layers = nn.ModuleList([nn.Sequential(nn.Embedding(n, d, padding_idx=padding_idx, **kwargs),
                                                                 nn.Dropout(embed_dropout))
                                                   for n, d in zip(n_embeddings, embedding_dims)])
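
One possible fix (a sketch, not an official patch) is to iterate over both lists together, so that `emb_sz_rule` is applied only where no dimension was requested:

    # Fall back to the rule of thumb only for None entries, instead of
    # overwriting the whole list with the category counts.
    embedding_dims = [emb_sz_rule(n) if d is None else d
                      for n, d in zip(n_embeddings, embedding_dims)]

With this change, `Embeddings([1000], [8])` would build an `nn.Embedding(1000, 8)` as requested, while `Embeddings([1000], [None])` would still fall back to the heuristic.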