lucidrains / h-transformer-1d

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
MIT License

Use for relatively short sequences and small datasets #23

Open Vedasheersh opened 1 year ago

Vedasheersh commented 1 year ago

Hi,

First, I'm not sure if this is the right place to ask; apologies if it's not. Regardless, I love all your implementations and their ease of use!! 😃

Question: Do you think this model would work for relatively short sequences, specifically proteins (a 20-amino-acid token vocabulary) around 1000 tokens in length?

Also, my datasets are relatively small: around 20,000 data points with float labels. So essentially, I am trying to use this model as a sequence summarizer that captures long-range dependencies and outputs a single floating-point number (i.e., regression).

Because of the small dataset, I plan to use small dimensions and layer depth to keep the total at roughly ~50k parameters.
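
For concreteness, here is a minimal sketch of what I have in mind. The `HTransformer1D` constructor arguments follow this repo's README; the mean-pooling and linear regression head on top are my own assumption, not part of the library, and the exact dimensions are just a guess at a small configuration:

```python
# A minimal sketch (not from the repo) of HTransformer1D used for scalar
# regression on short protein sequences.
import torch
import torch.nn as nn
from h_transformer_1d import HTransformer1D

class ProteinRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.transformer = HTransformer1D(
            num_tokens = 20,      # amino-acid vocabulary
            dim = 32,             # small dims/depth, aiming for a small parameter count
            depth = 2,
            max_seq_len = 1024,   # covers ~1000-token proteins
            heads = 4,
            dim_head = 8,
            block_size = 16,
            causal = False
        )
        # assumed regression head: mean-pool the per-token outputs,
        # then project to a single float
        self.to_scalar = nn.Linear(20, 1)

    def forward(self, tokens, mask = None):
        logits = self.transformer(tokens, mask = mask)  # (batch, seq, num_tokens)
        pooled = logits.mean(dim = 1)                   # (batch, num_tokens)
        return self.to_scalar(pooled).squeeze(-1)       # (batch,)

model = ProteinRegressor()
tokens = torch.randint(0, 20, (1, 1000))
mask = torch.ones((1, 1000)).bool()
pred = model(tokens, mask = mask)  # one float per sequence
```

Does something along these lines seem reasonable, or would you pool/summarize differently?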

Would love to hear your thoughts!

Many many thanks!

Veda.