instadeepai / nucleotide-transformer

🧬 Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics
https://www.biorxiv.org/content/10.1101/2023.01.11.523679v2

The shape of embedding issue #80

Open WhiteNan opened 1 month ago

WhiteNan commented 1 month ago

Thank you for publishing Nucleotide Transformer. When I input multiple sequences of the same length, the resulting features do not all have the same length. Have you encountered this in downstream tasks, and how should it be handled? Should the features be padded to the same length? Looking forward to your reply.

WhiteNan commented 1 month ago

I ran the feature extraction code on a sequence of length 4 and got an output of shape [4, 2, 2560]. What does the 2 represent? And do the generated features include a marker (special token) at the beginning of the sequence?

WhiteNan commented 1 month ago

The input I am passing is the bare string "ATGC", not the list ["ATGC"].
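One possible reading of the [4, 2, 2560] shape, offered as an assumption rather than a confirmed answer: if the tokenizer iterates over its input, a bare string is treated as four one-character sequences, and each one becomes a start marker plus one token. The sketch below uses a toy tokenizer to show the effect. `toy_batch_tokenize`, the `<cls>` marker, and the 6-mer rule are hypothetical stand-ins, not the library's actual API.

```python
from typing import List, Sequence, Tuple

CLS = "<cls>"  # hypothetical start-of-sequence marker


def toy_batch_tokenize(sequences: Sequence[str], k: int = 6) -> List[List[str]]:
    """Toy stand-in: one row per element of `sequences`.

    Each row is a start marker followed by k-mers; any remainder shorter
    than k is split into single characters (an assumption for illustration).
    """
    batch = []
    for seq in sequences:  # iterating a bare str yields single characters
        tokens = [CLS]
        i = 0
        while i < len(seq):
            if len(seq) - i >= k:
                tokens.append(seq[i:i + k])
                i += k
            else:
                tokens.append(seq[i])
                i += 1
        batch.append(tokens)
    return batch


def shape(batch: List[List[str]]) -> Tuple[int, int]:
    """Return (number of rows, longest row length)."""
    return (len(batch), max(len(row) for row in batch))


as_string = toy_batch_tokenize("ATGC")   # treated as "A", "T", "G", "C"
as_list = toy_batch_tokenize(["ATGC"])   # treated as one sequence

print(shape(as_string))
print(shape(as_list))
```

Under these assumptions, the bare string gives 4 rows of 2 tokens each (marker plus one nucleotide), which would match the leading [4, 2] of the observed shape, while the list gives a single row. Whether the real tokenizer behaves this way would need to be confirmed by the maintainers.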