uci-cbcl / UFold

MIT License

Batch with different rna length #15

Open giorgiobini opened 1 year ago

giorgiobini commented 1 year ago

Hello,

I am wondering whether you have a function to pad batches that contain sequences of different lengths.

Thank you so much in advance!

sperfu commented 1 year ago

Hi there,

Sorry about that. Because our framework handles sequences of varying lengths, we limited the batch size to a fixed number to avoid out-of-memory issues. Our training uses a batch size of 1 for all the data, so we currently do not provide a function to pad batches of different sizes.
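
For readers who want batches larger than 1 without padding, one possible workaround is to group sequences of identical length, so every batch is already rectangular. The following is a minimal sketch and not part of UFold: the function name `batches_by_length` and the `max_batch_size` parameter are hypothetical, and the cap should be set according to available memory.

```python
from collections import defaultdict


def batches_by_length(seqs, max_batch_size=8):
    """Group sequences of identical length so no batch needs padding.

    Hypothetical helper, not part of UFold.
    """
    # Bucket the sequences by their exact length.
    buckets = defaultdict(list)
    for s in seqs:
        buckets[len(s)].append(s)

    # Split each bucket into chunks of at most max_batch_size sequences.
    batches = []
    for length in sorted(buckets):
        group = buckets[length]
        for i in range(0, len(group), max_batch_size):
            batches.append(group[i:i + max_batch_size])
    return batches


batches = batches_by_length(["GGAC", "AUGC", "GCAUCG", "AU"], max_batch_size=2)
```

Sequences whose length is unique in the dataset still end up in a batch of 1, so the gain depends on how often lengths repeat.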

Thanks.

sperfu commented 1 year ago

Hi, regarding your question about padding batches of different sizes, I'm afraid we don't have that function. The reason is that sequences differ in length (ranging from 10 bp to over a thousand bp). Padding sequences to the same length would inevitably introduce useless information, which would degrade performance. So we use a batch size of 1, with one sequence per input, to avoid padding.
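
If padding is still needed, a common way to limit the effect of the padded positions is to carry a mask alongside the batch and exclude those positions from the loss. The sketch below is an illustration only and not part of UFold: `pad_batch`, `pair_mask`, and the pad character `"N"` are hypothetical choices, and it assumes the model output is a length-by-length pairing matrix that can be masked the same way.

```python
def pad_batch(seqs, pad_char="N"):
    """Right-pad sequences to the longest one and return a 0/1 mask.

    Hypothetical helper, not part of UFold.
    """
    if not seqs:
        raise ValueError("seqs must contain at least one sequence")
    max_len = max(len(s) for s in seqs)
    padded = [s + pad_char * (max_len - len(s)) for s in seqs]
    # 1 marks a real nucleotide, 0 marks a padded position.
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def pair_mask(row_mask):
    """Build a 2D mask that is 1 only where both positions are real."""
    return [[a * b for b in row_mask] for a in row_mask]


padded, mask = pad_batch(["GGAC", "AU"])
pairs = pair_mask(mask[1])
```

Multiplying the per-position loss by `pairs` before averaging would keep padded positions from contributing to the gradient. It does not reduce the extra memory that padding costs, which is the other concern raised above.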

Thanks.