theislab / mubind

Learning motif contributions to cell transitions using sequence features and graphs.
https://mubind.readthedocs.io
MIT License

Addition of positional bias term #103

Open ilibarra opened 1 year ago

ilibarra commented 1 year ago

Description

Here I am trying to brainstorm the positional bias term, which I think we need to incorporate/test at this stage. The positional bias term could be a filter p = Conv2D of (k, 4), where k is the length of the input sequence and 4 is the size of the DNA alphabet. The term can also be batch-specific, meaning that we have to store it in the MultiBind instance and not in the BindingModes instance. This would then be a 3D tensor of shape (b, k, 4); e.g. for 8 batches the tensor is 8 x k x 4. A problem with the 3D tensor representation is that we have to mask positions along k in some batches whenever the sequence lengths differ.
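
A minimal sketch of what the batch-specific 3D-tensor option could look like, assuming one-hot encoded sequences padded to a common maximum length and a per-sequence `lengths` vector for masking; `PositionalBias`, `n_batches`, and `max_k` are illustrative names, not existing mubind API:

```python
import torch
import torch.nn as nn

class PositionalBias(nn.Module):
    """Hypothetical batch-specific positional bias of shape (b, k, 4)."""

    def __init__(self, n_batches: int, max_k: int):
        super().__init__()
        # one (k, 4) bias filter per batch, padded to the longest sequence
        self.bias = nn.Parameter(torch.zeros(n_batches, max_k, 4))

    def forward(self, x_onehot: torch.Tensor, batch_idx: torch.Tensor,
                lengths: torch.Tensor) -> torch.Tensor:
        # x_onehot: (N, max_k, 4) one-hot sequences padded to max_k
        # batch_idx: (N,) batch index per sequence, lengths: (N,) true lengths
        b = self.bias[batch_idx]                           # (N, max_k, 4)
        pos = torch.arange(x_onehot.shape[1], device=x_onehot.device)
        # mask out positions beyond each sequence's true length
        mask = (pos[None, :] < lengths[:, None]).float()   # (N, max_k)
        # per-sequence scalar bias: bias * one-hot, summed over bases/positions
        return ((b * x_onehot).sum(-1) * mask).sum(-1)     # (N,)
```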

Alternatively, the positional bias could be a list of intercepts applied per position, independent of the sequence content.
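
That option is considerably simpler; a sketch under the same assumptions (padded inputs, hypothetical names):

```python
import torch
import torch.nn as nn

class PositionalIntercept(nn.Module):
    """Hypothetical sequence-independent intercept per position."""

    def __init__(self, max_k: int):
        super().__init__()
        self.intercept = nn.Parameter(torch.zeros(max_k))  # one scalar per position

    def forward(self, lengths: torch.Tensor) -> torch.Tensor:
        # lengths: (N,) true sequence lengths; returns one scalar bias per sequence
        pos = torch.arange(self.intercept.shape[0], device=lengths.device)
        mask = (pos[None, :] < lengths[:, None]).float()   # (N, max_k)
        return (mask * self.intercept[None, :]).sum(-1)    # (N,)
```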

  1. @johschnee would you say the positional bias should be (i) a list of Conv2D filters of shape (k, 4), or (ii) a single vector of intercepts per position? One is probably more complex to implement than the other, and we could set up both now.
  2. Did you see any challenges when implementing BindingModes as Conv2D, or would you reconsider representing them as a 3D tensor and masking positions? http://pbdemo.x3dna.org/files/example_output/multiTF/index.html

Tasks

johschnee commented 1 year ago

Regarding your questions:

  1. I think the positional biases you linked do not depend on the DNA base, but only on the position, so I'd rather think about a single vector of intercepts per position. But in general this depends on the concept of positional bias you want to model.
  2. I think the main question is whether you want to learn all positional biases at once or learn them iteratively. If you want to learn them all at once, I think a 3D tensor with masking does the job. If you want to learn them iteratively, you need to be able to turn off the gradient for parts of the parameters. As far as I know, gradient calculation can only be toggled for whole PyTorch tensors, so you would then need to store the weights in a list of Conv2D (see the sketch after this list).
  3. Depending on how you want to calculate the positional biases, it may make sense to add another module to the model, which is used if needed; otherwise the model can be initialized without it. That module would also be a natural place to store the weights.
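
A sketch of points 2 and 3 together, using a deliberately simplified model rather than mubind's actual classes: the per-batch biases live in an `nn.ModuleList` of `Conv2d` layers so each weight tensor can be frozen independently, and the whole component is optional at initialization (all names here are illustrative):

```python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, n_batches: int, k: int, use_positional_bias: bool = True):
        super().__init__()
        # optional component: the model works without it (point 3)
        self.positional_bias = None
        if use_positional_bias:
            # one (k, 4) filter per batch, each its own weight tensor
            self.positional_bias = nn.ModuleList(
                [nn.Conv2d(1, 1, kernel_size=(k, 4), bias=False)
                 for _ in range(n_batches)]
            )

    def freeze_all_but(self, batch: int) -> None:
        # requires_grad is set per tensor, which is why one Conv2d per
        # batch allows iterative, batch-wise learning (point 2)
        assert self.positional_bias is not None
        for i, conv in enumerate(self.positional_bias):
            conv.weight.requires_grad_(i == batch)

model = Model(n_batches=8, k=20)
model.freeze_all_but(0)  # only batch 0's bias receives gradients this round
```

With a single 3D tensor you could still approximate iterative learning by zeroing slices of `.grad` after each backward pass, but the per-module representation avoids that bookkeeping.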

I hope this answer helps you.