jmschrei / bpnet-lite

This repository hosts a minimal version of a Python API for BPNet.
MIT License

bpnetlite.attributions.dinucleotide_shuffle errors out when there are fewer than 4 distinct nucleotides in the sequence #3

Status: Closed (closed by gokceneraslan 1 year ago)

gokceneraslan commented 1 year ago
from bpnetlite.attributions import dinucleotide_shuffle
from bpnetlite.io import one_hot_encode
import torch

dinucleotide_shuffle(torch.LongTensor(one_hot_encode('AGGTAGGT')))

gives

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[360], line 5
      2 from bpnetlite.io import one_hot_encode
      3 import torch
----> 5 dinucleotide_shuffle(torch.LongTensor(one_hot_encode('AGGTAGGT')))

File ~/.miniconda3/lib/python3.10/site-packages/bpnetlite/attributions.py:191, in dinucleotide_shuffle(sequence, n_shuffles, random_state)
    188     next_idxs_ = numpy.where(idxs[:-1] == char)[0]
    189     n = len(next_idxs_)
--> 191     next_idxs[char][:n] = next_idxs_ + 1
    192     next_idxs_counts[char] = n
    194 shuffled_sequences = numpy.zeros((n_shuffles, *sequence.shape), dtype=numpy.float32)

IndexError: index 3 is out of bounds for axis 0 with size 3

jmschrei commented 1 year ago

[image]

jmschrei commented 1 year ago

the three there being "A", "G", and "T".
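
The size-3 axis in the traceback lines up with the three distinct characters in 'AGGTAGGT'. Below is a minimal NumPy-only sketch of that failure mode. It assumes an A/C/G/T index ordering and a lookup table sized by the characters actually present; both are illustrative assumptions, not the bpnet-lite source.

```python
import numpy

ALPHABET = "ACGT"  # assumed ordering, for illustration only

sequence = "AGGTAGGT"
# Map each character to its index in the full alphabet: A=0, C=1, G=2, T=3.
idxs = numpy.array([ALPHABET.index(c) for c in sequence])

# "C" never appears, so only three distinct indices are present.
n_unique = len(numpy.unique(idxs))

# A per-character table sized by the characters present has rows 0..2 only.
table = numpy.zeros((n_unique, len(sequence)), dtype=int)

try:
    table[idxs.max()]  # "T" maps to index 3, which has no row
    error = None
except IndexError as exc:
    error = str(exc)

print(n_unique, error)
```

Any sequence that omits one of the four nucleotides but still contains a higher-indexed one would hit the same out-of-bounds access under these assumptions.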

gokceneraslan commented 1 year ago

Arry Potter and the broken dinucshuf()

jmschrei commented 1 year ago

It looks like there were two separate issues. First, my one_hot_encode function was returning a transposed version of what it should; that has now been fixed throughout the repo. Second, dinucleotide_shuffle was sizing the returned sequences by the number of unique characters in the input rather than by the input's shape, so that has been fixed too. v0.5.6 is up and should contain both fixes. Let me know if that works.
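
As a rough illustration of the second fix, the sketch below takes the alphabet size from the encoded array's shape instead of counting the characters present. The `one_hot` helper, the (alphabet, length) layout, and the A/C/G/T ordering are all assumptions made for this example; only the loop body mirrors the lines shown in the traceback, and none of this is the actual v0.5.6 code.

```python
import numpy

ALPHABET = "ACGT"  # assumed alphabet and ordering


def one_hot(sequence, alphabet=ALPHABET):
    """Hypothetical helper: encode to shape (len(alphabet), len(sequence))."""
    ohe = numpy.zeros((len(alphabet), len(sequence)), dtype=numpy.int8)
    for position, char in enumerate(sequence):
        ohe[alphabet.index(char), position] = 1
    return ohe


X = one_hot("AGGTAGGT")
n_chars, length = X.shape   # sizes come from the shape, not from the content
idxs = X.argmax(axis=0)     # per-position character index

# Lookup tables sized by the full alphabet, so every index 0..3 has a row.
next_idxs = numpy.zeros((n_chars, length), dtype=numpy.int64)
next_idxs_counts = numpy.zeros(n_chars, dtype=numpy.int64)

for char in range(n_chars):
    # Positions (excluding the last) holding this character.
    next_idxs_ = numpy.where(idxs[:-1] == char)[0]
    n = len(next_idxs_)

    next_idxs[char][:n] = next_idxs_ + 1
    next_idxs_counts[char] = n

print(X.shape, next_idxs_counts.tolist())
```

With the tables sized this way, a missing nucleotide simply ends up with a count of zero instead of shifting the row indices.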

gokceneraslan commented 1 year ago

Works well, thanks!