Store PWM as numpy array

vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See full GimmeMotifs documentation for detailed installation instructions and usage examples.

https://gimmemotifs.readthedocs.io/en/master

MIT License


Store PWM as numpy array #210

Open laserson opened 3 years ago

laserson commented 3 years ago

Is your feature request related to a problem? Please describe.

I'm guessing most people would rather work with a PWM as a numpy array. Numpy is already a dependency for gimmemotifs, so it wouldn't be a burden to use it. I use the motifs in gimmemotifs to do lots of sequence sampling, so in practice I convert the PWMs to numpy arrays. This is annoying, however, because I need to keep a separate numpy array copy.

Describe the solution you'd like

Provide a numpy array version of the PWM.

laserson commented 3 years ago

As a workaround, when I load motifs, I run them through this function:

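import numpy as np
from gimmemotifs.motif import Motif


def add_normalized_array_pwm(motif: Motif) -> Motif:
    """Adds numpy array version of pwm as `Motif.array_pwm`

    Note that this mutates the given `Motif` object.
    """
    # Copy into a float array so the in-place division below cannot fail on
    # integer input or modify `motif.pwm` itself
    pwm = np.array(motif.pwm, dtype=float)
    # apparently some of the positions don't sum to 1
    pwm /= pwm.sum(axis=1, keepdims=True)
    motif.array_pwm = pwm
    return motif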
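
For anyone who wants to try the idea without gimmemotifs installed, here is a minimal sketch of the same normalization followed by the kind of sequence sampling mentioned in the request. It makes several assumptions that are not from the thread: the PWM is a plain list of rows (one row per position), the columns are ordered A, C, G, T, and the helper names `normalize_pwm` and `sample_sequences` are hypothetical and not part of the gimmemotifs API.

```python
import numpy as np


def normalize_pwm(rows):
    """Return a float array whose rows each sum to 1 (the input is left untouched)."""
    pwm = np.array(rows, dtype=float)
    return pwm / pwm.sum(axis=1, keepdims=True)


def sample_sequences(pwm, n, rng=None, alphabet="ACGT"):
    """Draw n sequences, sampling each position independently from its PWM row."""
    rng = np.random.default_rng() if rng is None else rng
    letters = np.array(list(alphabet))
    # Build an (n, motif_length) array of letter indices, one column per position.
    idx = np.stack(
        [rng.choice(len(alphabet), size=n, p=row) for row in pwm], axis=1
    )
    return ["".join(letters[i]) for i in idx]


# Hypothetical 3-position motif; the middle row deliberately sums to 0.99.
# Column order A, C, G, T is an assumption.
raw = [
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.24],
    [0.0, 0.0, 1.0, 0.0],
]
pwm = normalize_pwm(raw)
seqs = sample_sequences(pwm, 5, rng=np.random.default_rng(0))
```

Normalizing first matters here because `Generator.choice` rejects probability vectors that do not sum to 1, which is presumably why the workaround rescales each row.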

simonvh commented 3 years ago

Yes, this could easily be done. What you'll notice is that the Motif() class is one of the oldest parts of GimmeMotifs. It does show its age (and my inexperience at the time ;)).