jbkinney / logomaker

Software for the visualization of sequence-function relationships
MIT License

total height not adding up to 1.0 #40

Open kkj15dk opened 3 weeks ago

kkj15dk commented 3 weeks ago

Hi,

I might be using this package incorrectly, but I thought the default behavior was to use the input probabilities directly as the heights in the logo plot. However, when I use dataframes converted from torch tensors that have been passed through softmax (which I treat as probabilities), I get logo plots whose total heights do not add up to 1, even though the input sums to 1.0000.

Minimal example:

import logomaker
import matplotlib.pyplot as plt
import torch
import pandas as pd

rand = torch.randn(20,64)
array = torch.nn.functional.softmax(rand, dim=1)
array_sums = torch.sum(array, dim=1)
print("sums", array_sums)

aminoacids = "ACDEFGHIKLMNPQRSTVWY"
amino_acids = list(aminoacids)

df = pd.DataFrame(array.T, columns=amino_acids, dtype=float)

fig, ax = plt.subplots(1,1,figsize=[4,2])
logo = logomaker.Logo(df, ax=ax)

Output:

sums tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])

image

Expected output: something like Figure 1C in https://doi.org/10.1093/bioinformatics/btz921

kkj15dk commented 3 weeks ago

Adding:

df = logomaker.transform_matrix(df, normalize_values=True)

after creating the dataframe solves this, but I'm not sure why it is needed.
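A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the example, softmax is applied along `dim=1` of a 20 x 64 tensor, so each of the 20 rows sums to 1 across the 64 positions. After the transpose, the dataframe has 64 rows (positions) and 20 columns (amino acids), and those rows were never normalized. Assuming logomaker stacks the values of each dataframe row as character heights, the stack at each position would then be well below 1. The NumPy snippet below mirrors the shapes from the example without needing torch or logomaker; the `softmax` helper and the variable names are illustrative, not part of either library.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis (illustrative helper).
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
rand = rng.standard_normal((20, 64))   # same shape as torch.randn(20, 64)

# As in the example: normalize along axis 1 (the 64 positions).
probs = softmax(rand, axis=1)
checked_sums = probs.sum(axis=1)       # the 20 sums the example prints

# What ends up in the dataframe: 64 positions x 20 amino acids.
matrix = probs.T
position_sums = matrix.sum(axis=1)     # one sum per dataframe row
print("mean per-position sum:", position_sums.mean())

# Alternative: normalize along the amino-acid axis instead.
probs_fixed = softmax(rand, axis=0)
fixed_position_sums = probs_fixed.T.sum(axis=1)
print("per-position sums after axis fix:", fixed_position_sums[:3])
```

If this reading is right, calling softmax with `dim=0` in the original example (or building the tensor as 64 x 20 and dropping the transpose) should give rows that sum to 1 without further processing. It would also explain why `transform_matrix(df, normalize_values=True)` helps, since, as far as I understand it, that option rescales each row to sum to 1.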