jbkinney / logomaker

Software for the visualization of sequence-function relationships
MIT License
179 stars 34 forks source link

Inquiries about the log axis #33

Closed liyunjian closed 1 year ago

liyunjian commented 1 year ago

Hello, I am making a logo map containing 200,000 sequences, so there are some amino acids with a frequency as low as 0.00005 (the highest is 100%, represented by 1), and such low-frequency amino acids cannot be seen in the picture , Therefore, I plan to use 10e-6, 10e-5, 10e-4...10e0, which has a uniform scale length on the integer scale of 10. I tried to use "plt.yscale('log')" but it seems that this function cannot be achieved, and because the frequency is between 0-1, I can't log the data first and then input the data to draw the graph.

So I would like to ask you if there is a way, or if there is hope to add this function to this python package. Looking forward to your reply and wish you all the best.

atareen commented 1 year ago

Hello, I think understand what you're saying, but it would be helpful for me if you could provide me a minimal example to reproduce your type of logo, and I can try to help with the visualization of the low-frequency amino-acids from there. Thanks.

Ammar

atareen commented 1 year ago

Hello @liyunjian , pinging you for a minimal example. Thanks.

liyunjian commented 1 year ago

Hello, atareen, I am sorry that I have been busy these days and forgot to reply you. I created a fasta file containing 10,000 amino acid sequences, each of them is "MFVFLVLLPM", which will be used as a reference sequence. Then I changed the first "M" of the first sequence to "L", the second "F" of the first 100 sequences was changed to "M", and the last "M" of all sequences Change to "L". In this way, the first "L" frequency of this file is 1/10000, the second "M" frequency is 100/10000, and the last "L" frequency is 10000/10000. thanks for your help.

微信截图_20230726183913 test.zip

liyunjian commented 1 year ago

Hello, have you received this example, or what else needs to be provided. @atareen

atareen commented 1 year ago

Hello,

I took a quick look at your test.fasta file and wrote a notebook to visualize the low frequency amino acids. I can't attach the notebook here so I've pasted my minimal code at the end. Basically if I convert to a relative abundance or probability matrix, I have no issues with the really low frequency amino-acids.

The follow is a logo from your file in linear scale:

Screenshot 2023-08-21 at 10 08 00 AM

This is with y-scale logged (really low frequency M is visible)

Screenshot 2023-08-21 at 10 08 04 AM

Note that in the log scaled logo, the individual Glyph objects representing amino-acids are really stretched due to their high frequency, but these can be fixed via applying some stylizing to the Glyph objects, as shown here: https://github.com/jbkinney/logomaker/blob/master/logomaker/tutorials/6_glyph_objects.ipynb

I would suggest learning about the Glyph objects and play around with the functions alignment_to_matrix and it's different arguments. As far as I can, visualizing your data at really low frequencies should be totally possible with logomaker without changes to the underlying source, though it will need some advanced styling.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.ion()
import logomaker as lm
with open("test/test.fasta","r") as f:
    raw_seqs = f.readlines()

seqs = [seq.strip() for seq in raw_seqs if ('#' not in seq) and ('>') not in seq]
# preview sequences
print('There are %d sequences, all of length %d'%(len(seqs), len(seqs[0])))    
seqs_df = pd.DataFrame(seqs,columns=['sequence'])
print(seqs_df.value_counts())
# create a relative abundance matrix - can play around with pseudocount.
mat = lm.alignment_to_matrix(seqs,to_type='probability',pseudocount=0)
mat
# linear axis logo
fig, ax = plt.subplots(1,1,figsize=(12,3))
logo = lm.Logo(mat,ax=ax)
fig, ax = plt.subplots(1,1,figsize=(12,3))
logo = lm.Logo(mat,ax=ax)
ax.set_yscale('log')
#logo.style_glyphs_in_sequence('MMVFLVLLPL',color='blue')
ax.set_ylim(1e-3,1.1)
ax.set_title('Should be restyled with glyphs')
liyunjian commented 1 year ago

Hello, thank you for your help, I probably understand, let's learn.