jbkinney / logomaker

Software for the visualization of sequence-function relationships
MIT License

problems using logomaker with protein sequence alphabet #20

Closed FraenzeMueller closed 2 years ago

FraenzeMueller commented 2 years ago

Dear Logomaker-Team,

I am using logomaker to create logo plots that visualize which amino acids are crosslinked within a protein, but I have run into a problem. I followed your tutorial and created a matrix that looks exactly like the one in the example. It works if I use just the four-letter code for RNA/DNA, but if I switch to the protein sequence alphabet it raises an error saying that the dataframe contains non-finite values. This is not the case in my dataframe. I then used the allow_nan=True argument, as suggested by the error message, but it still does not work. Is there anything special to keep in mind when working with protein sequences? So far I have tried everything mentioned in the documentation.

Dataframe: columns are the protein sequence alphabet (single-character strings), rows are ints, values are positive finite numbers.

Code:

    df = pd.read_csv("sequence_logoplot_test.csv", sep=';', index_col=0)
    logomaker.Logo(df, color_scheme="hydrophobicity", alphabet="protein")

Maybe I am missing something?

Best wishes

jbkinney commented 2 years ago

Hi Fraenze,

Can you send us the specific csv file you’re using?

Cheers, -Justin

Justin B. Kinney, Associate Professor, Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 631-897-5731

FraenzeMueller commented 2 years ago

Hi Justin,

This is the raw file: sequence_logoplot_test.csv

I tried several things: replacing 0 with other values, converting values, deleting columns, and so on. Nothing seems to work.

Best wishes, Fraenze

atareen commented 2 years ago

I took a brief look at your data; the issue is with the data formatting. When I read in your file, the pandas dataframe looks like the first image below. For example, for position 1 and character A, the value of the cell is '0,210453'; the comma here is the problem. If this value is supposed to be 0.210453, then the comma should be replaced by a decimal point. Moreover, the elements of a pandas dataframe passed to logomaker should be of numerical type; with the way you have read your file, the elements appear to be of type string (second image).
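
A minimal sketch of the diagnosis and one possible fix, assuming the file uses a comma as the decimal separator throughout. The inline CSV text below is a made-up stand-in for the attached file (its values and the subset of columns are illustrative only), and `decimal=","` is a standard `pd.read_csv` option rather than anything specific to logomaker.

```python
import io

import pandas as pd

# Hypothetical stand-in for "sequence_logoplot_test.csv": semicolon-separated,
# comma as decimal separator. The numbers are invented for illustration.
csv_text = "pos;A;C;D\n1;0,210453;0;0,5\n2;0;1,25;0,75\n"

# Default parsing: "0,210453" is not a valid float, so the affected
# columns are read as strings instead of numbers.
raw = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0)
print(raw.dtypes)

# Option 1: tell pandas that the decimal separator is a comma.
df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0, decimal=",")
print(df.dtypes)

# Option 2: repair an already-loaded dataframe by swapping the comma
# for a decimal point and converting each column to a numeric type.
converted = raw.apply(
    lambda col: pd.to_numeric(col.astype(str).str.replace(",", ".", regex=False))
)

# With numeric values in place, the call from the original post should be
# able to proceed, e.g.:
# logomaker.Logo(df, color_scheme="hydrophobicity")
```

Checking `df.dtypes` before calling logomaker is a quick way to confirm that every column is numeric rather than string.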

[Image: Screen Shot 2022-03-07 at 1 03 48 PM] [Image: Screen Shot 2022-03-07 at 1 11 08 PM]

As a reference, I am attaching a protein example that works fine with logomaker:

[Image: Screen Shot 2022-03-07 at 1 06 48 PM]

FraenzeMueller commented 2 years ago

Hi,

Thank you for your reply. After changing the value formatting settings, it works. The comma was the problem. Thank you for your help.

Best wishes