Closed FraenzeMueller closed 2 years ago
Hi Fraenze,
Can you send us the specific csv file you’re using?
Cheers, -Justin
Justin B. Kinney, Associate Professor, Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 631-897-5731
On Mar 7, 2022, at 6:37 AM, FraenzeMueller wrote:
Dear Logomaker-Team,
I am using logomaker to create logo plots that visualize the amino acids crosslinked within a protein, but I have run into a problem. I followed your tutorial and created a matrix that looks exactly like the one in the example. It works if I use just the four-letter code for RNA/DNA, but if I switch to the protein sequence alphabet it raises an error saying that the dataframe contains non-finite values. This is not the case in my dataframe. I then used the allow_nan=True argument, as indicated by the error message, but it still does not work. Is there anything special to watch out for when working with protein sequences? So far I have tried everything mentioned in the documentation.
code:
dataframe: columns = protein sequence alphabet (single str), rows = ints, values = positive finite values
df = pd.read_csv("sequence_logoplot_test.csv", sep=';', index_col=0)
logomaker.Logo(df, color_scheme="hydrophobicity", alphabet="protein")
Maybe I am missing something?
Best wishes
Hi Justin,
This is the raw file: sequence_logoplot_test.csv
I tried several things: replacing 0 with other values, converting values, deleting columns, and so on. Nothing seems to work.
Best wishes, Fraenze
I took a brief look at your data; the issue is with your data formatting. When I read in your data, the pandas dataframe looks like the first image below. As an example, for position 1 and character A, the value of the cell is '0,210453'; the comma here is the problem. If this value is supposed to be 0.210453, then the comma should be replaced by a decimal point. Moreover, the elements of a pandas dataframe passed to logomaker should be of numerical type; the way you have read in your dataframe, the elements appear to be of type string (second image).
As a reference, I am attaching a protein example which works with logomaker just fine.
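For anyone hitting the same error, here is a minimal sketch of the comma issue described above. It uses only pandas and NumPy; the inline CSV text and the helper `is_numeric_and_finite` are made up for illustration and are not taken from the attached file or from logomaker. The `decimal` argument of `pd.read_csv` is one way to read comma decimals; changing the export settings of the spreadsheet program should work as well.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: semicolon-separated, comma as decimal mark.
raw = "pos;A;C;D\n1;0,210453;0;0,5\n2;0,1;0,25;0\n"

# Read as in the original post: cells such as '0,210453' stay strings.
df_bad = pd.read_csv(io.StringIO(raw), sep=";", index_col=0)

# Telling pandas that ',' is the decimal mark yields float columns.
df_ok = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", index_col=0)


def is_numeric_and_finite(df):
    """Return True if every column is numeric and every value is finite."""
    all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)
    return all_numeric and bool(np.isfinite(df.to_numpy(dtype=float)).all())


print(df_bad.dtypes)  # non-numeric columns
print(df_ok.dtypes)   # float columns
```

Running a check like `is_numeric_and_finite(df)` before passing the dataframe to `logomaker.Logo` makes it easier to tell a formatting problem apart from genuinely missing values.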
Hi,
Thank you for your reply. After changing the value formatting settings, it works. The comma was the problem. Thank you for your help.
Best wishes