rlbarter / superheat

An R package for generating beautiful and customizable heatmaps
https://rlbarter.github.io/superheat/

basic question superheat dataframe requirements #44

Open cfrederica opened 5 years ago

cfrederica commented 5 years ago

To start, I'm trying to plot a simple heatmap. The first column in my data frame holds the sites; the remaining columns are species with abundance values. I get the error "X must be numeric", but I don't know how to include the site names, which I need for ordering the data. If I make the first column numeric, I lose the site information. With the mtcars example data this problem doesn't occur for some reason. Thanks for your help!

Below is the structure of my data frame:

> str(test10)
'data.frame': 57 obs. of 15 variables:
 $ X  : Factor w/ 9 levels "MLALR","MLCCR",..: 3 6 5 3 4 7 2 9 2 7 ...
 $ A1 : num 0 0 0 0 0 0 0 0 0 0 ...
 $ A2 : num 0 0 0 0 0 0 0 0 0 0 ...
 $ A3 : num 0 0 2 0 0 0 0 0 0 0 ...
 $ A4 : num 3 0 5 0 52 ...
 $ A5 : num 0 0 0 0 0 0 0 0 0 0 ...
 $ A6 : num 0 0 0 0 0 0 0 0 0 0 ...
 $ A7 : num 0 0 0 0 0 0 0 0 22 0 ...
 $ A8 : num 0 0 0 0 0 0 0 0 0 0 ...
 $ A9 : num 0 0 0 0 0 0 5 458 0 0 ...
 $ A10: num 0 0 0 0 0 0 0 0 0 0 ...
 $ A11: num 0 1757 0 0 0 ...
 $ A12: num 0 0 0 0 0 0 0 0 0 0 ...
 $ A13: num 24499 8785 7267 19885 69 ...
 $ A14: num 19 0 0 0 0 0 0 0 0 0 ...
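A minimal sketch (not from the thread) of why mtcars passes the numeric check while a data frame like test10 does not: in mtcars the car names are row names, so every actual column is numeric, whereas test10 carries its site labels in a factor column. The toy `test10` below is a hypothetical stand-in built from the first four rows and two of the columns posted in this issue.

```r
# Hypothetical stand-in for test10: first four rows, two species columns
test10 <- data.frame(
  X   = factor(c("MLPBL", "MLRNW", "MLPST", "MLPBL")),
  A4  = c(3, 0, 5, 0),
  A13 = c(24499, 8785, 7267, 19885)
)

# mtcars keeps its labels in the row names, so all columns are numeric
mtcars_all_numeric <- all(sapply(mtcars, is.numeric))

# test10 keeps its labels in column X, which is a factor
test10_numeric_cols <- sapply(test10, is.numeric)

# Separate the labels from the values: a label vector plus a numeric matrix
sites <- test10$X
X <- as.matrix(test10[, -1])
```

Here `X` is a purely numeric matrix and `sites` holds the labels separately, so the site information is kept without being part of the matrix that is plotted.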

kaarg2 commented 5 years ago

Have you tried running as.data.frame before calling superheat? For example, df <- as.data.frame(df), where df is the name of your matrix. Alex

cfrederica commented 5 years ago

Hi Alex, thanks for your reply! When I call class() on it, it says it's a data frame. I ran as.data.frame anyway, but it doesn't change anything.

kaarg2 commented 5 years ago

To include only numeric data, you can use the as.matrix function: df <- as.matrix(df), where df is the name of the data frame. Yun Zhang (Alex)
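One caveat worth checking with a small sketch (toy data, column names borrowed from this issue): calling as.matrix on a data frame that still contains a factor or character column produces a character matrix, so the non-numeric columns have to be dropped first for the result to be numeric.

```r
# Toy data frame with one label column and two numeric columns
df <- data.frame(
  Site = factor(c("MLPBL", "MLRNW", "MLPST")),
  A4   = c(3, 0, 5),
  A13  = c(24499, 8785, 7267)
)

# Converting everything at once coerces the whole matrix to character
m_all <- as.matrix(df)

# Keep only the numeric columns, then convert
num_cols <- vapply(df, is.numeric, logical(1))
m_num <- as.matrix(df[, num_cols, drop = FALSE])
```

`m_all` would still trigger a "must be numeric" style check, while `m_num` would not; the site labels stay available in `df$Site`.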

cfrederica commented 5 years ago

I tried that before but then I don't have the site information anymore...

cfrederica commented 5 years ago

I'm sure it's something basic but I'm not very experienced with R

kaarg2 commented 5 years ago

I think you also need to construct a separate vector for the labels, in case you want to color-code them. When you imported the data, did you set header = TRUE and row.names = 1 (which specifies that the first column holds the row names)? For example: df <- read.csv("df.csv", header = TRUE, row.names = 1) Yun Zhang (Alex)
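A self-contained sketch of the row.names = 1 suggestion, using read.csv's `text` argument in place of a file so it runs as-is. The CSV content is a hypothetical excerpt (three samples, two species columns) shaped like the data in this issue. Note that row names must be unique, so this works with sample IDs but not with repeated site names.

```r
# Hypothetical CSV excerpt; in practice this would be a file such as "df.csv"
csv_text <- "Sample,A4,A13
BCS19-10-1_ML1926,3,24499
BCS19-10-2_ML1950,0,8785
BCS19-10-3_ML1974,5,7267"

# The first column becomes the row names instead of a data column
df <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# All remaining columns are numeric, so the matrix is numeric too
X <- as.matrix(df)
```

After this, `rownames(X)` carries the sample IDs and `X` itself contains only abundance values.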

kaarg2 commented 5 years ago

You might want to specify the row names and header when you import the data: df <- read.csv("df.csv", header = TRUE, row.names = 1), which specifies that the first column holds the row names. Alex

cfrederica commented 5 years ago

I tried something like this:

superheat(test10[ ,c(2:15)], yr = test10[ ,1], yr.axis.name = "Sites")

It returns this error message:

Error in Summary.factor(c(3L, 6L, 5L, 3L, 4L, 7L, 2L, 9L, 2L, 7L, 8L, : ‘min’ not meaningful for factors

I will try what you suggest above. Thanks a lot! Frederica
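The error above can be reproduced in base R: min() is not defined for unordered factors, which suggests that the `yr` argument is meant to receive numeric values rather than labels. The sketch below reproduces the message and, as one possible alternative, passes a numeric per-row summary to `yr` instead. The toy values come from the first four rows posted in this issue; using total abundance is an assumption made for illustration, not something proposed in the thread. The superheat call only runs if the package is installed.

```r
# Site labels as an unordered factor, as in the first column of test10
sites <- factor(c("MLPBL", "MLRNW", "MLPST", "MLPBL"))

# Reproduce the error: min() has no meaning for unordered factors
msg <- tryCatch(min(sites), error = function(e) conditionMessage(e))

# Numeric abundance matrix (two species columns from the first four rows)
X <- cbind(A4 = c(3, 0, 5, 0), A13 = c(24499, 8785, 7267, 19885))

# A numeric quantity per row, which is the kind of input yr can summarise
total <- rowSums(X)

if (requireNamespace("superheat", quietly = TRUE)) {
  superheat::superheat(X, yr = total, yr.axis.name = "Total abundance")
}
```

The site labels themselves are better supplied as row labels or as a grouping variable than as a `yr` plot.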

kaarg2 commented 5 years ago

I don't think the function call you described draws a heat map. You might want to refer to the example below. https://rlbarter.github.io/superheat-examples/Organ/ Yun Zhang (Alex)

cfrederica commented 5 years ago

OK, the row.names = 1 argument did the trick! I can now run superheat. But now a new problem has come up. I have a column with sample IDs and a column with site names, and I eventually want to order the data by site. Repeated site names do not seem to be accepted as row names; I get this error message when loading the data frame:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

cfrederica commented 5 years ago

I can load the data if I include both columns (Sample and Site) but then I get the X must be numeric error again when running superheat. The data looks like this:

Sample Site A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14
BCS19-10-1_ML1926 MLPBL 0 0 0 3 0 0 0 0 0 0 0 0 24499 19
BCS19-10-2_ML1950 MLRNW 0 0 0 0 0 0 0 0 0 0 1757 0 8785 0
BCS19-10-3_ML1974 MLPST 0 0 2 5 0 0 0 0 0 0 0 0 7267 0
BCS19-10-4_ML1998 MLPBL 0 0 0 0 0 0 0 0 0 0 0 0 19885 0
BCS19-10-5_ML2022 MLPPR 0 0 0 52 0 0 0 0 0 0 0 0 69 0
BCS19-10-6_ML2046 MLROL 0 0 0 9 0 0 0 0 0 0 716 0 12 0
BCS19-10-7_ML2070 MLCCR 0 0 0 950 0 0 0 0 5 0 0 0 18 0
BCS19-11-2_ML1951 MLSIS 0 0 0 0 0 0 0 0 458 0 405 0 30416 0
BCS19-11-3_ML1975 MLCCR 0 0 0 5279 0 0 22 0 0 0 0 0 5 0
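A sketch of one way to keep both columns without tripping the numeric check: read the table as-is, use the unique Sample column for the row names, keep Site as a separate vector, and pass only the remaining columns as the matrix. The excerpt below uses the first four rows and three of the species columns from the data above. `membership.rows` is, as far as I know from the superheat vignette, the argument for grouping rows by a categorical variable, but treat that call as an assumption to check against ?superheat; it only runs if the package is installed.

```r
# Excerpt of the posted data: first four rows, three species columns
txt <- "Sample Site A4 A11 A13
BCS19-10-1_ML1926 MLPBL 3 0 24499
BCS19-10-2_ML1950 MLRNW 0 1757 8785
BCS19-10-3_ML1974 MLPST 5 0 7267
BCS19-10-4_ML1998 MLPBL 0 0 19885"

dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Site becomes a separate grouping vector; repeated values are fine here
site <- factor(dat$Site)

# Numeric matrix of abundances, labelled by the unique sample IDs
X <- as.matrix(dat[, setdiff(names(dat), c("Sample", "Site"))])
rownames(X) <- dat$Sample

# Manual ordering by site, if a plain sorted matrix is all that is needed
ord <- order(site)
X_sorted <- X[ord, , drop = FALSE]
site_sorted <- site[ord]

# Alternatively, let superheat group the rows by site
if (requireNamespace("superheat", quietly = TRUE)) {
  superheat::superheat(X, membership.rows = site)
}
```

Because the row names come from Sample rather than Site, the "duplicate 'row.names' are not allowed" error does not arise.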

kaarg2 commented 5 years ago

You could try removing the Site column but keeping the sample ID, then color-code the sample labels by site, treating site as a factor variable. You will need to construct a vector specifying the color for each site. superheat can also perform unsupervised clustering of the samples, and you can then judge how homogeneous the clusters are from the label colors and the dendrograms.
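A minimal sketch of building the per-sample color vector described above, in base R only. The site values are the first four from the posted data, and the rainbow palette is an arbitrary choice. How the vector is then handed to superheat (for example through a label text color argument such as `left.label.text.col`) is an assumption that should be checked against ?superheat, in particular how colors are matched to rows once clustering reorders them.

```r
# Site of each sample, in the same order as the rows of the abundance matrix
site <- factor(c("MLPBL", "MLRNW", "MLPST", "MLPBL"))

# One color per site level, named by level (palette choice is arbitrary)
palette_by_site <- setNames(grDevices::rainbow(nlevels(site)), levels(site))

# Expand to one color per sample by looking up each sample's site
label_cols <- unname(palette_by_site[as.character(site)])
```

Samples from the same site share a color, so `label_cols[1]` and `label_cols[4]` are identical here.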

On Oct 6, 2018, at 10:51 PM, Frederica1 notifications@github.com wrote:

I can load the data if I include both columns (Sample and Site) but then I get the X must be numeric error again when running superheat. The data looks like this:

Sample Site A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14
BCS19-10-1_ML1926 MLPBL 0 0 0 3 0 0 0 0 0 0 0 0 24499 19
BCS19-10-2_ML1950 MLRNW 0 0 0 0 0 0 0 0 0 0 1757 0 8785 0
BCS19-10-3_ML1974 MLPST 0 0 2 5 0 0 0 0 0 0 0 0 7267 0
BCS19-10-4_ML1998 MLPBL 0 0 0 0 0 0 0 0 0 0 0 0 19885 0
BCS19-10-5_ML2022 MLPPR 0 0 0 52 0 0 0 0 0 0 0 0 69 0
BCS19-10-6_ML2046 MLROL 0 0 0 9 0 0 0 0 0 0 716 0 12 0
BCS19-10-7_ML2070 MLCCR 0 0 0 950 0 0 0 0 5 0 0 0 18 0
BCS19-11-2_ML1951 MLSIS 0 0 0 0 0 0 0 0 458 0 405 0 30416 0
BCS19-11-3_ML1975 MLCCR 0 0 0 5279 0 0 22 0 0 0 0 0 5 0
BCS19-11-4_ML1999 MLROL 0 0 0 0 0 0 0 0 0 0 0 0 14452 0
BCS19-11-5_ML2023 MLSCR 0 0 0 8866 0 0 0 0 0 0 0 0 0 0
BCS19-11-6_ML2047 MLRNW 0 0 0 5 129 0 0 0 0 0 474 0 13641 2
BCS19-11-7_ML2071 MLPBL 0 2220 0 4 0 0 0 0 0 0 10 0 19034 0
BCS19-12-1_ML1928 MLPST 0 0 0 6 0 0 0 0 0 0 0 0 11799 0
BCS19-12-3_ML1976 MLPPR 0 0 0 1423 0 0 0 0 0 0 34 0 6381 0
BCS19-12-4_ML2000 MLSIS 0 0 0 0 0 0 0 0 0 0 0 0 10275 0
BCS19-12-5_ML2024 MLSCR 0 0 0 58815 0 0 0 0 0 16 0 0 17 0
BCS19-12-7_ML2072 MLPBL 2 0 0 2 0 0 0 0 0 0 34 0 6376 0
BCS19-1-2_ML1941 MLRNW 0 0 0 0 0 0 0 0 0 0 4655 0 24168 25
BCS19-13-1_ML1929 MLRNW 0 130 0 4 0 0 0 0 53 0 62 0 36552 0
BCS19-13-2_ML1953 MLPBL 0 0 0 0 0 0 0 0 0 0 0 0 15329 0
BCS19-13-3_ML1977 MLPST 0 0 0 8 0 0 0 0 0 0 0 0 18611 0
BCS19-13-4_ML2001 MLCCR 0 0 0 716 0 0 0 0 0 0 32 0 115 0
BCS19-13-5_ML2025 MLROL 0 0 0 0 0 0 18 0 0 0 505 4 45 0
BCS19-13-6_ML2049 MLPBL 0 0 0 7 0 0 0 0 0 0 0 0 5834 36
BCS19-13-7_ML2073 MLALR 0 0 0 0 0 0 0 101 0 0 3 0 939 0
BCS19-1-3_ML1965 MLPPR 0 0 0 115 0 0 0 0 0 0 0 0 6912 0
BCS19-14-1_ML1930 MLROL 0 444 0 0 0 0 3 0 0 0 435 0 859 0
BCS19-14-2_ML1954 MLRNW 0 0 0 0 0 0 0 0 0 0 10 0 13189 0
BCS19-14-3_ML1978 MLPST 0 0 0 0 0 0 0 0 0 2 0 0 10047 0
BCS19-14-4_ML2002 MLPPR 0 0 0 0 0 0 0 0 0 0 0 0 3437 0
BCS19-14-5_ML2026 MLPBL 0 0 0 23 0 0 0 0 0 0 2531 0 17367 0
BCS19-14-6_ML2050 MLRNW 0 0 0 0 0 5 0 0 0 0 276 0 8610 0
BCS19-14-7_ML2074 MLSCR 0 0 0 9627 0 0 0 0 0 209 0 0 14 0
BCS19-1-4_ML1989 MLPST 0 0 0 0 0 0 0 0 0 0 0 0 9137 0
BCS19-15-1_ML1931 MLSIS 0 0 0 0 116 0 0 0 0 0 1209 0 9166 0
BCS19-15-2_ML1955 MLPPR 0 0 0 2783 0 0 0 0 44 0 1028 0 130 0
BCS19-15-3_ML1979 MLRNW 0 0 0 0 0 195 4 0 0 0 871 0 9800 0
BCS19-15-4_ML2003 MLSIS 17 0 0 2 0 0 0 0 0 0 0 0 4105 0
BCS19-15-5_ML2027 MLPST 0 0 0 4 0 0 0 0 0 0 0 0 51554 0
BCS19-15-6_ML2051 MLSIS 0 0 0 0 0 0 27 0 0 0 3852 0 327 0
BCS19-15-7_ML2075 MLALR 0 0 0 0 0 0 0 0 0 0 7883 0 11264 0
BCS19-1-5_ML2013 MLPPR 0 0 0 4 0 0 0 0 0 0 34 0 12172 0
BCS19-16-1_ML1932 MLPPR 0 0 0 13 0 0 0 25 0 0 4 0 14993 0
BCS19-16-2_ML1956 MLPPR 0 0 0 952 0 0 0 0 0 0 7 0 14072 0
BCS19-16-4_ML2004 MLCCR 0 0 0 274 0 0 0 0 0 0 116 0 29 0
BCS19-16-5_ML2028 MLPPR 0 0 0 9 0 0 0 0 0 0 4 0 55948 0
BCS19-16-6_ML2052 MLPBL 0 0 0 0 0 0 0 0 0 13 2 0 21131 0
BCS19-16-7_ML2076 MLSIS 0 0 0 2 0 0 0 0 0 0 888 0 33905 0
BCS19-1-6_ML2037 MLCCR 0 0 0 9956 0 0 0 0 0 0 0 12 877 0
BCS19-17-1_ML1933 MLALR 0 0 0 0 0 0 0 0 0 0 1029 0 6271 108
BCS19-17-2_ML1957 MLCCR 0 0 0 3277 0 0 0 0 0 0 0 0 26 0
BCS19-17-3_ML1981 MLPST 0 0 0 0 0 0 0 0 0 0 0 0 11497 0
BCS19-17-4_ML2005 MLPST 0 0 0 0 0 0 0 0 0 0 0 0 5495 0
BCS19-17-6_ML2053 MLRNW 0 0 0 0 0 28 0 0 0 0 0 0 27157 0
BCS19-17-7_ML2077 MLRNW 0 0 0 0 0 0 9 0 44 0 0 0 12542 0
BCS19-1-7_ML2061 MLPST 0 0 0 0 0 0 0 0 0 0 0 0 12975 0
