jkzorz / jkzorz.github.io

2 stars 0 forks source link

NDMS plot error #1

Open sumitra20 opened 2 years ago

sumitra20 commented 2 years ago

Hi @jkzorz ,

Im very new to bioinformatic and R and im trying to replicate the tutorial you posted on 'https://jkzorz.github.io/2019/06/06/NMDS.html' to get NDMS plot for my data. This is my sample dataframe:

"Level_1"   "Day"   "Yield" "Acidobacteria" "candidate division NC10"   "candidate division WOR-3"  "Candidatus Aminicenantes"  "Candidatus Eisenbacteria"  "Candidatus Rokubacteria"   "Calditrichaeota"   "Elusimicrobia" "Bacteroidetes" "Ignavibacteriae"   "candidate division Zixibacteria"   "Candidatus Latescibacteria"    "Gemmatimonadetes"  "Nitrospirae"   "Proteobacteria"    "Candidatus Abyssubacteria" "Candidatus Omnitrophica"   "Lentisphaerae" "Planctomycetes"    "Verrucomicrobia"   "Spirochaetes"  "Actinobacteria"    "Armatimonadetes"   "Chloroflexi"   "Cyanobacteria" "Deinococcus-Thermus"   "Firmicutes"
"R11P4_BS_diamond"  0   "Low"   22225.703   574.2619    0   140.88957   2322.9353   0   1029.0415   19.166956   3195.5298   256.63806   0.995686    0   1352.266    3617.2026   29759.188   0   28.37705    102.057816  4175.409    3554.35 1059.6588   5686.363    666.98517   35559.43    4108.5737   6.845341    2762.282
"R13P1_BS_diamond"  60  "High"  34864.668   296.10495   0   96.91555    970.3203    0   1011.2066   21.433247   3125.6428   382.53687   0.5824252   0   740.0294    8325.418    26701.049   0   43.33243    95.63422    2905.2532   5159.7046   1287.509    5290.517    720.45996   36515.844   3476.9617   7.105587    3380.0461
"R13P3_BS_diamond"  60  "High"  30511.857   201.64407   1252.1503   160.40663   1576.6329   0   0   20.618717   3181.9224   527.6994    0.17473489  0   618.5615    5305.126    27084.434   0   72.86445    81.251724   3752.0823   4092.466    2143.2983   6419.4106   869.3061    35032.426   3004.916    5.5915165   2518.454
"R13P6_BS_diamond"  60  "Low"   23491.105   201.67783   0   115.17029   2832.317    0   0   20.043161   2692.534    1185.2467   0   0   2077.0115   1735.1355   38256.37    0   1011.81616  99.385  4316.4453   4472.325    917.51984   9140.408    1159.6995   21273.375   3704.8694   7.061839    2208.4863
"R14P1_BS_diamond"  90  "High"  21952.639   196.00256   0   37.935978   1538.1635   0   0   0   3867.7136   1045.1714   0   0   924.8651    1710.6316   32768.605   0   0   0   3437.2456   2696.6157   1705.3628   14147.135   1296.3215   33106.695   3081.4202   11.24029    2023.2522
"R14P3_BS_diamond"  90  "High"  23845.86    189.5785    0   121.52756   1219.4237   0   0   19.843521   3474.5222   285.2086    0.3363309   0   1093.2996   4346.6284   32942.938   0   37.66906    97.19962    3969.1528   4942.8306   761.90155   9911.335    933.76666   34448.355   3229.0007   7.6235  2885.943

and the code:

library(vegan)
pc = read.csv("/media/combio7/8TB_Backup/sumitra_130122/all_season_megan_metadata/NDMS/transposed-bs-allseason-230922-taxonomy-phylum.csv", header= TRUE,sep ='\t')

#make community matrix - extract columns with abundance information
com = pc[,4:ncol(pc)]

#turn abundance data frame into a matrix
m_com = as.matrix(com)

set.seed(123)
nmds = metaMDS(m_com, distance = "bray")
nmds
plot(nmds)

#extract NMDS scores (x and y coordinates)
data.scores = as.data.frame(scores(nmds)$sites)

data.scores$Sample = pc$Sample
data.scores$Time = pc$Day
data.scores$Type = pc$Yield

head(data.scores)

library(ggplot2)

xx = ggplot(data.scores, aes(x = NMDS1, y = NMDS2)) + 
    geom_point(size = 4, aes( shape = Type, colour = Time))+ 
    theme(axis.text.y = element_text(colour = "black", size = 12, face = "bold"), 
    axis.text.x = element_text(colour = "black", face = "bold", size = 12), 
    legend.text = element_text(size = 12, face ="bold", colour ="black"), 
    legend.position = "right", axis.title.y = element_text(face = "bold", size = 14), 
    axis.title.x = element_text(face = "bold", size = 14, colour = "black"), 
    legend.title = element_text(size = 14, colour = "black", face = "bold"), 
    panel.background = element_blank(), panel.border = element_rect(colour = "black", fill = NA, size = 1.2),
    legend.key=element_blank()) + 
    labs(x = "NMDS1", colour = "Type", y = "NMDS2", shape = "Time")  + 
    scale_colour_manual(values = c("#009E73", "#E69F00"))  
xx

But i got an error msg that said 'Error: Continuous value supplied to discrete scale'

Then i tried editing by setting color and shape to factor:

xx = ggplot(data.scores, aes(x = NMDS1, y = NMDS2)) + 
    geom_point(size = 4, aes( shape = as.factor(Type), colour = as.factor(Time)))+ 
    theme(axis.text.y = element_text(colour = "black", size = 12, face = "bold"), 
    axis.text.x = element_text(colour = "black", face = "bold", size = 12), 
    legend.text = element_text(size = 12, face ="bold", colour ="black"), 
    legend.position = "right", axis.title.y = element_text(face = "bold", size = 14), 
    axis.title.x = element_text(face = "bold", size = 14, colour = "black"), 
    legend.title = element_text(size = 14, colour = "black", face = "bold"), 
    panel.background = element_blank(), panel.border = element_rect(colour = "black", fill = NA, size = 1.2),
    legend.key=element_blank()) + 
    labs(x = "NMDS1", colour = as.factor("Time"), y = "NMDS2", shape = as.factor("Type"))  + 
    scale_colour_manual(values = c("#009E73", "#E69F00")) 

Ended up with another error Error in f():! Insufficient values in manual scale. 4 needed but only 2 provided.

Not sure where im going wrong and on how to rectify it. Did really appreciate it if you could help. Thank you in advance

jkzorz commented 2 years ago

Hi! You would just need to add two more colours to scale_colour_manual(). Only 2 are listed now, and you would need 4 for your data.

e.g. scale_colour_manual(values = c("#009E73", "#E69F00", "red", "blue"))

sumitra20 commented 1 year ago

Hi @jkzorz,

My apologies, seem to have another question. I have 3 different groups (day, yield, season) that i would like to analyze, will i be able to plot more than 2 groups with NDMS plot? Not sure how to edit the line below:

data.scores$Sample = pc$Sample
data.scores$Time = pc$Day
data.scores$Type = pc$Yield
data.scores$Season = pc$Season

xx = ggplot(data.scores, aes(x = NMDS1, y = NMDS2)) + 
    geom_point(size = 4, aes( shape = as.factor(Type,Season), colour = as.factor(Time)))+ 
    theme(axis.text.y = element_text(colour = "black", size = 12, face = "bold"), 
    axis.text.x = element_text(colour = "black", face = "bold", size = 12), 
    legend.text = element_text(size = 12, face ="bold", colour ="black"), 
    legend.position = "right", axis.title.y = element_text(face = "bold", size = 14), 
    axis.title.x = element_text(face = "bold", size = 14, colour = "black"), 
    legend.title = element_text(size = 14, colour = "black", face = "bold"), 
    panel.background = element_blank(), panel.border = element_rect(colour = "black", fill = NA, size = 1.2),
    legend.key=element_blank()) + 
    labs(x = "NMDS1", colour = as.factor("Time"), y = "NMDS2", shape = as.factor("Type","Season"))  + 
    scale_colour_manual(values = c("#009E73", "#E69F00", "#F0E442","#AA4499")) 

This is the rror i get from running above:

Error in geom_point(): ! Problem while computing aesthetics. ℹ Error occurred in the 1st layer. Caused by error in as.factor(): ! unused argument (Season)

And if thats not possible, can i plot for each of my groups separately, one for yield (high vs low), day (0,30,60,90), season (1,2,3)?

Thank you so much in advance

jkzorz commented 1 year ago

Hi! Yes you can plot/differentiate all three groups on the same figure, you just need to map a different aesthetic to each group. So in the case above, "colour" is mapped to Time, and "shape" is mapped to Type. You need to map Season to a different aesthetic, for instance "size" or "alpha". e.g. geom_point(size = 4, aes( shape = as.factor(Type), size = Season, colour = as.factor(Time)))

Now whether or not this is something you want to do is another question. Trying to convey three different variables on the same plot can be difficult for the viewer. You may want to play around with what variable you map to which aesthetic.

You can plot each variable separately but you would have to create 3 different plots.

Hope this helps!