spholmes / F1000_workflow


creating a legend for the graph under creating and plotting graphs #26

Closed Mdrexel2018 closed 6 years ago

Mdrexel2018 commented 6 years ago

Hello! I was having a problem with this code a week or so ago, and now I have a new, different problem. I am using the 'Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses'. I began with the DADA2 pipeline and am now going through the section on creating and plotting graphs. I was able to get it working with the sample data from the paper, and now I am trying to get it to work with my own data. I am able to get a network, but so far I cannot tell which node belongs to which subject. This is the table of data that I am working with: [image: samdf table]

This is the code that I am using:

net <- make_network(ts, max.dist = 1)
# was only able to get 4 points until I increased the max.dist. Then I was able to access all of the data. Once I get it working I will adjust this
net
sampledata <- data.frame(sample_data(ts))
V(net)$id <- sampledata[names(V(net)), "Subject"] # $__ means a vertex attribute.
colrs <- c("gray50", "tomato", "gold", "red", "blue", "green", "pink", "teal", "orange", "yellow", "purple", "scarlet", "grey", "magenta")
# this enabled me to both add more colors as well as increase the number of labels in the legend. the only thing is that the labels are labeled with the colors listed and do not appear to correspond with those colors (so if we change the names listed then the legend would change.)
V(net)$color <- colrs[V(net)$id]
V(net)$subject <- sampledata[names(V(net)), "Subject"]
ggplot(net, aes(colour, x = x, y = y, xend = xend, yend = yend), layout = "fruchtermanreingold") +
  geom_edges(color = "darkgray") +
  geom_nodes(aes(colour = color), size = 3, vertex.label<- V(net)$Subject) +
  scale_shape_identity() + # not entirely sure what this does as the graph looks the same either way.
  theme(axis.text = element_blank(), axis.title = element_blank(), legend.key.height = unit(1.5,"line")) +
  guides(col = guide_legend(override.aes = list(size = 4)))

This is the resulting graph: [image: g8 fa]

The ideal graph would have the subject id in place of all of the color names. I don't think that I can type them out since I am still unsure of which node belongs to which id. I assume that I would have to have the program go into the data file and pull out the subject id and then input it into the legend but I am not sure how to do that.

I am also having another small problem. Each time I run the code, the graph changes even though I have not changed the code. See the graphs above and below for the differences. Because of this, I am not sure whether the graph I am getting is a correct representation of the data. [images: g9 fa, g10 fa]

Thank you :)

jfukuyama commented 6 years ago

Try changing this line

geom_nodes(aes(colour = color), size = 3, vertex.label<- V(net)$Subject)

to

geom_nodes(aes(colour = subject))

It's ok that the graph changes a bit every time you plot it; the positions of the nodes are not completely determined by the graph structure and the layout type, and so there's a bit of randomness. If you want to get rid of that, you can type set.seed(0) before running the plotting command and that should give you a consistent orientation.
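As a minimal sketch of why seeding helps, the base R snippet below uses a hypothetical helper, random_positions(), as a stand-in for the random starting coordinates that a layout might draw. It is not the layout code used by the plotting call; it only shows that resetting the seed replays the same random numbers.

```r
# Hypothetical stand-in for a layout's random starting positions:
# draw x/y coordinates for n nodes from R's random number generator.
random_positions <- function(n) {
  data.frame(x = runif(n), y = runif(n))
}

set.seed(0)
first <- random_positions(14)

set.seed(0)                      # reset the seed right before the next draw
second <- random_positions(14)

third <- random_positions(14)    # no reset: continues the random stream

identical(first, second)   # TRUE: same seed, same coordinates
identical(first, third)    # FALSE: the stream has moved on
```

The seed only affects random draws made after it is set, so set.seed(0) needs to be run before the plotting command each time the plot is redrawn.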

Mdrexel2018 commented 6 years ago

@jfukuyama Thank you for getting back to me. The set.seed(0) worked well, thank you. I then tried changing the geom_nodes line to the one you suggested, but once I did that the graph showed only two dots, labeled with what look like random numbers. [image: g11 fa] It appears that this part may be important, since an error was generated when I tried to run the code without it:

vertex.label<- V(net)$Subject)

I tried changing it to either:

vertex.label<- V(net)$id) or vertex.label<- V(net)$color)

but then I got this error:

Error: data must be a data frame, or other object coercible by fortify(), not an integer vector

I am not sure where to go from there.

Mdrexel2018 commented 6 years ago

After experimenting some more, it appears that the vertex.label<- ... part does not make a difference. In fact, it does not seem to be doing anything. I also decided to print() V(net)$subject, and that gave me this list:

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14

So now I realize that the labels I want are not in that vertex attribute. I am not sure how to get them into that list, though.

jfukuyama commented 6 years ago

Hey there, sorry for the delayed reply, I've been without the internet for a while.

It's hard to debug this without having access to all of your data, but it looks like what happened with your V(net)$subject is that a variable that was initially a factor got coerced to an integer (which is how R stores factors internally). That makes the subject IDs go from meaningful to the integers that you get there. Then the plot is faithfully plotting the integers, which are unfortunately the wrong thing.

What you should do is go through all the assignments you used to create V(net)$subject and see where it switches over from being the good subject labels to being the numbers you see there. (So for instance, it was created with V(net)$subject <- sampledata[names(V(net)), "Subject"], so you should check that the sampledata data frame is good, that it has a column called Subject, that names(V(net)) match the row names of sampledata, and so on.)
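The checks described above can be sketched in base R as follows. The sample table, the sample IDs, and vertex_names (a stand-in for names(V(net))) are made up for illustration; only the Subject labels are borrowed from the thread. The sketch shows the difference between a factor's labels and its internal integer codes, and that indexing a color vector with a factor uses the codes.

```r
# Hypothetical sample table: row names are sample IDs, Subject is a factor.
sampledata <- data.frame(
  Subject = factor(c("ML04", "MLK01", "PP01"),
                   levels = c("ML04", "MLK01", "PP01")),
  row.names = c("sampleA", "sampleB", "sampleC")
)

# Stand-in for names(V(net)); an assumption, not taken from real data.
vertex_names <- c("sampleC", "sampleA", "sampleB")

# Check 1: the column exists.
"Subject" %in% colnames(sampledata)

# Check 2: every vertex name matches a row name of the sample table.
all(vertex_names %in% rownames(sampledata))

# Check 3: look at what the lookup returns.
subject <- sampledata[vertex_names, "Subject"]
is.factor(subject)        # TRUE
as.integer(subject)       # 3 1 2: the internal codes
as.character(subject)     # "PP01" "ML04" "MLK01": the labels

# Indexing with a factor uses the integer codes, not the labels.
colrs <- c("gray50", "tomato", "gold")
colrs[subject]            # "gold" "gray50" "tomato"
```

If the labels are lost when the factor is stored as a vertex attribute, one option to try is converting first, for example with as.character(sampledata[names(V(net)), "Subject"]), so that the attribute holds character strings rather than codes.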

Mdrexel2018 commented 6 years ago

No worries and thank you!

I checked sampledata, names(V(net)), and then sampledata[names(V(net)), "Subject"]. Everything was correct until I tried to print V(net)$subject. When I did that I got only numbers, which leads me to believe that that is where the problem lies. For some reason that assignment is not doing what it is supposed to do. Your comment about V(net)$subject got me thinking, and I was eventually able to figure it out, which I was very happy about. I decided to print subject, which gave me:

print(subject)
 [1] ML04  MLK01 MLK02 MLK03 PP01  PP02  PP03  PP07  PP08  PP12  PP13  PP14
[13] PP15  PP16
14 Levels: ML04 MLK01 MLK02 MLK03 PP01 PP02 PP03 PP07 PP08 PP12 PP13 ... PP16

I realized then that the levels were what I wanted to have the legend equal to. Once I figured that out I looked at my previous code and realized that colrs was the current input for the legend via V(net)$color. So then I decided to try setting V(net)$color equal to the levels. Once I did that I was successful and able to get the correct graph. I also included the code that eventually worked for me.

[image: g16 fa]

net <- make_network(ts, max.dist = 1) # was only able to get 4 points until I increased the max.dist. Then I was able to access all of the data.
net
sampledata <- data.frame(sample_data(ts))
V(net)$subject <- sampledata[names(V(net)), "Subject"]
V(net)$id <- levels(subject)

ggplot(net, aes(colour, x = x, y = y, xend = xend, yend = yend), layout = "fruchtermanreingold") +
  geom_edges(color = "darkgray") +
  geom_nodes(aes(colour = id), size = 3, vertex.label<- V(net)$Subject) +
  scale_shape_identity() +
  theme(axis.text = element_blank(), axis.title = element_blank(), legend.key.height = unit(1.5,"line")) +
  guides(col = guide_legend(override.aes = list(size = 4)))
set.seed(0)
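A note on levels(subject): in the printed output earlier in this comment there are 14 values and 14 levels in the same order, so the levels line up one-to-one with the samples. The sketch below, using made-up factors, illustrates that this is a special case, and that as.character() gives one label per sample even when a subject repeats or the samples are not in level order.

```r
# Hypothetical case: one subject sampled twice, samples not in level order.
repeated <- factor(c("PP01", "ML04", "PP01", "MLK01"),
                   levels = c("ML04", "MLK01", "PP01"))

levels(repeated)         # "ML04" "MLK01" "PP01": unique labels, level order
as.character(repeated)   # "PP01" "ML04" "PP01" "MLK01": one label per sample

# Case resembling the thread: each subject once, already in level order.
one_each <- factor(c("ML04", "MLK01", "PP01"),
                   levels = c("ML04", "MLK01", "PP01"))

identical(levels(one_each), as.character(one_each))   # TRUE
identical(levels(repeated), as.character(repeated))   # FALSE
```

Under these assumptions, as.character(subject) is the more general way to recover per-sample labels, while levels(subject) matches it only when every subject appears exactly once and in level order.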

Thank you again! I couldn't have done it without your help.

jfukuyama commented 6 years ago

That's great, I'm so glad you figured it out!