robjohnnoble / ggmuller

Create Muller Plots of Evolutionary Dynamics

Automatically filling in zeros interacts badly with coloring by fitness #10

Closed emilydolson closed 5 years ago

emilydolson commented 5 years ago

Thank you so much for writing this library! It's so useful!

It looks like there is a slight issue that can occur when ggmuller is left to fill in zeros for populations at various time points and then you ask to color by fitness (or any other supplemental column). When the zeros for population get filled in, the other columns do not. As a result there are rows that end up with the same Group_id but different fitness values. Then, when you attempt to color by fitness, ggplot throws an error (error in f(...) : aesthetics can not vary with a ribbon) because it can't give the same geom_area two different colors.
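
One way to confirm this diagnosis is to count how many distinct fitness values each identity carries. The following is a minimal base-R sketch on an invented toy data frame, not real `get_Muller_df()` output. It assumes the filled-in rows hold `NA` in `Fitness`; the check flags a group in the same way if they hold some other placeholder value instead.

```r
# Toy stand-in for the output of get_Muller_df(); all values are invented.
# Assumption: rows filled in with zero population carry NA in Fitness.
toy_df <- data.frame(
  Generation = c(0, 1, 2, 0, 1, 2),
  Identity   = c(1, 1, 1, 2, 2, 2),
  Population = c(10, 12, 15, 0, 3, 5),
  Fitness    = c(0.1, 0.1, 0.1, NA, 0.2, 0.2)
)

# Count distinct Fitness values per Identity (unique() treats NA as a value).
n_values <- tapply(toy_df$Fitness, toy_df$Identity,
                   function(x) length(unique(x)))

# Identities with more than one value are the ones that cannot be
# drawn as a single-coloured area.
inconsistent <- names(n_values)[n_values > 1]
print(inconsistent)
```

An empty result would mean every identity has a single fitness value, so colouring by fitness should not trip over this particular problem.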

I fixed this by manually modifying the Fitness column in the data frame produced by get_Muller_df():

library(dplyr)
library(ggmuller)

# Make Muller dataframe
Muller_df <- get_Muller_df(adjacency_df, pop_info_df, cutoff = .2)

# Make dataframe containing the correct fitness for each identity
# (they should all be the same so it doesn't matter if this is max() or some other function)
fits <- Muller_df %>% group_by(Identity) %>% summarize(Fitness = max(Fitness, na.rm = TRUE))

# Drop the Fitness column from the original dataframe and replace it with the one from fits,
# joining explicitly on Identity
correct_df <- left_join(Muller_df %>% select(-one_of("Fitness")), fits, by = "Identity")

# Now this works
Muller_plot(correct_df, colour_by = "Fitness")

I'm sure there's a better way (I don't do much heavy-duty coding in R), but, in case anyone else is having this problem, this solution works, assuming a taxon's fitness isn't allowed to change over time (which seems like a necessary assumption for any of this to work).
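
For readers without dplyr, the same idea can be sketched in base R with `ave()`, which overwrites each row's fitness with a per-identity summary in one step. The toy data frame below is invented for illustration, and the sketch relies on the same assumption as the workaround above: a taxon's fitness does not change over time.

```r
# Toy stand-in for the output of get_Muller_df(); all values are invented.
toy_df <- data.frame(
  Identity   = c(1, 1, 1, 2, 2, 2),
  Population = c(10, 12, 15, 0, 3, 5),
  Fitness    = c(0.1, 0.1, 0.1, NA, 0.2, 0.2)
)

# Replace every Fitness entry with the maximum non-missing value
# observed for the same Identity.
toy_df$Fitness <- ave(
  toy_df$Fitness, toy_df$Identity,
  FUN = function(x) max(x, na.rm = TRUE)
)

print(toy_df)
```

As with the dplyr version, an identity whose fitness is missing in every row would end up with `-Inf` (and a warning from `max()`), so such groups would need separate handling.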

I've run into a couple of other problems (one with additional time points getting created by get_Muller_df, and one with ggplot complaining about a "Continuous value supplied to discrete scale" when I use a continuous value for Fitness), but I've been able to work around them and haven't had a chance to narrow down the cause or come up with a minimal reproducible example. Will update if I learn more (I'm also happy to post the code/data that's doing it, but the data are kind of unwieldy).

Thanks again for making this available!

robjohnnoble commented 5 years ago

Dear Emily,

Thanks very much for getting in touch! I’m glad you’re finding the package helpful.

I’ll take a look at these issues as soon as possible and notify you of changes.

Kind regards, Rob.

On 22 Nov 2018, at 04:33, Emily Dolson wrote: [quoted copy of the report above]

robjohnnoble commented 5 years ago

Here's an example of the problem:

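library(ggmuller)
library(dplyr)

# Start from the example population data shipped with ggmuller
example_pop_df_mod <- example_pop_df
# Add a supplemental column: fitness is constant over time for each identity
example_pop_df_mod$Fitness <- example_pop_df_mod$Identity/10
# Drop zero-population rows so that get_Muller_df() has to fill them in
example_pop_df_mod <- filter(example_pop_df_mod, Population > 0)

Muller_df <- get_Muller_df(example_edges, example_pop_df_mod)
# Colouring by the supplemental column triggers the error
Muller_plot(Muller_df, colour_by = "Fitness")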

I'm working on a solution.

robjohnnoble commented 5 years ago

Fixed in commit c84383d8859735cf756c61efde7c8f0b84f5e4a1

emilydolson commented 5 years ago

Fantastic! Thank you.

robjohnnoble commented 5 years ago

No problem. Thanks for reporting the issue. Please let me know if you spot any other opportunities for improvement.