cdeitrick closed this issue 5 years ago.
Thanks for reporting this issue. I'll take a look as soon as I can, though it might take a bit longer than usual as I'm about to leave for a week-long conference. My initial guess is that the problem isn't due to the number of timepoints, or else I would have encountered it with my own data.
The problem seems to be due to the add_start_points function, which is called by get_Muller_df. If I comment out the line that calls add_start_points, the error disappears.
I've yet to figure out exactly why this happens with your data but I don't think it's due to "large datasets". Rather, I suspect it might be because some population sizes change from positive to zero and then back to positive -- a behaviour I didn't anticipate when I wrote the code.
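One way to check a data set for the positive-to-zero-to-positive pattern is sketched below. It is only an illustration: it assumes the population data frame has columns named Generation, Identity and Population, and find_reappearing is a hypothetical helper, not part of ggmuller.

```r
# Hypothetical helper (not part of ggmuller): return the identities whose
# population size goes from positive to zero and back to positive.
# Assumes columns named Generation, Identity and Population.
find_reappearing <- function(pop_df) {
  # Sort so that each identity's sizes are in time order.
  pop_df <- pop_df[order(pop_df$Identity, pop_df$Generation), ]
  by_id <- split(pop_df$Population, pop_df$Identity)
  flagged <- vapply(by_id, function(sizes) {
    present <- which(sizes > 0)
    # A zero between the first and last positive size is a gap.
    length(present) > 0 && any(sizes[min(present):max(present)] == 0)
  }, logical(1))
  names(flagged)[flagged]
}

# Toy data: "a" disappears at generation 500 and returns at 1000; "b" does not.
example <- data.frame(
  Generation = rep(c(0, 500, 1000), times = 2),
  Identity   = rep(c("a", "b"), each = 3),
  Population = c(10, 0, 5, 0, 3, 8)
)
find_reappearing(example)
```

On the toy data this flags only "a". An empty result on real data would suggest looking for a different cause.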
It's definitely not due to the number of time points, because the error still occurs after I filter the data to just two time points (using dplyr): population <- filter(population, Generation %in% c(0, 10000)).
Until I figure out the exact cause, I suggest you modify get_Muller_df by commenting out the following line: pop_df <- add_start_points_alt(pop_df, start_positions). Then it should work.
It turns out the error occurs only for population data frames with a very particular characteristic (one or more new populations appear at exactly generation 10,000). The bug was due to how the "add_start_points" function adds new rows to the population data frame. I've fixed it with commit 2a68df7916d500de9020868b6817acea0578cf40.
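For anyone who wants to check whether their own data has this characteristic, here is a minimal sketch. It assumes the population data frame has columns named Generation, Identity and Population; first_appearing_at is a hypothetical helper and is not taken from ggmuller or from the fixing commit.

```r
# Hypothetical helper (not part of ggmuller): return the identities whose
# first non-zero population size occurs at the given generation.
# Assumes columns named Generation, Identity and Population.
first_appearing_at <- function(pop_df, generation) {
  positive <- pop_df[pop_df$Population > 0, ]
  # Earliest generation at which each identity has a positive size.
  first_seen <- tapply(positive$Generation, positive$Identity, min)
  names(first_seen)[which(first_seen == generation)]
}

# Toy data: "b" is absent at generation 0 and first appears at 10000.
example_late <- data.frame(
  Generation = c(0, 10000, 0, 10000),
  Identity   = c("a", "a", "b", "b"),
  Population = c(10, 20, 0, 5)
)
first_appearing_at(example_late, 10000)
```

On the toy data this returns "b", the only identity that first appears at generation 10,000.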
The get_Muller_df function fails when given a dataset with a very large number of timepoints, such as a population from the Long Term Evolution experiment. The error reported when using a population with ~170 timepoints is:
Removing most of the timepoints allows the script to work again, while removing mutations from the source file has no effect. I have also checked for duplicate datapoints in the source files (attached) but found none.
m5_correct.ggmuller.edges.txt m5_correct.ggmuller.populations.txt