Closed thijsjanzen closed 4 years ago
I think it is fine to replace the error_function
, so I've taken the first steps. Until I've hit this:
The figure shows that the output of pir_run
is a data frame with as much rows as there are experiments. There are a lot of tests and functions (e.g. pir_plot
) that depend on this output, so it will be quite some work.
We can add another column called 'error_fun' per error function, meaning that pir_run
returns a data frame with as much rows as error comparisons are done.
A test I've added, to show how I would enjoy the interface to be:
test_that("allow to add more errors", {
skip("Issue 371, Issue $371")
# Use 'error_funs' instead of 'error_fun', to allow for more error functions
expect_silent(
create_error_measure_params(
error_funs = get_nltt_error_fun()
)
)
expect_silent(
create_error_measure_params(
error_funs = list(get_nltt_error_fun(), get_nltt_error_fun())
)
)
})
I think it is fine to replace the
error_function
, so I've taken the first steps. Until I've hit this:The figure shows that the output of
pir_run
is a data frame with as much rows as there are experiments. There are a lot of tests and functions (e.g.pir_plot
) that depend on this output, so it will be quite some work.We can add another column called 'error_fun' per error function, meaning that
pir_run
returns a data frame with as much rows as error comparisons are done.
- @thijsjanzen: I think it can be done, but it will take some work. I assume you volunteer?
- @Giappo: what do you think about all this?
I'm happy to give it a shot. As for the output frame, in the tidy format, this could be a table with the following columns: experiment, tree, inference model, inference_model_weight, site_model, clock_model, tree_prior, error_fun, error_val
Tracking down the functions depending on the non-tidy setup will be an interesting task.
Cool, if @Giappo agrees, then I'd say you go hit it :+1:
I agree having tidy data is better, but I suggest to make this another Issue. For now, I hope you use the wide form, also the refactoring for all dozens of packages depending on pirouette
will be too complex. Can you agree on that?
Cool, if @Giappo agrees, then I'd say you go hit it 👍
I agree having tidy data is better, but I suggest to make this another Issue. For now, I hope you use the wide form, also the refactoring for all dozens of packages depending on
pirouette
will be too complex. Can you agree on that?
I see your point, but I have no idea how one would analyze a dataframe in the wide format, given multiple error functions, unless you parse the column names, which is very ugly.
But I'll give it a try next week.
Nah, we convert the data frame to long format in pir_plot
, see here. It's three lines of code. I guess one can easily modify that :+1:
Hi @thijsjanzen, just to write down my thoughts explicitly:
To keep the current tests passing, I suggest the following return value of pir_run
:
tree |
inference_model |
... | error_1 |
... |
---|---|---|---|---|
... | ... | ... | ... | ... |
tree |
inference_model |
error_fun_name |
... | error_1 |
... |
---|---|---|---|---|---|
... | ... | ... | ... | ... | ... |
As said before, converting to tidy data is easy, especially for l33t like us :sunglasses:
I don't really understand how you would make that tidy. I would imagine something like:
[tree, inference_model, fun_1_error_1, fun_1_error_2, fun_1_error_3, fun_2_error_1, fun_2_error_2, fun_2_error_3 ....]
then:
tidy_data <- un_tidy_data %>%
gather(key = "error_fun", value = "error", -c(tree, inference_model) ) %>%
mutate( # some awful string parsing function that transforms fun_1_error_1 to the name of fun_1 )
Sure, that will also work. I predict my version will be easier to implement, but I do suggest you try the implementation you feel most comfortable with.
Good luck :+1:
Will take over for now.
@thijsjanzen: I am sorry for the noise, but I don't need this for now.
I will unassign myself. First to need this Issue done gets it :+1:
Hi @thijsjanzen, is this Issue still relevant? If no, I will close it :+1:
26 days of no response, sounds like a 'no' :+1:
Currently, pirouette supports the nLTT as error function. In an ideal world, the user could supply an arbitrary error function, or multiple error functions, and use these.
I imagine for instance the user supplying a list containing the used error functions, and pirouette then applying each function to the obtained posterior.
Obviously, the default should still be nLTT, to retain (backwards) compatibility