Closed jamesbaye closed 4 years ago
Latest commit addresses issue #7
EpiMutations now produces a consistent tibble regardless of the choice of method.
When reduced_output=T, tibble columns are ("sample", "chr", "start", "end", "cpg_ids", "outlier_method", "outlier_score"). When reduced_output=F, extra columns provided by bumphunter are also carried over, including ("value", "area"), which could be relevant for extra stats?
Updated docstring @return to precisely document the returned tibble object.
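As a rough illustration of the column contract described above, here is a minimal sketch in base R. The helper name format_output and the example values are hypothetical and not taken from the package; only the column names and the reduced_output argument come from the description. It uses a plain data.frame to stay self-contained, and the same subsetting also works on a tibble.

```r
# Hypothetical helper, not the package's actual code: trims a results table
# to the shared column set when reduced_output is TRUE.
reduced_cols <- c("sample", "chr", "start", "end",
                  "cpg_ids", "outlier_method", "outlier_score")

format_output <- function(results, reduced_output = TRUE) {
  missing_cols <- setdiff(reduced_cols, names(results))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (reduced_output) {
    # Keep only the shared columns, in a fixed order.
    return(results[, reduced_cols, drop = FALSE])
  }
  # Otherwise put the shared columns first and keep any extras (e.g. value, area).
  extra_cols <- setdiff(names(results), reduced_cols)
  results[, c(reduced_cols, extra_cols), drop = FALSE]
}

# Made-up example row; every value here is illustrative only.
example <- data.frame(
  sample = "S1", chr = "chr1", start = 100L, end = 250L,
  cpg_ids = "cg0001,cg0002", outlier_method = "manova",
  outlier_score = 0.01, value = 0.4, area = 1.2,
  stringsAsFactors = FALSE
)

reduced <- format_output(example, reduced_output = TRUE)
full <- format_output(example, reduced_output = FALSE)
```

With this sketch, reduced carries only the seven shared columns, while full keeps value and area after them.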
Good work @jamesbaye! I think that the reduced output was a great idea. I have checked your code and I have some comments:
value and area are measures of how consistent the DMR is, and they evaluate the bumphunter results. In our implementation, we use bumphunter to partition the genome and an outlier test to assess the significance of the outlier. Thus, the statistics we rely on come from the outlier method, not from bumphunter. I do not think that value and area will help.
Thanks for the review @yocra3.
Re value/area: Are you suggesting that I remove the functionality of reduced=T? I thought it might be good that the user can check bumphunter statistics if they desired (reduced=F). Though the default (reduced=T) would not output them.
Outlier tests results: Okay, yes you're spot on. I'd just seen that yesterday. For now, am I correct that each function still just reports one output? So the current reporting would work for now?
@yocra3, I think it makes sense to have the reduce_output argument to allow obtaining the measurements from bumphunter, since we will include methods that do not use it. Therefore, for those that use it as the partition algorithm, it may be useful to get its statistics (and for the methods that do not use it, we may or may not report equivalent statistics; for instance, Barbosa et al. will not report any statistics for the area partition process).
Therefore, I think:
reduce_output (by default set to FALSE) with the area and value (with some prefix bump_)
Output: Leire is working on MANOVA and this function exports two parameters. Maybe you can talk with her so you can integrate her function with yours.
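The bump_ prefix suggested above could be applied with something like the following sketch. The helper name add_bump_prefix and the example values are hypothetical; only the value and area column names and the bump_ prefix come from the discussion.

```r
# Hypothetical helper: prefixes bumphunter-derived columns with "bump_"
# so they are easy to tell apart from the outlier-method statistics.
add_bump_prefix <- function(results, bump_cols = c("value", "area")) {
  is_bump <- names(results) %in% bump_cols
  names(results)[is_bump] <- paste0("bump_", names(results)[is_bump])
  results
}

# Made-up example; values are illustrative only.
example <- data.frame(sample = "S1", value = 0.4, area = 1.2)
renamed <- add_bump_prefix(example)
names(renamed)
# "sample" "bump_value" "bump_area"
```

Columns not listed in bump_cols are left untouched, so methods that do not use bumphunter would pass through unchanged.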
Addressing issue #6
Added a cpg_ids column in tibble output.
If reduced_output = T, subset columns to ("sample", "chr", "start", "end", "cpg_ids").
I didn't include length or n_cpgs since they are redundant. Didn't include epi_id since this should probably be added when all samples have been processed.