juliangehring / MultipleTesting.jl

The MultipleTesting package offers common algorithms for p-value adjustment and combination and more…

method for missing? #108

Open pdimens opened 5 years ago

pdimens commented 5 years ago

Is there a simple(ish?) method to perform the correction but skip missing values, and output the corrected array with the missing values kept at their original indices (but not used in the calculations)?

Reading that back to myself, it doesn't feel like it's worded too clearly, so maybe an example:

julia> pvals = [0.001, 0.01, missing, 0.03, 0.5];

julia> adjust(pvals, Bonferroni())
5-element Array{Union{Missing,Float64},1}:
 0.004
 0.04
 missing
 0.12
 1.0
pdimens commented 5 years ago

Thinking about it some more, maybe something like this: use findall to get an array of the indices of the missing values, drop them with skipmissing(array) |> collect, calculate the correction, and finally re-add missing to the output array at the original indices with insert!?

pdimens commented 5 years ago

Here is how I handled the situation in my own code. I don't know if it would merit adding to your package:

        # make a copy without the missing values
        p_no_miss = collect(skipmissing(P_array))

        # get the indices of the original missing values
        miss_idx = findall(ismissing, P_array)

        # do the correction, widening the element type so it can hold missing
        correct = convert(Vector{Union{Missing,Float64}}, adjust(p_no_miss, correctionmethod))

        # re-add missing at the original positions (miss_idx is ascending,
        # so each insert! lands at the right index)
        for i in miss_idx
            insert!(correct, i, missing)
        end
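
For anyone wondering whether the insert! loop still puts things back correctly when there are several missing values (leading, consecutive, or trailing), here is a small check that can be run without the package. It is only a sketch: `roundtrip_missing` is a made-up helper name, and it skips the correction step entirely so that only the remove-and-reinsert logic is exercised.

```julia
# Hypothetical helper, not part of MultipleTesting.jl: drop the missing
# entries, then put them back with insert! in ascending index order.
function roundtrip_missing(values)
    miss_idx = findall(ismissing, values)   # ascending original indices
    restored = Vector{Union{Missing,Float64}}(collect(skipmissing(values)))
    for i in miss_idx
        # everything left of position i has already been restored,
        # so inserting at the original index i is correct
        insert!(restored, i, missing)
    end
    return restored
end

original = [missing, 0.01, missing, missing, 0.03, missing]
@assert isequal(roundtrip_missing(original), original)
```

The ascending order matters: if the indices were processed in any other order, the positions would shift and the missing values could land in the wrong place.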
juliangehring commented 5 years ago

Thanks for bringing up the handling of missing values. Your approach looks good to me; I'm not sure whether there is a more elegant way of removing and reinserting missing values. It is definitely worth exploring whether missing values would be better handled by the adjust methods themselves.
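
As a rough illustration of what such handling could look like, here is a sketch that uses a boolean mask instead of insert!. The names `adjust_skipmissing` and `bonferroni` are assumptions made up for this example and are not part of the package; `bonferroni` is a hand-written stand-in for `adjust(pvals, Bonferroni())` so that the snippet runs on its own.

```julia
# Stand-in for adjust(pvals, Bonferroni()): multiply each p-value by the
# number of tests and cap the result at 1.
bonferroni(p) = min.(p .* length(p), 1.0)

# Hypothetical wrapper (not MultipleTesting.jl API): apply `correction` to the
# non-missing p-values only and leave missing entries at their original indices.
function adjust_skipmissing(correction, pvals)
    keep = .!ismissing.(pvals)                     # true where a p-value is present
    out = Vector{Union{Missing,Float64}}(missing, length(pvals))
    out[keep] = correction(Float64.(pvals[keep]))  # write results back through the mask
    return out
end

pvals = [0.001, 0.01, missing, 0.03, 0.5]
adjusted = adjust_skipmissing(bonferroni, pvals)
@assert isequal(adjusted, [0.004, 0.04, missing, 0.12, 1.0])
```

With the package loaded, the same wrapper should accept something like `p -> adjust(p, Bonferroni())` as the first argument. Note that the number of tests is taken from the non-missing values only (4 here, not 5), which matches the example in the original question.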

juliangehring commented 5 years ago

@pdimens Just to understand your case a bit better: How did you generate the original p-values and why are some values missing?

pdimens commented 5 years ago

That's a pretty fair question. The p-values were generated with a chi-squared test. When performed on all the data, it works ok, but if the data is partitioned by group, some groups have a particular locus (genetics work) entirely missing, which I hadn't realized could happen.

The actual code is here: https://github.com/pdimens/PopGen.jl/blob/master/src/HardyWeinberg.jl if the specific implementation matters.

juliangehring commented 5 years ago

Okay, thanks for the details - that is interesting to see.

pdimens commented 4 years ago

Having learned quite a bit since opening this issue, the PR submitted performs this a lot more elegantly than the code suggested above.