xKDR / Survey.jl

Analysis of complex surveys
https://xkdr.github.io/Survey.jl/
GNU General Public License v3.0
53 stars 19 forks source link

Changes in data structures to support more replicate methods #261

Closed smishr closed 1 year ago

smishr commented 1 year ago

variance estimation of a function using jackknife (say mean) is inherently different to that with Rao-Wu bootstrap

function mean(x::Symbol, design::ReplicateDesign)
    X = mean(design.data[!, x], weights(design.data[!, design.weights]))
    Xt = [
        mean(design.data[!, x], weights(design.data[!, "replicate_"*string(i)])) for
        i = 1:design.replicates
    ]
    variance = sum((Xt .- X) .^ 2) / design.replicates
    DataFrame(mean = X, SE = sqrt(variance))
end

Jackknife variance calculate multiplies the Sum of Square Errors with a different factor (summing sum((Xt .- X) .^ 2) by (nh-1) / nh over each strata, where nh is number of psus in strata).

So not ideal to keep signature design::ReplicateDesign. Instead I think variance estimation needs system which allows for differing variance calculation for the diff methods (eg Jackknife, BRR, Bootstrap)

smishr commented 1 year ago

Perhaps adding types for different type of variance estimation method, and catching that type using multiple dispatch?

codetalker7 commented 1 year ago

How about using value types: so instead of having type as a field in the ReplicateDesign constructor, we have it as a parameter:

# ReplicateType will be a symbol
struct ReplicateDesign{ReplicateType}
    # struct fields
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
end

So now, we can dispatch on the ReplicateDesign types. A very simple example:

myfunction(design::ReplicateDesign{:bootstrap}) = "This has type bootstrap"
myfunction(design::ReplicateDesign{:jackknife}) = "This has type jackknife"

This is just an idea; if this is the way to go, we'll have to implement it keeping in mind performance: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-value-type.

smishr commented 1 year ago

How about using value types: so instead of having type as a field in the ReplicateDesign constructor, we have it as a parameter:

# ReplicateType will be a symbol
struct ReplicateDesign{ReplicateType}
    # struct fields
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
end

So now, we can dispatch on the ReplicateDesign types. A very simple example:

myfunction(design::ReplicateDesign{:bootstrap}) = "This has type bootstrap"
myfunction(design::ReplicateDesign{:jackknife}) = "This has type jackknife"

This is just an idea; if this is the way to go, we'll have to implement it keeping in mind performance: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-value-type.

I think could be the going forward with multiple types of ReplicateDesign. @ayushpatnaikgit you mentioned this is how CRRao.jl does it. You think its the best way to go?

asinghvi17 commented 1 year ago

If only one line fundamentally differs, then it would make sense to do something like the following, for maximum configurability:

# ReplicateType will be some replicate type
struct ReplicateDesign{ReplicateType}
    # struct fields
    inference_method::ReplicateType
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
    ReplicateDesign(args..., inf::RepType) = ReplicateDesign{RepType}(args..., inf)
end

Base.@kwdef struct BootstrapReplicates
    iterations = 1000
end

Base.@kwdef struct JackknifeReplicates
...
end

then do some fancy dispatch on an outer constructor which creates the appropriate type. One of these replicate types would then go into replicate_design.inference_method, and presumably one could define some inference/confidence interval methods using these types, which e.g bydomain might call.

I personally wouldn't use symbols for this, because you can't pass any other metadata along. Types tend to be a little easier for extensibility as well.

smishr commented 1 year ago

If only one line fundamentally differs, then it would make sense to do something like the following, for maximum configurability:

# ReplicateType will be some replicate type
struct ReplicateDesign{ReplicateType}
    # struct fields
    inference_method::ReplicateType
    # constructor will look like the following
    ReplicateDesign{ReplicateType}(args...) = ...
    ReplicateDesign(args..., inf::RepType) = ReplicateDesign{RepType}(args..., inf)
end

Base.@kwdef struct BootstrapReplicates
    iterations = 1000
end

Base.@kwdef struct JackknifeReplicates
...
end

then do some fancy dispatch on an outer constructor which creates the appropriate type. One of these replicate types would then go into replicate_design.inference_method, and presumably one could define some inference/confidence interval methods using these types, which e.g bydomain might call.

I personally wouldn't use symbols for this, because you can't pass any other metadata along. Types tend to be a little easier for extensibility as well.

lets discuss this suggestion in our next meeting. @ayushpatnaikgit @codetalker7 fyi

ayushpatnaikgit commented 1 year ago

Guys, this is great. I did some experiments and there are no flaws with this. I don't think there is a need for a meeting. We should implement it.

We should implement Anshul's suggestion for the reason he mentioned.

We don't need an inner constructor right now. bootweights and jknweights can return ReplicateDesign objects.

smishr commented 1 year ago

okay. sounds good to me!

codetalker7 commented 1 year ago

Hi everyone, small question: now, since we're having inference_method as a property in every ReplicateDesign object, the following situation might arise: what if someone already has replicate weights (and we support creation of a ReplicateDesign object directly using these replicate weights), but they don't know what inference method was used to create these replicate weights. What should be done in that situation? Should we define an unknown inference type or something?

ayushpatnaikgit commented 1 year ago

Hi everyone, small question: now, since we're having inference_method as a property in every ReplicateDesign object, the following situation might arise: what if someone already has replicate weights (and we support creation of a ReplicateDesign object directly using these replicate weights), but they don't know what inference method was used to create these replicate weights. What should be done in that situation? Should we define an unknown inference type or something?

How about a default? BootstrapReplicates.