xKDR / Survey.jl

Analysis of complex surveys
https://xkdr.github.io/Survey.jl/
GNU General Public License v3.0
53 stars 19 forks source link

New constructor for `ReplicateDesign`. #251

Closed codetalker7 closed 1 year ago

codetalker7 commented 1 year ago

In this PR, we add a new constructor for the ReplicateDesign struct which allows the user to directly construct a ReplicateDesign without having to first construct a SurveyDesign.

We've also added a default constructor for ReplicateDesign which just builds the object out of the given arguments.

Finally, a new property called replicate_weights is added to the ReplicateDesign object; for now, this will be a Vector{Symbol}. Later we can add more iterators.

For completeness, here is an example for how the constructor works now.

using Survey, DataFrames, CSV
apistrat = load_data("apistrat")
dstrat = SurveyDesign(apistrat; strata=:stype, weights=:pw)
bootstrat = bootweights(dstrat; replicates=10)

apistrat_fromcsv = CSV.read("input.csv", DataFrame)
bootstrat_direct = ReplicateDesign(apistrat_fromcsv; strata=:stype, weights=:pw, replicates=UInt(10), replicate_weights=[Symbol("replicate_"*string(replicate)) for replicate in 1:10])

# getting mean, answers are identical
mean(:api00, bootstrat)
mean(:api00, bootstrat_direct) 

# getting total, answers are identical
total(:api00, bootstrat)
total(:api00, bootstrat_direct) 

If this is okay, I will add tests, documentation and some other final touches to the PR.

Closes #194

codetalker7 commented 1 year ago

@ayushpatnaikgit @smishr one question: should we enforce the convention that the replicate weight columns should be called "replicate_i"? If so, some minor changes to the code will have to be made.

smishr commented 1 year ago

i think this would be great, since downstream functions assume the "replicates_". Also we dont have to create extra functiona.

codecov-commenter commented 1 year ago

Codecov Report

Merging #251 (327dd14) into v0.1.1 (19d6971) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            v0.1.1      #251   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           12        12           
  Lines          183       196   +13     
=========================================
+ Hits           183       196   +13     
Impacted Files Coverage Δ
src/bootstrap.jl 100.00% <ø> (ø)
src/SurveyDesign.jl 100.00% <100.00%> (ø)
src/show.jl 100.00% <100.00%> (ø)
src/by.jl 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codetalker7 commented 1 year ago

@smishr @ayushpatnaikgit small note: I think we should move the documentation of the constructors to the docstrings of the respective constructors (right now the documentation is in the docstring of the SurveyDesign and ReplicateDesign structs). I can do that in this PR.

smishr commented 1 year ago

@smishr @ayushpatnaikgit small note: I think we should move the documentation of the constructors to the docstrings of the respective constructors (right now the documentation is in the docstring of the SurveyDesign and ReplicateDesign structs). I can do that in this PR.

that sounds good

smishr commented 1 year ago

@codetalker7 i changed base from main to v0.1.1. @ayushpatnaikgit have a look and merge if u think all good. Im fine with calling the base SurveyDesign constructor being common to both.

codetalker7 commented 1 year ago

Fixed minor error in documentation. It'll be helpful if we enable strict doctesting to detect these.

ayushpatnaikgit commented 1 year ago

1) It seems that in R, you can pass a regular expression, https://www.rdocumentation.org/packages/survey/versions/4.1-1/topics/svrepdesign It seems like a good idea. 2) We should have an option to say replicate weights columns are from colnumber x to colnumber y. I think this is also done in R, but I can't find the example.

ayushpatnaikgit commented 1 year ago

i think this would be great, since downstream functions assume the "replicates_". Also we dont have to create extra functiona.

Column names should be changed to “replicate_” if they aren't in this format.

smishr commented 1 year ago
  1. It seems that in R, you can pass a regular expression, https://www.rdocumentation.org/packages/survey/versions/4.1-1/topics/svrepdesign It seems like a good idea.

If this is too time consuming add as issue to be done later.

  1. We should have an option to say replicate weights columns are from colnumber x to colnumber y. I think this is also done in R, but I can't find the example.

This should be relatively easy to add i think.

Column names should be changed to “replicate_” if they aren't in this format.

I think this is the most important out of all three suggestions to be implemented first

codetalker7 commented 1 year ago
  1. It seems that in R, you can pass a regular expression, https://www.rdocumentation.org/packages/survey/versions/4.1-1/topics/svrepdesign It seems like a good idea.

    1. We should have an option to say replicate weights columns are from colnumber x to colnumber y. I think this is also done in R, but I can't find the example.

Added all these suggestions, along with tests and documentation.