using sampling design data files (SDDFs)

BernStZi commented 5 years ago

@BernStZi, thanks a lot for this explanation! However, I do not fully agree with you regarding the design weights. What I have learned and have always assumed is that design weights are derived directly from the sampling probabilities. Namely, dweight should be equal to 1 / prob. The name itself indicates that those weights are derived purely from the sample design.
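To make the 1 / prob relationship concrete, here is a minimal sketch. It assumes the SDDF has already been read into a list of records; the record layout and the `idno` key are illustrative assumptions, not the essurvey API.

```python
def design_weights(records):
    """Return {idno: 1 / prob} for records with a usable inclusion probability."""
    weights = {}
    for rec in records:
        prob = rec.get("prob")
        # Skip missing or impossible probabilities instead of guessing a weight.
        if prob is None or not (0 < prob <= 1):
            continue
        weights[rec["idno"]] = 1.0 / prob
    return weights


# Hypothetical SDDF rows, for illustration only.
sddf = [
    {"idno": 1, "prob": 0.001},
    {"idno": 2, "prob": 0.002},
    {"idno": 3, "prob": None},  # a unit with a missing prob
]
dw = design_weights(sddf)
```

Units without a valid prob are left out of the result rather than being given a default weight, so they stay visible as a data problem.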

I agree that design weights should not be applied as they are in case of non-response. Well, you can, but better results can be gained by applying so-called non-response corrections to the weights. This is usual practice. However, those corrected weights cannot be called design weights, as they are derived using extra information beyond what the sampling design provides.
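One common form of such a correction is a weighting-class adjustment, sketched below. This is only an illustration of the general idea, not the procedure used by the ESS; the field names `cls`, `responded` and `id` are assumptions made up for the example.

```python
from collections import defaultdict


def nonresponse_adjust(sample):
    """Divide each respondent's design weight by the weighted response
    rate of its weighting class. Non-respondents get no adjusted weight."""
    sampled = defaultdict(float)
    responded = defaultdict(float)
    for unit in sample:
        sampled[unit["cls"]] += unit["dweight"]
        if unit["responded"]:
            responded[unit["cls"]] += unit["dweight"]

    adjusted = {}
    for unit in sample:
        if unit["responded"]:
            rate = responded[unit["cls"]] / sampled[unit["cls"]]
            adjusted[unit["id"]] = unit["dweight"] / rate
    return adjusted


# Hypothetical gross sample: class A has 50% weighted response, class B 100%.
sample = [
    {"id": 1, "cls": "A", "dweight": 100.0, "responded": True},
    {"id": 2, "cls": "A", "dweight": 100.0, "responded": False},
    {"id": 3, "cls": "B", "dweight": 50.0, "responded": True},
    {"id": 4, "cls": "B", "dweight": 150.0, "responded": True},
]
adj = nonresponse_adjust(sample)
```

The adjusted weights of the respondents add up to the design-weight total of the whole gross sample, which is exactly the extra information (who responded, in which class) that a pure design weight does not use.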

Originally posted by @djhurio in https://github.com/ropensci/essurvey/issues/9#issuecomment-502478084

BernStZi commented 5 years ago

@djhurio you are absolutely correct, the design weights are simply 1 / prob. But I remember that for some of the earlier rounds the prob variable published with the SDDFs was apparently not always the one used to compute the design weights. From personal experience I know that getting the prob variable right often involved some back and forth between sampling experts and the national coordinators. And I cannot say with certainty whether, for those early rounds, the prob variable used to calculate the design weights also made it into the SDDFs. You have to remember that the idea of the SDDFs was new back then and routines were probably not as well refined as they are now. This might also be true for other variables in the SDDF. For instance, difficulties in distinguishing between explicit and implicit stratification (i.e. ordered systematic sampling) might have led to countries not reporting the correct values for the stratify variable.
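A quick way to spot such mismatches is to check whether dweight is proportional to 1 / prob within one country and round. The sketch below assumes the published weights may have been rescaled by a constant, so it compares the product dweight * prob against its median instead of testing for equality. The row layout and the 1% tolerance are assumptions for illustration.

```python
from statistics import median


def flag_inconsistent(rows, rel_tol=0.01):
    """Return the idno of units whose dweight is not proportional to 1 / prob."""
    # If dweight = c / prob for some constant c, then dweight * prob == c.
    products = {
        r["idno"]: r["dweight"] * r["prob"]
        for r in rows
        if r.get("prob") and r.get("dweight")
    }
    if not products:
        return []
    ref = median(products.values())
    return sorted(i for i, p in products.items() if abs(p - ref) > rel_tol * ref)


# Hypothetical merged rows (SDDF prob next to the data set's dweight).
rows = [
    {"idno": 1, "prob": 0.001, "dweight": 1.0},
    {"idno": 2, "prob": 0.002, "dweight": 0.5},
    {"idno": 3, "prob": 0.004, "dweight": 0.25},
    {"idno": 4, "prob": 0.002, "dweight": 1.0},  # does not match 1 / prob
]
flagged = flag_inconsistent(rows)
```

Using the median as the reference keeps a handful of bad rows from shifting the baseline, but it would not help if most of the prob values in a file were wrong.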

In any case, I have more trust in the design weights that are included in the data sets than in 1 / prob computed from the SDDFs. Especially since the weights in the data sets have been tested repeatedly and with scrutiny by Vasja and his team when they developed the first post-stratification weights.

The post-stratification weights are a different matter, but nonetheless they are based on the design weights found in the data set, not the SDDFs.
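To illustrate what "based on the design weights" means, here is a minimal sketch of simple cell-based post-stratification that starts from dweight. It is not the ESS procedure, only the basic idea of scaling weights to known population totals; the cell labels and totals are made up for the example.

```python
from collections import defaultdict


def poststratify(respondents, population_totals):
    """Scale design weights so that each cell's weights sum to its
    known population total."""
    weighted = defaultdict(float)
    for r in respondents:
        weighted[r["cell"]] += r["dweight"]
    return {
        r["id"]: r["dweight"] * population_totals[r["cell"]] / weighted[r["cell"]]
        for r in respondents
    }


# Hypothetical respondents and population totals per cell.
respondents = [
    {"id": 1, "cell": "young", "dweight": 1.0},
    {"id": 2, "cell": "young", "dweight": 3.0},
    {"id": 3, "cell": "old", "dweight": 2.0},
]
pop = {"young": 400.0, "old": 600.0}
ps = poststratify(respondents, pop)
```

Within a cell the relative sizes of the design weights are preserved, so any error in the starting dweight carries over into the post-stratified weight.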

BernStZi commented 5 years ago

@briatte

I think we can suggest that to the ESS team when we email with the list of things that we want to tell them, based on our discussion (weird differences between design weights and SDDF files in Israel Round 1, France Round 1 and UK Rounds 2-3, missing prob in UK Round 4).

Yes, I would be more than happy to share the summaries on key variables of the sampling designs. Also, the SDDFs that the ESS requests from the countries contain much more (useful) information, especially for variance estimation, than the versions that are published on the website. Maybe there is a way to make use of this information.
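As a rough idea of how stratum and cluster identifiers feed into variance estimation, the sketch below applies the standard with-replacement ("ultimate cluster") approximation for the variance of a weighted total. The field names `stratify`, `psu`, `weight` and `y` are assumed for the example and may not match the actual SDDF layout.

```python
from collections import defaultdict


def total_and_variance(rows):
    """Estimate a weighted total and its variance from PSU totals
    within strata (with-replacement approximation)."""
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        psu_totals[r["stratify"]][r["psu"]] += r["weight"] * r["y"]

    total = 0.0
    variance = 0.0
    for stratum, psus in psu_totals.items():
        z = list(psus.values())
        n = len(z)
        if n < 2:
            # A single-PSU stratum gives no within-stratum variance estimate.
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        mean = sum(z) / n
        total += sum(z)
        variance += n / (n - 1) * sum((v - mean) ** 2 for v in z)
    return total, variance


# Hypothetical respondent rows with design information attached.
rows = [
    {"stratify": 1, "psu": "a", "weight": 10.0, "y": 1},
    {"stratify": 1, "psu": "a", "weight": 10.0, "y": 0},
    {"stratify": 1, "psu": "b", "weight": 10.0, "y": 1},
    {"stratify": 1, "psu": "b", "weight": 10.0, "y": 1},
    {"stratify": 2, "psu": "c", "weight": 5.0, "y": 1},
    {"stratify": 2, "psu": "d", "weight": 5.0, "y": 0},
    {"stratify": 2, "psu": "e", "weight": 5.0, "y": 1},
]
est_total, est_var = total_and_variance(rows)
```

Without the stratum and PSU identifiers this calculation is not possible at all, which is why the fuller SDDFs would be valuable.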
