selectionMethod and joint inclusion probabilities

edvinf commented 3 years ago

The algorithms for calculating joint inclusion probabilities differ between probabilistic sampling schemes that the current codes for selectionMethod do not distinguish. Pairwise joint inclusion probabilities are needed for variance estimation with Horvitz-Thompson estimators.
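For reference, the standard textbook form of the Horvitz-Thompson variance estimator shows where the pairwise terms enter. The notation (sample $s$, observed values $y_i$, inclusion probabilities $\pi_i$, joint inclusion probabilities $\pi_{ij}$) is introduced here for illustration and is not taken from the RDBES documentation:

$$
\widehat{V}\left(\hat{Y}_{HT}\right) = \sum_{i \in s} \sum_{j \in s} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \, \frac{y_i}{\pi_i} \, \frac{y_j}{\pi_j}, \qquad \pi_{ii} = \pi_i
$$

The estimator needs $\pi_{ij}$ for every pair of sampled units, and the value of $\pi_{ij}$ depends on the sampling scheme.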

We have previously discussed a related issue when considering whether inclusion probabilities could be calculated from selection probabilities or vice versa. We then solved it by including both sampling probabilities, but this is impractical for pairwise joint inclusion probabilities, which require a vector for each sampling unit (or a matrix for a set of sampling units).

Two sampling strategies in use are identified, which have different algorithms for calculation of joint inclusion probabilities:

1. Selection of a fixed number of elements from a list.
2. Poisson sampling: "Roll a dice" for each sampling unit to determine selection, sample size is not fixed.

A possible solution would be to tighten the definitions of the probabilistic selection methods so that they are restricted to case 1, and then add corresponding codes that are restricted to case 2.
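To illustrate why the two cases need different algorithms, here is a minimal Python sketch. It assumes that case 1 is simple random sampling without replacement (equal probabilities), which is only the simplest fixed-size design; unequal probability designs without replacement need more involved calculations. The function names are hypothetical and not part of any RDBES tooling.

```python
def joint_inclusion_srswor(N, n):
    """Joint inclusion matrix for simple random sampling without replacement.

    Assumption: n units are drawn with equal probability from a list of N.
    The diagonal holds the first-order inclusion probabilities n / N.
    """
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 0 < n <= N")
    pi = n / N
    pi_ij = n * (n - 1) / (N * (N - 1))
    return [[pi if i == j else pi_ij for j in range(N)] for i in range(N)]


def joint_inclusion_poisson(pi):
    """Joint inclusion matrix for Poisson sampling.

    Units are selected independently, so pi_ij = pi_i * pi_j for i != j.
    The diagonal holds the first-order inclusion probabilities.
    """
    return [
        [p_i if i == j else p_i * p_j for j, p_j in enumerate(pi)]
        for i, p_i in enumerate(pi)
    ]


# Same first-order inclusion probabilities (0.5), different pairwise ones.
fixed = joint_inclusion_srswor(N=4, n=2)
poisson = joint_inclusion_poisson([0.5, 0.5, 0.5, 0.5])
print(fixed[0][1], poisson[0][1])
```

With identical first-order inclusion probabilities, the two schemes give different pairwise values, so the selection method code has to identify the scheme before the matrix can be derived.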

This could also be considered in the intersessional work of WGCATCH, which is currently reviewing non-probabilistic selection methods.

nmprista commented 3 years ago

@edvinf A somewhat similar issue was raised at WKRDB-EST2:

In simpler cases, the inclusion probabilities needed for variance estimation can be calculated from sample size and population size. However, more complex joint inclusion probabilities are required for estimation of variance for some designs, e.g., unequal probability designs. These are not currently incorporated into the RDBES format. They take the form of matrices of joint inclusion probabilities for units within a sample and so are not easy to incorporate in the model.
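As an illustration of how such a matrix is used at the estimation stage, here is a minimal Python sketch of the Horvitz-Thompson total and its variance estimate. It assumes the joint inclusion matrix for the sampled units is supplied from outside the data model; the function and argument names are hypothetical.

```python
def ht_total(y, pi):
    """Horvitz-Thompson estimate of the population total."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi))


def ht_variance_estimate(y, pi, pi_joint):
    """Horvitz-Thompson variance estimate for the estimated total.

    y        : observed values for the sampled units
    pi       : first-order inclusion probabilities of the sampled units
    pi_joint : matrix of pairwise joint inclusion probabilities
               (diagonal entries are ignored; pi[i] is used instead)
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p_ij = pi[i] if i == j else pi_joint[i][j]
            weight = (p_ij - pi[i] * pi[j]) / p_ij
            total += weight * (y[i] / pi[i]) * (y[j] / pi[j])
    return total


# Two units selected by Poisson sampling: pi_ij = pi_i * pi_j off the diagonal,
# so only the diagonal terms contribute to the variance estimate.
y = [10.0, 20.0]
pi = [0.5, 0.25]
pi_joint = [[0.5, 0.125], [0.125, 0.25]]
print(ht_total(y, pi), ht_variance_estimate(y, pi, pi_joint))
```

The double loop makes the storage problem visible: a sample of n units needs n × n pairwise probabilities, which is why they do not fit a one-row-per-unit format.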

The WKRDB-EST2 discussion went along the lines of proposing that

such complex joint inclusion probabilities are not, for now, incorporated into the RDBES data model. Rather, institutes requiring these more complicated analyses should be advised to import the probabilities into R in a separate format for the estimation, or to use other imported information to calculate them, if they are required.

The above is something you can consider in the intersessional work on selection method.

nmprista commented 3 years ago

The analysis done at WKRDB-EST2 indicates that documentation should be added:

WKRDBEST2_Annex5_Issue3_VarianceUPSWOR.docx

lizclarke commented 3 years ago

We have decided to pass this to the selection method subgroup. It would be ideal, but not strictly necessary, to resolve this before the data call. (In general, joint inclusion probabilities will need to be dealt with separately in the estimation stage.)

ices-tools-dev / RDBES

selectionMethod and joint inclusion probabilities #76