tilltnet / egor

R Package for importing and analysing ego-centered-network data.
http://egor.tillt.net
GNU Affero General Public License v3.0

Adding alter totals to the egor object in fixed choice designs #71

Open martinamorris opened 3 years ago

martinamorris commented 3 years ago

In fixed choice designs, we may not have the full census of alters observed. Often in these designs we collect ancillary info on the total number of alters, in addition to the alter data for the fixed number subset. That total can be used to impute/simulate a scaled up alter set, or to modify an ergm fit to the alter subset, if we assume that the unobserved alters are missing at random.

For the ergm fit, it should be possible to use this alter total to adjust the edges term to produce the correct overall totals.

It would be helpful to have a component in the egor object list that can be used to store these totals, as it is part of the design.

If you already have this capability, just lmk where to find the info ;)

related to: https://github.com/statnet/ergm.ego-private/issues/46

krivit commented 3 years ago

Hard to say; in some sense, the most statistically correct way to represent this sampling design is by attaching a survey design to the alter table: a cluster sample with known cluster sizes (and some cluster sizes being 0).
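
As a rough illustration of that representation (plain Python, not egor or any survey package; the names `egos`, `alters`, `alter_total` and `LIMIT` are made up for this sketch), the cluster sizes would live at the ego level. That way egos who reported zero alters still appear in the design even though they contribute no rows to the alter table:

```python
from collections import Counter

LIMIT = 3  # hypothetical fixed choice limit

# Hypothetical ego-level records: the reported total is the known cluster size.
egos = [
    {"ego_id": 1, "alter_total": 0},  # a cluster of size 0
    {"ego_id": 2, "alter_total": 2},
    {"ego_id": 3, "alter_total": 7},  # more alters than the limit allows
]

# Hypothetical alter table: one row per alter that was actually described.
alters = [
    {"ego_id": 2, "alter_id": 1},
    {"ego_id": 2, "alter_id": 2},
    {"ego_id": 3, "alter_id": 1},
    {"ego_id": 3, "alter_id": 2},
    {"ego_id": 3, "alter_id": 3},
]

# Number of alter rows observed per ego (egos with no rows are simply absent).
observed = Counter(row["ego_id"] for row in alters)


def design_summary(egos, observed, limit):
    """Pair each ego's known cluster size with the number of alters observed."""
    rows = []
    for ego in egos:
        n_obs = observed.get(ego["ego_id"], 0)
        total = ego["alter_total"]
        rows.append({
            "ego_id": ego["ego_id"],
            "cluster_size": total,
            "n_observed": n_obs,
            # Under a fixed choice design we expect min(total, limit) rows.
            "consistent": n_obs == min(total, limit),
            "truncated": total > limit,
        })
    return rows


summary = design_summary(egos, observed, LIMIT)
```

The `consistent` flag is only a sanity check that the alter table matches the reported totals; how the totals would actually be attached to an egor object is the open question of this issue.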

martinamorris commented 3 years ago

Right, and then the question is how to weight the elements of the clusters. We had this discussion earlier this year for another project, so I'm going to C&P my summary of the issue here.

My summary:

1. you're using a fixed choice design survey. if it also collected the total number of partners, that's called an "augmented fixed choice design".

2. if the tot # partners <= fixed choice limit, edgewts = egowts

3. if the tot # partners > fixed choice limit, edgewts = egowts * tot/limit

4. if you don't have the total, then you'll need to see if you can identify evidence of bias in what you do have. a couple of options for sensitivity analysis:

a. use only the most recent partner, but then your inference is to a different population measure (the most recent partner, rather than all partnerships). if this estimate differs a lot from the one based on all reported partners, it could indicate that more active persons have a different pattern of partnerships.

b. compare estimates based on all edges for respondents with tot # partners <= fixed choice limit to estimates based on all edges for respondents with tot # partners > fixed choice limit. ditto interpretation
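
The weighting rule in items 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the function name `edge_weight` and its arguments are made up here, and the scaling assumes, as in the opening comment, that the unreported partners are missing at random.

```python
def edge_weight(ego_weight, total_partners, limit):
    """Weight for each reported edge under an augmented fixed choice design.

    ego_weight     -- the ego's sampling weight
    total_partners -- reported total number of partners (must be known)
    limit          -- the fixed choice limit on reported partners
    """
    if total_partners is None:
        # Item 4: without the total, this rule cannot be applied.
        raise ValueError("total number of partners is required")
    if total_partners <= limit:
        # All partners were reported, so each edge carries the ego weight.
        return ego_weight
    # Only `limit` of `total_partners` edges were reported, so each reported
    # edge stands in for total_partners / limit edges.
    return ego_weight * total_partners / limit
```

For example, with a limit of 3, an ego with weight 2.0 and 6 partners in total gets an edge weight of 4.0 on each of the 3 reported edges, so the weighted edge count is 12, which equals the ego weight times the true total.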

Also need to consider:

Is the goal to estimate the fraction of edges that have a certain characteristic (e.g., 20% of ties are homophilous)?  
Or the average value of that edge characteristic for egos (e.g. on average, X% of egos' ties are homophilous)?
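
A toy Python example shows why the two targets can give different answers. The data and the field names (`ego_weight`, `edge_weight`, `ties`) are invented for illustration, and dropping egos with no reported ties from the ego-level average is an assumption of this sketch, not something settled in the thread.

```python
def edge_level_fraction(egos):
    """Weighted fraction of all edges that have the characteristic."""
    num = sum(e["edge_weight"] * sum(e["ties"]) for e in egos)
    den = sum(e["edge_weight"] * len(e["ties"]) for e in egos)
    return num / den


def ego_level_average(egos):
    """Weighted average, over egos, of each ego's own fraction of such ties."""
    # Assumption: egos with no reported ties have no defined fraction and are skipped.
    with_ties = [e for e in egos if e["ties"]]
    num = sum(e["ego_weight"] * sum(e["ties"]) / len(e["ties"]) for e in with_ties)
    den = sum(e["ego_weight"] for e in with_ties)
    return num / den


# Hypothetical data with a fixed choice limit of 3; ties are coded 1 if homophilous.
egos = [
    # 2 partners in total, both reported: edge weight equals ego weight.
    {"ego_weight": 1.0, "edge_weight": 1.0, "ties": [1, 1]},
    # 6 partners in total, 3 reported: edge weight is 1.0 * 6 / 3.
    {"ego_weight": 1.0, "edge_weight": 2.0, "ties": [0, 0, 1]},
]

edge_fraction = edge_level_fraction(egos)  # (1*2 + 2*1) / (1*2 + 2*3)
ego_average = ego_level_average(egos)      # (1.0 + 1/3) / 2
```

Here the edge-level fraction is 0.5 while the ego-level average is about 0.67, because the more active ego contributes more edges to the first quantity but counts only once in the second.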