tilltnet / egor

R Package for importing and analysing ego-centered-network data.
http://egor.tillt.net
GNU Affero General Public License v3.0

as_tibble.egor(), as_alters_df(), and as_aaties_df() should include design information when the ego has a design. #53

Closed krivit closed 4 years ago

krivit commented 4 years ago

At the moment, the design information is thrown away. Unfortunately, srvyr does not support joins and similar operations, but it does support indexing. Also, as far as I can tell, the variables are stored separately from the design information, so it should, in principle, be possible to handle, say, alter table extraction as follows:

library(egor)
library(dplyr)  # for %>%, mutate() and n()

# Create a weighted dataset.
e <- make_egor(8, 32) %>% 
  mutate(weights = sample(c(0.5, 1, 1.5), n(), replace = TRUE))
ego_design(e) <- list(weights = "weights")
ego_design(e)

# Obtain a mapping to "join" egos to alters:
eamap <- match(e$alter$.egoID, e$ego$variables$.egoID)
# Augment the ego survey design to have an ego for each alter (creating cluster samples):
ed <- e$ego[eamap]
# Replace the variables (previously those of ego) with those of the alters.
ed$variables <- e$alter
# This is now a valid `tbl_svy` object preserving the ego design but containing the appropriate alter variables:
ed

Any thoughts?

tilltnet commented 4 years ago

Yes, this would be a consistent way to handle egor objects with ego.designs, when converted to stand-alone representations.

I ran the code and it works fine. With the match() line in there, it will also work when the .egoID values in the ego and alter objects are not in the same order. Seems solid to me and should also work for aaties?!
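To illustrate the point about ordering, here is a minimal base R sketch with made-up ID vectors (`ego_ids` and `alter_ids` are hypothetical stand-ins for e$ego$variables$.egoID and e$alter$.egoID, not taken from an actual egor object):

```r
# Hypothetical IDs: the egos are deliberately NOT sorted.
ego_ids   <- c(3, 1, 2)
alter_ids <- c(1, 1, 2, 3, 3)

# For each alter, find the row position of its ego in the ego table.
eamap <- match(alter_ids, ego_ids)
eamap
# 2 2 3 1 1

# Indexing the ego IDs by the mapping recovers the alters' ego IDs,
# regardless of how the ego table is ordered.
identical(ego_ids[eamap], alter_ids)
# TRUE
```

Because the mapping is computed by value rather than by position, the same indexing step works whether or not the two tables share an ordering.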

A side note on print.tbl_svy(): for me it does not print the variables themselves, just the design info. I think I'd like it better if it printed the variables first as a tibble and then the design info. If I am not the only one bothered by this, could we file an issue at the srvyr repository?!

krivit commented 4 years ago

I am thinking of implementing what are essentially left_join and full_join methods for tbl_svy using indexing, which we can then use in the backends. What do you think?

krivit commented 4 years ago

Actually, there's an even simpler solution: I had forgotten that we've had a workaround needed for *_join.egor and other dplyr verbs all along. I'll just use it here.

krivit commented 4 years ago

On even further thought, as_tibble.egor() should always return a tibble, and similarly with the *_df functions. But, as_survey.egor() should always return a tbl_svy, as should as_alters_survey(). This should happen regardless of whether the original egor object has an ego design.

In particular, a user might start out with an SRS of egos (and so not specify a design), but the alter list would then be a cluster sample with their egos being the clusters. If they want that information, they can use as_alters_survey().
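As a rough illustration of the cluster-sample idea (not of how as_alters_survey() is implemented), the sketch below builds a design directly with srvyr from a hypothetical alter table. The data, the `age` column, and the object names are all made up for the example. It assumes the egos are an SRS, so no weights are supplied and srvyr will warn that it assumes equal probabilities.

```r
library(dplyr)
library(srvyr)

# Hypothetical alter table: 3 egos, each nominating one or more alters.
alters <- tibble::tibble(
  .egoID = c(1, 1, 2, 3, 3, 3),
  .altID = c(1, 2, 1, 1, 2, 3),
  age    = c(30, 41, 25, 52, 38, 47)
)

# Each ego acts as a cluster (primary sampling unit) for its alters.
alters_svy <- as_survey_design(alters, ids = .egoID)

# Design-based estimate of mean alter age; the standard error
# accounts for the clustering of alters within egos.
alters_svy %>%
  summarise(mean_age = survey_mean(age))
```

The point of the example is only that the alter-level design is not trivial even when the ego-level one is: alters nominated by the same ego are not sampled independently of one another.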

krivit commented 4 years ago

OK, I've implemented these changes. It passes the checks, and I hope this is something that makes sense in the grander project. (If not, we can always revert.)