mshafieek / ADS-Missing-data-social-network

ADS master thesis
MIT License

remstats - cox data #6

Closed mshafieek closed 1 year ago

mshafieek commented 1 year ago

@vira-dvoriak I already checked the remstats documentation, which says the following:


.... The remstats() function outputs a 3-dimensional array with statistics. On the rows of this array are the time points, the columns refer to the potential events in the riskset and the slices refer to the different statistics:

dim(out)

This is exactly what we need for the Cox model, coxph(), but first we need to convert it to a data frame. I will try to find an example for you soon. In the meantime, you could try it yourself.
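To make the array layout concrete, here is a minimal sketch with a made-up array standing in for the remstats output. The object `out`, its dimensions, and the statistic names `stat1` to `stat3` are illustrative assumptions, not real output.

```r
# Toy stand-in for a remstats output: 4 time points x 6 dyads x 3 statistics.
# All values and names here are made up for illustration.
out <- array(
  0,
  dim = c(4, 6, 3),
  dimnames = list(NULL, NULL, c("stat1", "stat2", "stat3"))
)

dim(out)                    # time points, potential events, statistics

# One slice is a (time points x potential events) matrix for one statistic
one_stat <- out[, , "stat1"]
dim(one_stat)
```

Each slice still has one column per potential event, which is why it has to be stacked into long format before it can be passed to coxph().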

vira-dvoriak commented 1 year ago

Thank you, I will try it.

vira-dvoriak commented 1 year ago

Do we then need three separate 3882x240 data frames, one for each statistic, for the Cox model?

mshafieek commented 1 year ago

@vira-dvoriak, not at all. You should create a single data frame that includes the values of all statistics. For example, one data frame with 5 columns: one for time, one for status (1 for the event that occurred and 0 for the events in the risk set that did not occur), and three columns for the three statistics.

mshafieek commented 1 year ago

@vira-dvoriak, this is an example of the code:

library(survival)
cox_t <- coxph(Surv(time, status) ~ x1 + x2 + x3, data = coxdata)

where xi, i = 1, 2, 3, are the statistics of interest, such as inertia, and coxdata is the data frame we already talked about, which includes all the statistics plus time and status. Please note that this data contains the full risk set at each time point. It can be extracted from the output of remstats. Please read the remstats documentation carefully.
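For reference, here is a runnable sketch of that call on simulated data. Everything in it is an assumption made for illustration: the toy sizes `M` and `D`, the random covariates, and the way the occurring event is drawn. The second fit, which adds `strata(event_id)` so that each time point forms its own risk set, is a variant that is not mentioned in this thread and should be checked against the model you intend to fit.

```r
library(survival)

set.seed(1)
M <- 50   # number of time points (toy value)
D <- 6    # number of dyads in the risk set (toy value)

# Stacked layout: D rows per time point, one row per dyad at risk
coxdata <- data.frame(
  time     = rep(seq_len(M), each = D),
  event_id = rep(seq_len(M), each = D),  # hypothetical helper column for strata()
  x1 = rnorm(M * D),
  x2 = rnorm(M * D),
  x3 = rnorm(M * D)
)

# Simulate which dyad occurs at each time point (more likely when x1 is high)
coxdata$status <- 0L
for (m in seq_len(M)) {
  rows <- which(coxdata$time == m)
  hit <- sample(rows, size = 1, prob = exp(0.5 * coxdata$x1[rows]))
  coxdata$status[hit] <- 1L
}

# The call from the comment above
cox_t <- coxph(Surv(time, status) ~ x1 + x2 + x3, data = coxdata)

# Assumed variant: one stratum per time point
cox_strat <- coxph(Surv(time, status) ~ x1 + x2 + x3 + strata(event_id),
                   data = coxdata)

summary(cox_t)
```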

Hope this helps. Please let me know if you have any questions.

vira-dvoriak commented 1 year ago

When I calculate the three statistics (reciprocity, indegree sender and outdegree receiver), I get an object of class "tomstats" "remstats" with dimensions 3882 240 3. As far as I understand, 3882 corresponds to the time points, 240 to the potential events in the risk set, and 3 to the different statistics. I am able to extract the three slices individually, which gives me three datasets with dimensions (3882 240), but as I understand it, this is not what we are looking for.

It is difficult for me to imagine what the dataset for the Cox model has to look like, and I don't understand how we get the status column. It would be helpful to see an example of a dataset that can be used for the Cox model.

I was following the tutorial on how to use remstats, but it is still not clear to me how to get the dataset with time, status, and a column for each statistic.

This is the code I get when I follow the remstats tutorial:

```r
library(dplyr)     # provides %>% and rename()
library(remify)    # provides remify()
library(remstats)  # provides remstats() and the effect functions

load("UUsummerschool.Rdata")
apollo.renamed <- PartOfApollo_13
apollo.renamed <- apollo.renamed %>%
  rename(actor1 = sender, actor2 = receiver)

# Three statistics, no baseline term
effects <- ~ -1 + reciprocity(scaling = "std") + indegreeSender() + outdegreeReceiver()
reh <- remify(edgelist = apollo.renamed, model = "tie")
statsObject <- remstats(tie_effects = effects, reh = reh)

fit <- remstimate::remstimate(reh = reh, stats = statsObject, method = "MLE")
summary(fit)
```

I am able to fit the model the way they do it in the tutorial, using remstimate, but I cannot figure out how to get the correct dataset for the Cox model.

This is how I can extract the slices of different statistics, but this does not look like what you have described above.

reciprocity <- statsObject[,,1]
indegreeSender <- statsObject[,,2]
outdegreeReceiver <- statsObject[,,3]
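One possible way to stack such slices into a single long data frame is sketched below on a toy array. The array `stats`, the vector `event_time`, and especially `observed_dyad` (the risk-set column of the event that occurred at each time point) are hypothetical stand-ins. How to obtain the observed column index from a real remify or remstats object is not shown here and needs to be checked in the package documentation.

```r
# Toy stand-in: 3 time points x 6 dyads x 3 statistics (made-up values)
M <- 3; D <- 6; P <- 3
stats <- array(
  seq_len(M * D * P),
  dim = c(M, D, P),
  dimnames = list(NULL, NULL,
                  c("reciprocity", "indegreeSender", "outdegreeReceiver"))
)
event_time    <- c(0.5, 0.6, 0.7)  # hypothetical event times
observed_dyad <- c(1, 5, 2)        # hypothetical: column that occurred at each time

# as.vector() on a matrix runs down the rows (time points) of column 1,
# then column 2, and so on, so time varies fastest in the stacked vector.
coxdata <- data.frame(
  time = rep(event_time, times = D),
  dyad = rep(seq_len(D), each = M)
)
coxdata$status <- as.integer(coxdata$dyad == rep(observed_dyad, times = D))

# Add one column per statistic, flattened in the same order
for (p in seq_len(P)) {
  coxdata[[dimnames(stats)[[3]][p]]] <- as.vector(stats[, , p])
}

# Sort so that the full risk set of each time point sits together
coxdata <- coxdata[order(coxdata$time, coxdata$dyad), ]
rownames(coxdata) <- NULL
head(coxdata)
```

The result has M x D rows, exactly one row with status 1 per time point, and one column per statistic.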

mshafieek commented 1 year ago

@vira-dvoriak, I think what you did is correct. Suppose we have the following edgelist with three actors and three events.

a-->b, t = 0.5 day
c-->a, t = 0.6 day
a-->c, t = 0.7 day

The data frame is as follows:

| s | r | t |
|---|---|-----|
| a | b | 0.5 |
| c | a | 0.6 |
| a | c | 0.7 |

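Now the Cox data should look something like this (… stands for the value of the statistic):

| s | r | t | status | reciprocity | in-degree | out-degree |
|---|---|-----|---|---|---|---|
| a | b | 0.5 | 1 | … | … | … |
| a | c | 0.5 | 0 | … | … | … |
| b | a | 0.5 | 0 | … | … | … |
| b | c | 0.5 | 0 | … | … | … |
| c | a | 0.5 | 0 | … | … | … |
| c | b | 0.5 | 0 | … | … | … |
| a | b | 0.6 | 0 | … | … | … |
| a | c | 0.6 | 0 | … | … | … |
| b | a | 0.6 | 0 | … | … | … |
| b | c | 0.6 | 0 | … | … | … |
| c | a | 0.6 | 1 | … | … | … |
| c | b | 0.6 | 0 | … | … | … |
| a | b | 0.7 | 0 | … | … | … |
| a | c | 0.7 | 1 | … | … | … |
| b | a | 0.7 | 0 | … | … | … |
| b | c | 0.7 | 0 | … | … | … |
| c | a | 0.7 | 0 | … | … | … |
| c | b | 0.7 | 0 | … | … | … |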

This is the Cox data: a data frame with 7 columns (including the three statistics), which can be extracted from remstats. However, if remstimate already works and produces parameter estimates, that is fine. Can you use it for pooling the results? If not, you should build the Cox data and use the Cox model.
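As a rough illustration, the s, r, t and status columns of the three-actor example can be built in base R as follows. This sketch only constructs the risk set and the status indicator. The statistic columns are left out, since they would come from remstats.

```r
# The three-event edgelist from the example above
edgelist <- data.frame(
  s = c("a", "c", "a"),
  r = c("b", "a", "c"),
  t = c(0.5, 0.6, 0.7),
  stringsAsFactors = FALSE
)
actors <- c("a", "b", "c")

# All ordered sender-receiver pairs without self-loops (6 dyads)
dyads <- expand.grid(r = actors, s = actors, stringsAsFactors = FALSE)
dyads <- dyads[dyads$s != dyads$r, c("s", "r")]

# Repeat the full risk set once per event and flag the dyad that occurred
riskset <- do.call(rbind, lapply(seq_len(nrow(edgelist)), function(i) {
  data.frame(
    dyads,
    t = edgelist$t[i],
    status = as.integer(dyads$s == edgelist$s[i] & dyads$r == edgelist$r[i])
  )
}))
rownames(riskset) <- NULL
riskset
```

This gives 18 rows (6 dyads for each of the 3 events), with status equal to 1 only for the dyad that occurred at that time.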

So you should convert the matrix for each statistic, e.g. reciprocity, into a vector of length 3882 × 240, ordered in the same way as the rows of the data frame above. Repeat this for all the statistics and combine them into the Cox data. If you leave out the first two columns (s, r), that is also fine.
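The order of flattening matters here. In R, as.vector() on a matrix goes column by column, so to get all dyads of the first time point first (as in the table above), the matrix can be transposed before flattening. A small sketch with a made-up 2 x 3 matrix named `recip`:

```r
# Toy statistic matrix: 2 time points (rows) x 3 dyads (columns)
recip <- matrix(1:6, nrow = 2, ncol = 3)

# Flatten row by row: all dyads at time 1, then all dyads at time 2
by_time <- as.vector(t(recip))

# Matching index vectors for the stacked rows
time_id <- rep(seq_len(nrow(recip)), each = ncol(recip))
dyad_id <- rep(seq_len(ncol(recip)), times = nrow(recip))

# Check that every stacked value sits at the right (time, dyad) position
all(by_time == recip[cbind(time_id, dyad_id)])
```

Using as.vector(recip) without t() would instead group the values by dyad, so the time and status columns would have to be built in that same order.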