mshafieek / ADS-Missing-data-social-network

ADS master thesis
MIT License

Preparing data for coxph #9

Closed: JesseJvdw closed this issue 1 year ago

JesseJvdw commented 1 year ago

I'm trying to create a data frame for the coxph model, but I'm having trouble creating the status column. If I understand correctly, status = 1 when, at time 1, the sender-receiver pair is the same as in the risk set. I'm trying to build the column with ifelse(), but when I run it, only one time point gets a status of 1.

I've updated the Rmd file; the code I am talking about is all the way at the bottom.

mshafieek commented 1 year ago

@JesseJvdw, no, that is not quite right. At t1, status is 1 for the one event in the risk set that is the same as the event in the edgelist at t1. You then repeat this for t2, and so on: each time point has its own copy of the risk set, and the comparison is made separately within each copy. Look at my example in "remstats - cox data".
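The per-time-point logic described above can be sketched in R roughly as follows. This is only a minimal illustration on made-up toy data (3 actors, 3 events); the object names `riskset`, `edgelist` and `cox_data` are assumptions for the example and are not taken from the thesis code or from the "remstats - cox data" example.

```r
# Toy risk set: all directed dyads among 3 actors, no self-loops (6 dyads).
actors <- 1:3
riskset <- expand.grid(sender = actors, receiver = actors)
riskset <- riskset[riskset$sender != riskset$receiver, ]
rownames(riskset) <- NULL

# Hypothetical edgelist: one observed event per time point.
edgelist <- data.frame(
  time     = c(1, 2, 3),
  sender   = c(1, 3, 2),
  receiver = c(2, 1, 3)
)

# For each time point, copy the risk set and flag the dyad that was observed.
cox_list <- lapply(seq_len(nrow(edgelist)), function(i) {
  rs <- riskset
  rs$time <- edgelist$time[i]
  rs$status <- as.integer(
    rs$sender == edgelist$sender[i] & rs$receiver == edgelist$receiver[i]
  )
  rs
})
cox_data <- do.call(rbind, cox_list)

# Sanity check: every time point should contribute exactly one status == 1.
events_per_time <- tapply(cox_data$status, cox_data$time, sum)
print(events_per_time)
```

If the check shows a 1 for only a single time point, the comparison is probably being made against the first event only rather than against the event of each time point.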

mshafieek commented 1 year ago

@JesseJvdw, an alternative approach is to build two matrices. The first contains the risk sets for all time points. The second has the same dimensions but repeats the event that actually occurred at each time point across the rows of that time point's risk set. Subtracting the two gives a matrix whose rows are (0,0) exactly where a 1 should be inserted in the status vector. Is that clear? Then you do not need a for loop.
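A minimal sketch of this two-matrix idea, again on made-up toy data, might look like the following. It assumes actors are coded as numeric IDs (subtraction does not work on character IDs), and the names `rs_all`, `ev_all` and `diff_mat` are hypothetical.

```r
# Toy risk set as a numeric matrix: 6 directed dyads among 3 actors.
actors <- 1:3
riskset <- expand.grid(sender = actors, receiver = actors)
riskset <- as.matrix(riskset[riskset$sender != riskset$receiver, ])
n_dyads <- nrow(riskset)

# Hypothetical observed events, one row per time point.
edgelist <- cbind(sender = c(1, 3, 2), receiver = c(2, 1, 3))
n_events <- nrow(edgelist)

# Matrix 1: the whole risk set stacked once per time point.
rs_all <- riskset[rep(seq_len(n_dyads), times = n_events), ]

# Matrix 2: each observed event repeated once per dyad (same dimensions).
ev_all <- edgelist[rep(seq_len(n_events), each = n_dyads), ]

# Rows that are (0, 0) after subtraction are the observed events.
diff_mat <- rs_all - ev_all
status <- as.integer(rowSums(diff_mat != 0) == 0)

# Matching time index for each row of the stacked data.
time <- rep(seq_len(n_events), each = n_dyads)
```

Note the difference between `times =` (stack the whole risk set repeatedly) and `each =` (repeat every event row in place); mixing these up misaligns the two matrices.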

JesseJvdw commented 1 year ago

@mshafieek To get the correct length for the second matrix, I need to repeat each row (each time point in the edgelist) as many times as there are rows in the risk set, no?

Edit: Somehow, when I convert the edgelist (each time point repeated 240 times) to a matrix, the number of actors changes from 19 to 16. I just call data.matrix(df_multiplied); I checked the new data frame, and it has all the correct values.

Edit 2: Fixed it.
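The thread does not say what the fix was. One possible cause of a changing actor count, offered purely as an assumption, is that `data.matrix()` converts character columns to factors and then to integer codes separately for each column, so the same actor can receive different codes in the sender and receiver columns. A small sketch with made-up actor labels:

```r
# Hypothetical edgelist with character actor labels: 4 distinct actors.
df <- data.frame(
  sender   = c("a", "b", "c"),
  receiver = c("b", "c", "d"),
  stringsAsFactors = FALSE
)

# data.matrix() codes each column on its own:
# sender: a = 1, b = 2, c = 3; receiver: b = 1, c = 2, d = 3.
m <- data.matrix(df)
n_codes_separate <- length(unique(c(m)))

# Coding both columns against one shared set of actor IDs keeps them aligned.
ids <- sort(unique(c(df$sender, df$receiver)))
m_shared <- cbind(
  sender   = match(df$sender, ids),
  receiver = match(df$receiver, ids)
)
n_codes_shared <- length(unique(c(m_shared)))

print(c(separate = n_codes_separate, shared = n_codes_shared))
```

With separate coding the matrix appears to contain fewer actors than the data frame, which resembles the 19 versus 16 symptom described above.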

mshafieek commented 1 year ago

@JesseJvdw, true.