Open ik362 opened 3 years ago
Are your patients switching in and out of the control group? If not, patient as group_id should suffice. The group_ids identify a time series. The time_idx identifies each datapoint in a given time series. Not sure what subj_1 is and how it is different from the patient.
@jdb78 thanks for your help!
My ultimate question is along the lines of: can the time series in the control group be better predicted than the time series of the patients?
Are your patients switching in and out of the control group?
No, in this sense the patient/control labels are static. In a statistical sense, they are independent samples, not repeated measures.
Not sure what subj_1 are and how it is different from the patient
At the moment I have coded the data similar to this:
So at the moment the subj column is only used as a dummy variable to identify which time_idx corresponds to which subj. I guess my question is: do I need to "double group" time series into subject-level and group-level? Or is the subject-level implied by the time_idx?
You can use both grouping levels, group and subj, but subj alone will do the job. You might want to consider using group as a static categorical variable on top.
Does this help?
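For illustration, here is a minimal sketch of the long-format coding described above. The subject names, cohort sizes, number of time steps, and the random values in source1 are all made up; only the column roles (subj, group, time_idx) follow the thread.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical layout: 4 subjects, 2 per cohort, 5 time steps each.
subjects = {
    "subj_1": "patient",
    "subj_2": "patient",
    "subj_3": "control",
    "subj_4": "control",
}
n_steps = 5

frames = []
for subj, group in subjects.items():
    frames.append(
        pd.DataFrame(
            {
                "subj": subj,  # one id per time series, i.e. group_ids=["subj"]
                "group": group,  # constant within a subject, so usable as a static categorical
                "time_idx": np.arange(n_steps),  # integer step, restarting for each subject
                "source1": rng.normal(size=n_steps),  # made-up measurements
            }
        )
    )
df = pd.concat(frames, ignore_index=True)

# Sanity checks: the cohort label never changes within a subject,
# and each (subj, time_idx) pair occurs only once.
labels_per_subject = df.groupby("subj")["group"].nunique()
is_static = bool((labels_per_subject == 1).all())
no_duplicate_steps = not df.duplicated(["subj", "time_idx"]).any()
print(df.head(), is_static, no_duplicate_steps)
```

With data shaped like this, subj identifies the series and group only describes it, which is why subj alone is enough for group_ids.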
Hi Jan,
Thanks for getting back to me! I have set up my TimeSeriesDataSet like this:
from pytorch_forecasting import TimeSeriesDataSet

training = TimeSeriesDataSet(
    df,
    group_ids=['subj'],  # one time series per subject
    target='source2',
    time_idx='time_idx',
    max_encoder_length=20,
    max_prediction_length=20,
    time_varying_known_reals=['time_idx'],
    time_varying_unknown_reals=['source1', 'source2', 'source3', 'source4', 'source5'],
    static_categoricals=['group'],  # patient/control label, constant per subject
)
Is this set up correct?
Also, I wanted to ask how best to compare the groups. Would it make sense to use:
predictions, x = best_tft.predict(val_dataloader, return_x=True)
predictions_vs_actuals = best_tft.calculate_prediction_actual_by_variable(x, predictions)
And calculate which group had smaller differences between actuals and predictions?
Thanks for your help!
Hi Jan,
Just to add a little more to my previous post: after running tft and generating predictions I get this figure.
Does it make sense to calculate the difference between actual and prediction for each subject and then do something like a Mann-Whitney U test to find group-level differences?
Also, as a sub-question: is there a reason why some subjects don't have a prediction?
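As a rough sketch of that idea: compute one error value per subject, then compare the two cohorts with scipy.stats.mannwhitneyu. Everything below is synthetic; the actuals, predictions, noise levels, and the choice of mean absolute error as the per-subject metric are assumptions for illustration, not output from the model in this thread.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical per-subject results. In practice, `actual` and `predicted`
# would come from the validation data and the model's predictions.
rows = []
for i in range(20):
    group = "patient" if i < 10 else "control"
    actual = rng.normal(size=20)
    noise = 1.0 if group == "patient" else 0.5  # made-up noise levels
    predicted = actual + rng.normal(scale=noise, size=20)
    rows.append(
        {
            "subj": f"subj_{i}",
            "group": group,
            "mae": float(np.mean(np.abs(actual - predicted))),  # one value per subject
        }
    )
errors = pd.DataFrame(rows)

# Subjects are independent samples, so an unpaired rank test fits the design.
patient_mae = errors.loc[errors["group"] == "patient", "mae"]
control_mae = errors.loc[errors["group"] == "control", "mae"]
stat, p_value = mannwhitneyu(patient_mae, control_mae, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")
```

Reducing each subject to a single error value before testing keeps the samples independent, which matches the point above that the cohorts are not repeated measures.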
Thanks, Isaac
Maybe not all subjects are in the validation set? I wonder if you want to include a variable to distinguish the two groups.
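One way to look into this, sketched under an assumption: if the minimum encoder and prediction lengths are left unset, they should default to the maximum lengths, so a series shorter than max_encoder_length + max_prediction_length may be dropped from the dataset. The dataframe below is hypothetical; the check itself only needs pandas.

```python
import pandas as pd

max_encoder_length = 20
max_prediction_length = 20
# Assumed minimum series length when min_* lengths are not set explicitly.
required = max_encoder_length + max_prediction_length

# Hypothetical data: subject "a" has 45 time steps, subject "b" only 30.
df = pd.DataFrame(
    {
        "subj": ["a"] * 45 + ["b"] * 30,
        "time_idx": list(range(45)) + list(range(30)),
    }
)

# Span of each subject's series, measured in time steps.
lengths = df.groupby("subj")["time_idx"].agg(lambda s: s.max() - s.min() + 1)
too_short = lengths[lengths < required].index.tolist()
print(too_short)
```

Any subject listed in too_short would be a candidate for having no prediction; comparing that list with the subjects missing from the figure would confirm or rule out this explanation.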
Hi Jan,
Thanks for getting back to me:
Maybe not all subjects are in the validation set?
I used the standard procedure (from the tutorials) to define the data sets with the code:
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
batch_size = 16
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=28)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=28)
I wonder if you want to include a variable to distinguish the two groups.
Do you mean to set group_ids = ['subj', 'group']?
Thanks, Isaac
Hi there,
I had a similar question to #490 regarding how to code the group_ids for my analysis.
I am analysing time series in the context of medical data i.e. I have many time series rather than one long time series.
My time series come from two cohorts (patients and controls) and in my dataframe I have:
I wanted to know if the 'subj_id' column should be part of the 'group_id' parameter or another parameter?
Thanks!