philipperemy / cond_rnn

Conditional RNNs for Tensorflow / Keras.
MIT License
225 stars 32 forks

What shape should the static data have? #44

Open somi74 opened 1 year ago

somi74 commented 1 year ago

I have a glucose dataset for 200 patients, along with some static data that does not vary over time. The static data come from a case form that every patient fills in: about 40 columns and 200 rows, one per patient. I also have roughly 3000 glucose readings for each patient. I want to predict glucose 30 minutes ahead. What should I do? How can I use this library for my work?

somi74 commented 1 year ago

@philipperemy please answer my question.

philipperemy commented 1 year ago

Hey @somi74, ConditionalRecurrent takes two arrays as input: the time-dependent series and the static conditions.

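Define your model like this.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import GRU
from cond_rnn import ConditionalRecurrent

model = Sequential(layers=[
        ConditionalRecurrent(GRU(128)),
        # [...]
])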

Use your data here.

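import numpy as np

x = np.random.uniform(size=(200, 3000, 1))  # glucose series: (patients, time steps, 1)
c = np.random.uniform(size=(200, 40))  # static data: (patients, columns)
y = np.random.uniform(size=(200, 3000, 1))  # training target: the next step; plenty of material online covers how to build it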

Predict with your model

model.predict([x, c])
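
The "predict the next step" target mentioned above can be sketched with plain NumPy: the target is the input shifted by one time step. This is only an illustration under assumed shapes. `series` is a hypothetical stand-in for the real per-patient glucose array, and a model trained on such a `y` would have to emit one output per time step (Keras recurrent layers such as GRU expose `return_sequences=True` for that), which the snippet above does not show.

```python
import numpy as np

# Hypothetical stand-in for the real data: 200 patients, 3000 readings each.
rng = np.random.default_rng(0)
series = rng.uniform(size=(200, 3000, 1))
c = rng.uniform(size=(200, 40))  # one row of static features per patient

# Next-step targets: y[:, t] is the reading that follows x[:, t].
x = series[:, :-1, :]
y = series[:, 1:, :]

# x, y and c all share the same first dimension (one entry per patient).
print(x.shape, y.shape, c.shape)
```

Dropping the last step from `x` and the first step from `y` keeps both arrays the same length along the time axis.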

somi74 commented 1 year ago

Thanks @philipperemy, but I have collected all the data according to their PtID in a dictionary. Here's how it is structured:

[image: screenshot of the dictionary structure]

The first array, X, is the time-dependent glucose signal, built with a sliding window of 6 (because of the 30-minute prediction horizon). The second array, y, is the target. The remaining parameters do not depend on time. I extracted each of these lists to feed them into the model. The shapes of the arrays are as follows:

X_train shape: (436165, 6, 1)
y_train shape: (436165,)
condition shape: (188, 27)

(I dropped the columns that are unrelated to glucose, which left 27 columns. 12 rows had null values, so I dropped those too.)

However, when I use this library as follows, I get the ValueError shown after the code.

model = Sequential(layers=[
        ConditionalRecurrent(GRU(128)),
        Dense(units=1, activation='linear')
])

model.compile(optimizer='adam', loss='mae')

Fit the model:

history = model.fit(x=[X_train, condition], y=y_train, epochs=10, batch_size=None, shuffle=True, validation_data=(X_val, y_val))


ValueError: Data cardinality is ambiguous:
  x sizes: 436165, 188
  y sizes: 436165
Make sure all arrays contain the same number of samples.

I'm unsure how to fix this error.
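
One way to read this error: Keras expects every input array to have the same first dimension, but here there is one condition row per patient (188) and one window per sample (436165). A possible fix is to repeat each patient's static row once for every window built from that patient. The sketch below uses only NumPy and made-up data. The `patients` dictionary, the `make_windows` helper, and the choice `horizon=6` (which assumes one reading every 5 minutes) are all hypothetical and not part of the library.

```python
import numpy as np


def make_windows(series, window=6, horizon=6):
    """Slide a window over a 1-D series.

    The target for each window is the value `horizon` steps after the
    window's last reading.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - window - horizon + 1
    if n <= 0:
        # Series too short to produce a single (window, target) pair.
        return np.empty((0, window, 1)), np.empty((0,))
    X = np.stack([series[i:i + window] for i in range(n)])[..., None]
    start = window + horizon - 1
    y = series[start:start + n]
    return X, y


# Hypothetical per-patient data: {PtID: (glucose_series, static_features)}.
rng = np.random.default_rng(0)
patients = {pid: (rng.uniform(size=50 + pid), rng.uniform(size=27))
            for pid in range(3)}

X_parts, y_parts, c_parts = [], [], []
for pid, (glucose, static) in patients.items():
    X_p, y_p = make_windows(glucose)
    X_parts.append(X_p)
    y_parts.append(y_p)
    # Repeat the static row once per window so the row counts line up.
    c_parts.append(np.tile(static, (len(X_p), 1)))

X_train = np.concatenate(X_parts)
y_train = np.concatenate(y_parts)
c_train = np.concatenate(c_parts)
print(X_train.shape, y_train.shape, c_train.shape)
```

With arrays built this way, `[X_train, c_train]` and `y_train` all have the same number of rows. The validation set would need the same pairing, for example `validation_data=([X_val, c_val], y_val)`, where `c_val` is a hypothetical condition array built the same way for the validation windows.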

xiaozegu commented 1 year ago

size=(200, 3000, 1)

A y of size=(200, 3000, 1) means that the last linear layer has 3000 cells. Does that make the model too large?

xiaozegu commented 1 year ago

(436165, 6,1)

Your condition shape should be (436165, 188, 27)