havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

Using LSTM instead of MLPVanilla #53

Closed hgjlee closed 4 years ago

hgjlee commented 4 years ago

In order to use an LSTM instead of MLPVanilla with the CoxTime and CoxPH models, I have the following model class. It works mechanically, but I want to make sure the implementation is theoretically correct. I'm trying to make each patient an input sequence for the LSTM model, so that the hidden and cell states are carried within that patient's sequence, not across the whole batch of patients treated as one sequence. Would you be able to share some insights?

import torchtuples as tt
from torch import nn
from pycox.models import CoxPH

class LSTMCox(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, n_layers, output_size):
        super().__init__()
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.embedding_dim = embedding_dim

        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers)
        self.fc = nn.Linear(hidden_dim, output_size)
        self.activation = nn.ReLU()

    def forward(self, input):
        # Reshape to (seq_len, batch, input_size), the layout nn.LSTM
        # expects with the default batch_first=False.
        input = input.view(len(input), 1, self.embedding_dim)

        lstm_out, _ = self.lstm(input)
        lstm_out = lstm_out.contiguous().view(len(input), -1)

        out = self.fc(lstm_out)
        out = self.activation(out)

        return out

# in_features, x_train, y_train, batch_size, epochs, callbacks, and val
# are defined elsewhere in my script.
net = LSTMCox(in_features, 512, 1, 1)
model = CoxPH(net, tt.optim.Adam)
model.optimizer.set_lr(0.01)
log = model.fit(x_train, y_train, batch_size, epochs, callbacks, val_data=val, val_batch_size=batch_size)
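As a sanity check (a quick sketch I put together, with arbitrary sizes), one way to see which axis the hidden state flows along is to reverse the rows of the input and compare the outputs. If the first axis were a true batch axis, reversing it would merely permute the outputs; if it is the sequence axis, the outputs differ because the state is carried across rows:

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=4, num_layers=1)

# Shaped like input.view(len(input), 1, embedding_dim): 5 rows, batch of 1.
x = torch.randn(5, 1, 3)
out_fwd, _ = lstm(x)
out_rev, _ = lstm(x.flip(0))  # same rows, reversed order

# For a batch axis these would match; for a sequence axis they differ,
# since the hidden/cell states propagate across the 5 rows.
states_flow_across_rows = not torch.allclose(out_rev, out_fwd.flip(0))
```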
havakv commented 4 years ago

Hi @hgjlee! It's very interesting that you're working on Cox regression with LSTMs! But I'm not sure I fully understand what your objective here is.

If I understand correctly, you have a regular two-dimensional x_train with each row representing an individual and each column representing a covariate/variable/feature. And your LSTM then makes predictions for each individual using a latent state that "encodes" information about the previous individuals in the batch (the LSTM iterates over individuals)? If that is the case, how do you decide the ordering of the individuals (rows of x_train)? Keep in mind that I might just have misunderstood what you're doing.

hgjlee commented 4 years ago

Thank you for your reply! That's pretty close. I'm trying to make a sequence for each individual and have the LSTM run on each sequence separately, so the states are carried within each individual's sequence rather than between individuals. I hope this is a clearer explanation.

havakv commented 4 years ago

Ah, I understand! In that case I agree with the approach. It makes total sense to let an LSTM iterate over the features for each individual. I'm assuming your features are some sort of time-series?

However, I still can't quite see how this actually happens (it's been a while since I last worked with RNNs). According to the PyTorch docs, the input to an LSTM should have shape (seq_len, batch, input_size), but your input is defined as input = input.view(len(input), 1, self.embedding_dim). Doesn't this mean the sequence your LSTM runs over is the rows of x_train (which I assume represent individuals)? Or does each column of x_train represent a sequence of variables for an individual? Could you give an example of x_train to make this simpler to understand?

If x_train is two-dimensional and you want the LSTM to run through the features of each individual, doesn't that mean your embedding_dim should be 1? And then your input should have the shape (embedding_dim, len(input), 1)?
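For reference, a minimal sketch of the docs' shape convention (the sizes here are arbitrary): with batch_first=False, nn.LSTM iterates over the first axis, so a single individual's features-as-sequence would look like this.

```python
import torch
from torch import nn

# nn.LSTM with the default batch_first=False expects (seq_len, batch, input_size).
lstm = nn.LSTM(input_size=1, hidden_size=4, num_layers=1)

# One individual whose 3 scalar features are treated as a length-3 sequence:
# seq_len=3, batch=1, input_size=1 (i.e. embedding_dim=1).
x = torch.randn(3, 1, 1)
out, (h, c) = lstm(x)

# out holds the hidden state at every step; h and c hold only the final states.
assert out.shape == (3, 1, 4)
assert h.shape == (1, 1, 4)
```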

hgjlee commented 4 years ago

Yes, you're right. I'm trying to introduce time with this approach.

And that's exactly what I'm trying to make sure of right now. So let's say that the embedding size is 3 and the sequence length is 2. I'd have a list of lists of tuples as such: [[(1,1,1), (2,2,2)]]. Each index would represent an individual.

In the above case, I'm thinking this instead: input = input.view(2, 1, 3) since an individual has a seq length 2, the batch size is 1, and the embedding size 3.
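Concretely, something like this (hidden size 4 is just for illustration):

```python
import torch
from torch import nn

# One individual, sequence length 2, embedding size 3.
data = [[(1., 1., 1.), (2., 2., 2.)]]
x = torch.tensor(data)   # shape (1, 2, 3): (batch, seq_len, embedding)
x = x.view(2, 1, 3)      # (seq_len=2, batch=1, input_size=3), as above

lstm = nn.LSTM(input_size=3, hidden_size=4, num_layers=1)
out, _ = lstm(x)
assert out.shape == (2, 1, 4)  # one hidden state per time step
```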

havakv commented 4 years ago

To make sure I'm not misunderstanding, in [[(1,1,1), (2,2,2)]] do you have 2 or 1 individual? If you have 1 individual, I agree with you.

hgjlee commented 4 years ago

That would be one individual with two features of embedding size 3. Great, thanks for sharing your thoughts! That was helpful.

havakv commented 4 years ago

Great! Looks like you have everything under control! Hope you'll get the opportunity to share your results with us at some point in the future!