wywhu opened this issue 4 years ago
Hi wywhu,
If you look at the code, masks are created during training, so a mismatch between window_size and the actual sequence length shouldn't be a problem. That said, I wrote this code 4 years ago, so this is just speculation.
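To illustrate the masking idea (this is a hypothetical sketch, not the actual Med2Vec code): when a patient has fewer visits than window_size, the context positions beyond the end of the sequence get a zero mask, so they contribute nothing to the cost:

```python
import numpy as np

def build_mask(seq_len, window_size):
    """Hypothetical mask: 1.0 for valid context offsets, 0.0 for offsets
    that fall past the end of the patient's visit sequence."""
    mask = np.zeros(window_size)
    # a visit has at most seq_len - 1 neighbors to predict
    mask[:min(seq_len - 1, window_size)] = 1.0
    return mask

print(build_mask(2, 3))  # only one valid neighbor -> [1. 0. 0.]
print(build_mask(5, 3))  # full window is valid   -> [1. 1. 1.]
```

So a patient with only 2 visits trained under window_size=3 simply has the two out-of-range positions zeroed out.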
There is no fixed answer as to what number of epochs works best, as your dataset is different from what I had used. You can try to separate the cost into visit_cost and emb_cost (see line 133 of the source code), see how they behave, then select the epoch you like. This of course involves some coding.
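The epoch-selection idea could look something like the following sketch, where the per-epoch cost values are illustrative stand-ins for whatever your modified training loop logs:

```python
def select_epochs(cost_history):
    """cost_history: list of (visit_cost, emb_cost) pairs, one per epoch.
    Returns the (0-indexed) epoch minimizing each component separately."""
    visit_costs = [v for v, _ in cost_history]
    emb_costs = [e for _, e in cost_history]
    return (visit_costs.index(min(visit_costs)),
            emb_costs.index(min(emb_costs)))

# Illustrative numbers: the two components need not bottom out together.
history = [(5.2, 3.1), (4.0, 2.5), (4.3, 2.2), (4.6, 2.4)]
print(select_epochs(history))  # -> (1, 2)
```

Seeing the two components diverge like this is exactly why picking the epoch from the combined mean_cost alone can be misleading.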
Hope this helps, Ed
Thanks Ed. I have another question about interpreting the code representations.
In your paper, it says that "we trained ReLU(W_c), a non-negative matrix, to represent the meaning of .......", and "we can find the top k codes that have the largest values for the i-th coordinate by argsort(W_c[i, :])[1, k]".
I am confused, should I look at W_c or ReLU(W_c) in the argsort operation?
Actually, you are correct. You should look at ReLU(W_c) in the argsort operation, which guarantees non-negativity. However, since all medical codes are trained in the non-negative space, I don't think the results would be too different. But technically you should use ReLU(W_c). Thanks for pointing it out!
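In NumPy terms, the interpretation step described above could be sketched as follows (W_c here is a small random stand-in, not a trained matrix; argsort is ascending, so the order is reversed to get the largest values first):

```python
import numpy as np

rng = np.random.default_rng(0)
W_c = rng.standard_normal((4, 10))  # 4 embedding coordinates x 10 medical codes

# Apply ReLU first, which guarantees non-negativity as discussed above.
relu_W_c = np.maximum(W_c, 0.0)

k = 3
i = 2  # inspect the i-th embedding coordinate
# argsort sorts ascending; reverse and take the first k for the top-k codes
top_k_codes = np.argsort(relu_W_c[i, :])[::-1][:k]
print(top_k_codes)
```

Since ReLU only zeroes out the negative entries, the codes with the largest positive values keep the same ranking whether you sort W_c or ReLU(W_c), which is why the results rarely differ in practice.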
Hi Ed,
I am training embeddings using your default hyperparameters, except window_size. The minimum number of visits in my dataset is 2, but I set window_size=3, as I assume your code can handle the mismatch between window_size and the actual sequence length. Am I right?
I also noticed that the mean_cost reached its minimum at the 2nd epoch and then started increasing. Although I read in your paper that the number of epochs does not hurt the code representations very much, I am not sure which epoch to choose after training finishes. Should I use the one with the minimum cost, or the one from the last epoch?