hongyuanmei / neural-hawkes-particle-smoothing

Source code of the neural Hawkes particle smoothing (ICML 2019)
BSD 3-Clause "New" or "Revised" License

Index out of bound Error in nhp.NeuralHawkes.getSampledStates #5

Open CharlottePan-98 opened 3 years ago

CharlottePan-98 commented 3 years ago

Dear Dr. Mei,

I'm retraining a particle filtering model with the source code and datasets you shared, but something goes wrong, and the issue appears for all of the datasets (pilotnhp0-0.5, pilotelevator, etc.). I suspect there is a small bug in the function getSampledStates, around this statement:

```python
all_cell_sampling = all_cell.view(
    batch_size * num_particles * T_plus_1, self.hidden_dim
)[
    index_of_hidden_sampling.view(-1), :
].view(
    batch_size, num_particles, max_len_sampling, self.hidden_dim
)
```

Some of the indices given by index_of_hidden_sampling.view(-1) are out of bounds for dimension 0 of all_cell.view(batch_size * num_particles * T_plus_1, self.hidden_dim), e.g.:

```
IndexError: index 1962 is out of bounds for dimension 0 with size 1950
```

How can I fix this error? Could you kindly suggest a solution to this bug?

Best regards.
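For readers who hit the same thing, here is a minimal, self-contained sketch of the gather pattern used in getSampledStates, with made-up shapes rather than the repository's actual tensors, showing how a single out-of-range entry in the index tensor produces exactly this IndexError:

```python
import torch

# Hypothetical shapes standing in for the real ones in getSampledStates;
# the flattened first dimension then has size 50 * 1 * 39 = 1950.
batch_size, num_particles, T_plus_1, hidden_dim = 50, 1, 39, 16
max_len_sampling = 20

all_cell = torch.randn(batch_size, num_particles, T_plus_1, hidden_dim)
flat = all_cell.view(batch_size * num_particles * T_plus_1, hidden_dim)

# Valid indices must lie in [0, 1950); this gather works:
good_index = torch.randint(0, flat.size(0),
                           (batch_size, num_particles, max_len_sampling))
sampled = flat[good_index.view(-1), :].view(
    batch_size, num_particles, max_len_sampling, hidden_dim)

# A single entry past the end reproduces the reported error:
bad_index = good_index.clone()
bad_index[-1, -1, -1] = 1962
try:
    flat[bad_index.view(-1), :]
except IndexError as err:
    print(err)  # index 1962 is out of bounds for dimension 0 with size 1950
```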

hongyuanmei commented 3 years ago

Can you provide more details about how you use the code (e.g., environment specs, arguments) and when the error message pops up?

CharlottePan-98 commented 3 years ago

macOS, editing with PyCharm. I ran the code from the CLI with

```
python /nhps/functions/train_nhpf.py -ds pilotnhp0-0.5
```

and left all the other arguments at their defaults. I installed the module as editable so that I could debug. My PyTorch version is 1.8.0. I also tried the other datasets (data_mimic etc., as shared for the NIPS 2017 paper) in order to train this nhpf model on complete data, based on this new version of NHP, for prediction tasks. To do so I generated the corresponding censor.conf files for the data_mimic dataset and set all missing probabilities to 0 in order to cope with the missing-mechanism part of this version of the code (am I right to do this?), but the same bug popped up there too.

The line numbers here may not match yours exactly, since I added some print statements.

```
Traceback (most recent call last):
  File "./nhps/functions/train_nhpf.py", line 340, in <module>
    if __name__ == "__main__": main()
  File "./nhps/functions/train_nhpf.py", line 337, in main
    run_complete(dict_args)
  File "./nhps/functions/train_nhpf.py", line 162, in run_complete
    objective, _ = agent(batchdata_seqs, mode=1)
  File "---------------/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "---------------/neural-hawkes-particle-smoothing/nhps/models/nhp.py", line 329, in forward
    all_cell, all_cell_bar, all_gate_output, all_gate_decay
  File "---------------/neural-hawkes-particle-smoothing/nhps/models/nhp.py", line 198, in getSampledStates
    acs = all_cell.view(batch_size * num_particles * T_plus_1, self.hidden_dim)[index_of_hidden_sampling.view(-1), :]
IndexError: index 1053 is out of bounds for dimension 0 with size 1050
```

This is the full error I get when I try to print the tensor all_cell.view(batch_size * num_particles * T_plus_1, self.hidden_dim)[index_of_hidden_sampling.view(-1), :] in nhp.NeuralHawkes.getSampledStates. Here, all_cell has size (50, 1, 39, 16). I also printed index_of_hidden_sampling.view(-1) to see what happened. It turns out that the last entries contain indices that exceed 1050, the size of dimension 0. These are the last three rows of index_of_hidden_sampling:

```
[[ 987,  989,  989,  989,  989,  989,  997,  997, 1000, 1000, 1007, 1013,    0,    0,    0,    0,    0,    0,    0,    0]],
[[1010, 1010, 1013, 1015, 1015, 1018, 1018, 1023, 1023, 1031, 1031, 1034,    0,    0,    0,    0,    0,    0,    0,    0]],
[[1033, 1035, 1035, 1035, 1035, 1039, 1039, 1043, 1043, 1045, 1049, 1053, 1053, 1053, 1055, 1057, 1057, 1059, 1065,    0]]
```

As you can see, the values 1053 through 1065 are invalid as indices. This must have something to do with how index_of_hidden_sampling is generated in io.processors.sampleForIntegral(), but I don't know how to fix that.

hongyuanmei commented 3 years ago

Can you first try replicating this problem with PyTorch 1.0? PyTorch has gone through many updates over the years, and many methods are now interpreted differently than they used to be. As for the code, training an NHP with complete data (i.e., calling the nhpf-related methods) has nothing to do with the missingness mechanism, so you shouldn't worry about the config files. As for the invalid indices, they are indeed odd, but let's dig into them after the problem is replicated in PyTorch 1.0, in which the code was originally developed. Thanks.

To avoid messing up your development environment in general, I recommend creating a new environment and installing PyTorch 1.0 only there.
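For example, a fresh environment could be set up like this (the environment name is arbitrary, and Python 3.6 matches the paths in the traceback above):

```
conda create -n nhps-pt10 python=3.6
conda activate nhps-pt10
conda install pytorch=1.0 -c pytorch
```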

qykong commented 3 years ago

I got the same error on the latest version of PyTorch, but I confirmed that the code works as expected with PyTorch 1.0.

CharlottePan-98 commented 3 years ago

Hi, Dr Mei,

I found where the error came from. It was due to a small difference in the results of the division operator "/" in the function processBatchParticles in processor.py.

After I changed the division from '/' to '//', it works well with the latest PyTorch. Probably it's because in the latest version '/' returns float results rather than integers.
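To illustrate the behavior change, here is a minimal sketch (not the actual processBatchParticles code, and assuming the index arithmetic is done on integer tensors):

```python
import torch

lengths = torch.tensor([7, 8, 9])

# In recent PyTorch, "/" is true division and returns floats even for integer
# tensors, whereas older releases (such as 1.0) produced integer results:
print(lengths / 2)    # tensor([3.5000, 4.0000, 4.5000])

# "//" (floor division) keeps integer results, which is what index arithmetic
# such as index_of_hidden_sampling needs:
print(lengths // 2)   # tensor([3, 4, 4])
```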

Hope this will help.

Regards, Charlotte


dayuyang1999 commented 3 years ago


Same error with torch 1.8.0.

Fixed after changing the / on lines 227, 232, and 237 to //, following Charlotte's answer.

Thank you, Charlotte.

ChenglongMa commented 2 years ago

As suggested by PyTorch 1.12.1:

```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch.
It currently rounds toward 0 (like the 'trunc' function NOT 'floor').
This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'),
or for actual floor division, use torch.div(a, b, rounding_mode='floor').
```

We can use torch.div(a, b, rounding_mode='trunc') instead.
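For instance, in a hypothetical snippet (not the repository's actual processor.py lines), the two forms agree for the nonnegative values used as indices here:

```python
import torch

pos = torch.tensor([987, 1000, 1049])  # hypothetical nonnegative index arithmetic

# Floor-division form (the one the warning above is about):
idx_floordiv = pos // 3

# Warning-free replacement; 'trunc' and 'floor' agree for nonnegative values:
idx_div = torch.div(pos, 3, rounding_mode='trunc')

assert torch.equal(idx_floordiv, idx_div)
print(idx_div)  # tensor([329, 333, 349])
```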