urbanmobility / CSLSL

PyTorch implementation of the paper "Human Mobility Prediction with Causal and Spatial-constrained Multi-task Network"
MIT License

Some questions about evaluation. #1

Open HYTYH opened 2 years ago

HYTYH commented 2 years ago

Hi author, congratulations on your work, and thank you for sharing it! I have some questions about the evaluation of this model.

I noticed that the data format is:

{'target_l': [54, 31], 'target_th': [3, 10], 'loc': [[27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 29, 31, 42, 43, 44, 38, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54]] ......}

where the current session (loc[1]) overlaps target_l at location 54. When I read your code, I found that target_l 54 is used when calculating recall. Could you please tell me whether I am right about this? Is it a bug, or did I miss something? I hope to get your reply, thanks!

HYTYH commented 2 years ago

Hi, I added these lines in utils.py:

for batch_idx in batch_idx_list:
    uid, sid = data_queue[batch_idx]
    uid_batch.append([uid])

    # add here: merge the whole current session into the history session,
    # then keep only the last target so recall is computed on a single step
    data_input[uid][sid]['loc'][0].extend(data_input[uid][sid]['loc'][1][:])
    data_input[uid][sid]['tim'][0].extend(data_input[uid][sid]['tim'][1][:])
    data_input[uid][sid]['cat'][0].extend(data_input[uid][sid]['cat'][1][:])
    data_input[uid][sid]['loc'] = [data_input[uid][sid]['loc'][0]]
    data_input[uid][sid]['tim'] = [data_input[uid][sid]['tim'][0]]
    data_input[uid][sid]['cat'] = [data_input[uid][sid]['cat'][0]]
    data_input[uid][sid]['target_l'] = [data_input[uid][sid]['target_l'][-1]]
    data_input[uid][sid]['target_c'] = [data_input[uid][sid]['target_c'][-1]]
    data_input[uid][sid]['target_th'] = [data_input[uid][sid]['target_th'][-1]]
...

which should process the data

{'target_l': [54, 31], 'target_th': [3, 10], 'loc': [[27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 29, 31, 42, 43, 44, 38, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54]] ......}

to something like:

{'target_l': [31], 'target_th': [10], 'loc': [[27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 29, 31, 42, 43, 44, 38, 45, 46, 47, 48, 49, 50, 51, 52, 53], [54]] ......}

which should address the label leakage problem by computing recall only for the last target location and the last target category (matching the problem definition).
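The end-to-end effect of this patch on one sample can be sketched with a small, hypothetical helper (`keep_last_target` is my own name, not from the repo):

```python
# Hypothetical helper (my naming, not from the repo) mirroring the patch above:
# move all but the last check-in of the current session into the history
# session and keep only the final target, so recall is computed on a single,
# leakage-free step.

def keep_last_target(sample):
    merged_history = sample['loc'][0] + sample['loc'][1][:-1]
    return {
        'loc': [merged_history, sample['loc'][1][-1:]],
        'target_l': sample['target_l'][-1:],
        'target_th': sample['target_th'][-1:],
    }

sample = {'target_l': [54, 31], 'target_th': [3, 10],
          'loc': [[51, 52], [53, 54]]}
print(keep_last_target(sample))
# {'loc': [[51, 52, 53], [54]], 'target_l': [31], 'target_th': [10]}
```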

Then I tried rerunning the experiments and got the following results:

| Dataset | Task | R@1 | R@5 | R@10 |
| --- | --- | --- | --- | --- |
| NYC | category | 0.293 | 0.502 | 0.575 |
| NYC | location | 0.231 | 0.391 | 0.428 |
| TKY | category | 0.424 | 0.653 | 0.720 |
| TKY | location | 0.204 | 0.350 | 0.403 |

The results drop a lot compared to those reported in the paper. If I have misunderstood your procedure, I hope you can help me understand.

If the bug I mentioned above does exist, do you think the modification in the code block above is a reasonable fix? I look forward to your reply!

herozen97 commented 2 years ago

Hi, thanks for reaching out! This is not a bug; there is a misunderstanding about our settings.

Assuming data = {'target_l': [54, 31], 'target_th': [3, 10], 'loc': [[27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 29, 31, 42, 43, 44, 38, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54]] ......}:

1. data['loc'][0] is the historical session and data['loc'][1] is the current session. We divide the trajectory into sessions by week, so you cannot append 53 to data['loc'][0]: 53 belongs to the new week (the current session). Moreover, in data samples not shown here, data['loc'][1] can be longer than 1 or 2, so your processing discards a large number of test samples.
2. The actual data stream in our code is: when we predict 54, the model only knows the historical session and 53; when we predict 31, the model knows the historical session and [53, 54]. This form of data organization is very common in our baselines, such as DeepMove and LSTPM.

Thanks again.
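The step-by-step data stream described above can be sketched with a toy illustration (not the repository's code): an RNN reads the current session one check-in at a time, so the context used for each prediction contains only the inputs read so far.

```python
# Toy illustration (not the repository's code) of why a step-wise RNN is
# causal: the context used to predict step t's target contains only the
# inputs read up to and including step t.

def rnn_rollout(history, current_session):
    """Stand-in for a GRU: the 'state' is simply everything seen so far."""
    state = list(history)
    contexts = []
    for loc in current_session:
        state = state + [loc]    # consume this check-in
        contexts.append(state)   # context for predicting the NEXT location
    return contexts

contexts = rnn_rollout(history=[51, 52], current_session=[53, 54])
# Predicting target 54: the model has seen the history plus [53] only.
assert contexts[0] == [51, 52, 53]
# Predicting target 31: the model has seen the history plus [53, 54].
assert contexts[1] == [51, 52, 53, 54]
```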

HYTYH commented 2 years ago

Hi author, thanks for your timely reply!

I understand what a session means; thanks for the explanation. May I know which part of your code masks out location 54 in data['loc'][1] when the model predicts location 54 in data['target_l']? As I understand it, the three masks (target_mask, his_mask, cur_mask) only mask out padding.

I see that generate_batch_data() in utils.py feeds all locations in the current session (e.g., [53, 54]) into the model's forward pass to generate outputs (th_pred, c_pred, l_pred) that match the target data format.

So I still don't understand which part of your code ensures the second claim ("the actual process") in your response.

herozen97 commented 2 years ago

Hi, maybe this picture can help you understand. I think this is an advantage of the RNN series over the Transformer (attention) series, which requires an explicitly defined mask.

[image]
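The contrast can be sketched as follows (illustrative only): a Transformer needs an explicit causal mask to forbid attending to future positions, while an RNN gets the same constraint for free from its left-to-right loop.

```python
# Illustrative only: the explicit causal mask a Transformer would need.
# mask[i][j] is True iff query position i may attend to key position j
# (i.e., j <= i); an RNN enforces this implicitly by reading left to right.

def causal_mask(n):
    return [[j <= i for j in range(n)] for i in range(n)]

m = causal_mask(3)
assert m[0] == [True, False, False]   # position 0 sees only itself
assert m[2] == [True, True, True]     # the last position sees everything
```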
HYTYH commented 2 years ago

Thanks for your time, author. I think I see what you mean here...

But I think that passing hidden states like hc_t and hc_l into capturer_c and capturer_l will still cause label leakage, since they are the final hidden states of the previous GRU. I'm confused... am I right about this?

HYTYH commented 2 years ago

This code in model.py:

cur_t_rnn, hc_t = self.capturer_t(rnn_input_his_concat, rnn_input_cur_concat, his_mask, cur_mask, mask_batch[1:])
if self.cat_contained:
    cur_c_rnn, hc_l = self.capturer_c(rnn_input_his_concat, rnn_input_cur_concat, his_mask, cur_mask, mask_batch[1:], hc_t)
    cur_l_rnn, _ = self.capturer_l(rnn_input_his_concat, rnn_input_cur_concat, his_mask, cur_mask, mask_batch[1:], hc_l)

indicates that cur_c_rnn and cur_l_rnn contain ground-truth features introduced by hc_t. If I haven't misunderstood your explanation of the RNN structure, cur_t_rnn is unbiased and free of ground-truth leakage. However, the main results reported in the paper are the category and location recalls, and I believe the final hidden states (hc_t and hc_l) have already learned ground-truth information, which means the predictions cur_c_rnn and cur_l_rnn are biased.
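The concern can be illustrated with a toy stand-in (not the authors' code): any module initialized with the final hidden state of the previous GRU has already "seen" the last input of the current session, which includes a ground-truth location.

```python
# Toy stand-in (not the authors' code) for the concern above: the final
# hidden state of a GRU has consumed every input of the current session,
# including the last one, which equals a ground-truth target location.

def final_hidden_state(inputs):
    """Stand-in for a GRU's last hidden state: it 'remembers' all inputs."""
    state = []
    for x in inputs:
        state.append(x)   # each step folds one more input into the state
    return state

current_session = [53, 54]                   # 54 is also a target location
hc_t = final_hidden_state(current_session)   # what capturer_t hands onward
# A module initialized with hc_t is conditioned on 54 even when predicting 54:
assert 54 in hc_t
```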

I'm just trying to understand your work, and I hope to avoid being misled or making similar mistakes in my own work in the future. In my view, the contribution of your paper lies not so much in raw prediction accuracy as in using human logic to guide next-location prediction. I hope we can keep working toward a better understanding of this work together, and please don't misunderstand my intention in opening this issue. Thanks again!

herozen97 commented 2 years ago

I see what you mean. We will check the code and update this repository as soon as possible. Thank you for your issue.

herozen97 commented 2 years ago

We have revised the code and the new experiments are underway. They also require a new parameter search. Let's hope for promising results; we will post the new numbers as soon as possible. Thanks again.

PS: as of this comment, the newest results for location prediction are R@1 = 0.25 on NYC at epoch 9 and R@1 = 0.23 on TKY at epoch 7. These are still competitive with the baselines (they outperform all of them except LSTPM, whose setting requires filtering out more sparse users).

HYTYH commented 2 years ago

Hi author,

It seems your newest results are broadly consistent with my rerun results above, although yours compute recall over more samples. Numerical results are not that important, actually; your causality contribution and this small bug leave some room for future work.

Thank you for your timely revisions and all your replies. Get some rest (it looks like you were up late last night revising...); research is not all there is to life. = =