pmixer / TiSASRec.pytorch

https://github.com/JiachengLi1995/TiSASRec in PyTorch

self.time_matrix_K_emb #2

Open youngzw opened 3 years ago

youngzw commented 3 years ago

https://github.com/pmixer/TiSASRec.pytorch/blob/e87342ead6e90898234432f7d9b86e76695008bc/model.py#L104 thx for your nice work about implementing 'TiSASRec' model with pytorch. here is a issue: 'self.item_num+1' may be written in 'args.time_span+1', although with the condition of item_num > time_span , it will have no effect on the result.
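A minimal sketch of the sizing point above (the item count for ml-1m and the variable names are assumptions, not copied from the repo): the time-relation tables index clipped intervals 0..time_span, so they need time_span + 1 rows; only the item table needs item_num + 1 rows.

```python
import torch
import torch.nn as nn

# Assumed sizes: item_num for ml-1m, default time_span and hidden size.
item_num, time_span, hidden = 3416, 256, 50

item_emb = nn.Embedding(item_num + 1, hidden, padding_idx=0)  # index 0 is padding
time_matrix_K_emb = nn.Embedding(time_span + 1, hidden)       # intervals 0..time_span

# Oversizing the interval table to item_num + 1 still runs when
# item_num > time_span (all interval indices stay in range), it just
# wastes rows -- which is why the typo had no effect on the results.
intervals = torch.randint(0, time_span + 1, (4, 10, 10))  # (batch, len, len)
assert time_matrix_K_emb(intervals).shape == (4, 10, 10, hidden)
```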

pmixer commented 3 years ago


Yes, you are right. Thanks for pointing out this typo!

youngzw commented 3 years ago

https://github.com/pmixer/TiSASRec.pytorch/blob/059c6a6193ad6b93ac318f43ddbdcf5d428c9d42/model.py#L66

Hi, pmixer:

Referring to your full implementation of the transformer encoder, I got a RuntimeError at model.py:66 when I tried to change args.num_heads from 1 to 4.

I modified the code to time_mask = time_mask.unsqueeze(-1).repeat(self.head_num, 1, attn_weights.shape[-1]), and then it works well.

This comes down to the difference between torch.Tensor.expand() and torch.Tensor.repeat().
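A minimal repro (shapes assumed) of the expand-vs-repeat point: expand() only broadcasts singleton dimensions and allocates no new memory, while repeat() copies data and can therefore tile a non-singleton dimension such as the batch dimension by head_num, which is what multi-head attention needs here.

```python
import torch

head_num, batch, seq_len = 4, 2, 5
time_mask = torch.zeros(batch, seq_len, dtype=torch.bool)

# repeat: tiles dim 0 head_num times -> (head_num * batch, seq_len, seq_len)
mask_rep = time_mask.unsqueeze(-1).repeat(head_num, 1, seq_len)
assert mask_rep.shape == (head_num * batch, seq_len, seq_len)

# expand: raises, because dim 0 has size batch=2, not 1, so it cannot be
# broadcast to head_num * batch = 8 without copying.
expand_failed = False
try:
    time_mask.unsqueeze(-1).expand(head_num * batch, seq_len, seq_len)
except RuntimeError:
    expand_failed = True
assert expand_failed
```

With num_heads=1, head_num * batch equals batch, so expand() happens to work; the mismatch only surfaces once num_heads > 1, matching the error described above.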

pmixer commented 3 years ago

@youngzw THANK YOU again for pointing out the bug! Yeah, I used expand instead of repeat just to save GPU memory, which may not be such a good habit 😿; it has been fixed in the most recent commit. BTW, just out of curiosity, what hidden size did you use for num_heads=4? The default hidden size of 50 does not actually support num_heads=4.

youngzw commented 3 years ago

Yeah, I found that hidden size=50 does not support num_heads=4, so I adjusted hidden_size to 64. For this problem, adding an FC layer after the multi-head layer might be another way, but I haven't tried that, and it may affect the experimental results :)

pmixer commented 3 years ago

Never mind, there's no need for an FC layer; I just wanted to confirm that you changed the hidden size. Please set num_heads to a divisor of the default hidden size, or change the hidden size to a multiple of num_heads, as that is a natural requirement of multi-head attention. I will add a checker to the code later to remind users about this issue.
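A sketch of the checker mentioned above (argument names are assumed to mirror the repo's argparse flags, not taken from it): multi-head attention splits the hidden vector evenly across heads, so hidden_units must be a multiple of num_heads.

```python
def check_heads(hidden_units: int, num_heads: int) -> int:
    """Validate the head count and return the per-head dimension."""
    if hidden_units % num_heads != 0:
        raise ValueError(
            f"hidden_units={hidden_units} must be divisible by "
            f"num_heads={num_heads} (each head gets hidden_units // num_heads dims)"
        )
    return hidden_units // num_heads

assert check_heads(64, 4) == 16  # works: 64 = 4 * 16
# check_heads(50, 4) would raise: 50 is not divisible by 4
```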

Regards, Zan Huang



youngzw commented 3 years ago

Well done!!!

xiqxin1 commented 2 years ago

Hi friends, could you tell me how long one epoch takes?

pmixer commented 2 years ago

It depends on your dataset size. For the default ml-1m dataset, expect a few minutes, even on CPU alone.

xiqxin1 commented 2 years ago

Thanks! Maybe I have a bug. I tried to replace your time-interval data with distance intervals, but I've been waiting for half an hour. Could you tell me a bit more about how you calculate the time interval when items are padded in the sequence? (e.g., maxlen=50, where the actual sequence length is 40 and there are 10 padded items)

pmixer commented 2 years ago

@xiqxin1 It's been a while, but if I remember correctly, the time interval is the span between the interaction timestamps of two items: https://github.com/pmixer/TiSASRec.pytorch/blob/5ab23975fe695cdba1fbc7b466710652b9fac9b1/utils.py#L15. Please consider opening a new issue if you still have problems. BTW, the TiSASRec paper should provide more information for your reference if you hope to modify the model based on this prior work.
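A hedged sketch of the padded-interval computation (cf. computeRePos in the linked utils.py; the exact details may differ): padded slots carry timestamp 0, every pairwise absolute interval is scaled and then clipped to time_span, and the attention masks presumably neutralize the padded positions rather than this matrix.

```python
def relation_matrix(time_seq, time_span, time_scale=1):
    """Pairwise |t_i - t_j| intervals, scaled and clipped to time_span."""
    size = len(time_seq)
    return [[min(abs(time_seq[i] - time_seq[j]) // time_scale, time_span)
             for j in range(size)] for i in range(size)]

# maxlen=5 with 2 padded slots (timestamp 0) at the front
mat = relation_matrix([0, 0, 100, 160, 400], time_span=256)
assert mat[2][3] == 60    # |160 - 100|
assert mat[2][4] == 256   # |400 - 100| = 300, clipped to time_span
```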

Hang-Z commented 1 year ago

Hi, pmixer: I want to test the model and check its performance, so I just ran the command

python main.py --dataset=ml-1m --train_dir=default --dropout_rate=0.2 --device=cuda --state_dict_path='ml-1m_default/TiSASRec.epoch=601.lr=0.001.layer=2.head=1.hidden=50.maxlen=200.pth' --inference_only=true --maxlen=200

from your README, but I get this error:

size mismatch for time_matrix_K_emb.weight: copying a param with shape torch.Size([3417, 50]) from checkpoint, the shape in current model is torch.Size([257, 50]).

I notice that in your code the time span should be 256, so why is it 3417 in the pretrained model you provided?
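A hedged sketch of diagnosing this mismatch: the row count of time_matrix_K_emb.weight in a checkpoint equals the table size it was built with, so inspecting it before load_state_dict shows what the training run used. The state dict is simulated below; loading the real .pth file with torch.load works the same way.

```python
import torch

# Simulated state dict; the key and shape come from the error message above.
ckpt = {'time_matrix_K_emb.weight': torch.zeros(3417, 50)}
rows, hidden = ckpt['time_matrix_K_emb.weight'].shape

# 3417 = 3416 + 1 plausibly matches item_num + 1 for ml-1m, i.e. the
# pre-fix sizing discussed at the top of this thread, rather than the
# expected time_span + 1 = 257 -- hence the mismatch against the model.
assert rows != 256 + 1
print(f'checkpoint table rows={rows}, hidden={hidden}')
```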

Hang-Z commented 1 year ago

Never mind, I just trained a new model, and the testing performance is even a bit better than in your paper. Thanks for your solid work~

pmixer commented 1 year ago

Sounds good. 3417 sounds like the user/item number. BTW, I'm not the paper author; I just ported the code from TensorFlow to PyTorch.