sample_neg_items report keyError

zihaozhu93 commented 9 months ago

Hi there,

I tried following the PinSage example:

pinsage_dgl = PinSageDGL(
    "ranking",
    data_info,
    loss_type="max_margin",
    paradigm="i2i",
    embed_size=16,
    n_epochs=2,
    lr=3e-4,
    lr_decay=False,
    reg=None,
    batch_size=2048,
    num_neg=3,
    dropout_rate=0.0,
    remove_edges=False,
    num_layers=2,
    num_neighbors=3,
    num_walks=10,
    neighbor_walk_len=2,
    sample_walk_len=5,
    termination_prob=0.5,
    margin=1.0,
    sampler="random",
    start_node="random",
    focus_start=False,
    seed=42,
)
pinsage_dgl.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    shuffle=True,
    eval_data=eval_data,
    metrics=metrics,
)

The output shows that training runs fine, but evaluation fails with an exception.

Output with the KeyError:

n_users: 446354, n_items: 67291, data density: 0.0188 %

============================== PinSageDGL ==============================
Training start time: 2023-12-07 18:29:50
train: 100%|██████████| 8687/8687 [11:11<00:00, 12.93it/s]
Epoch 1 elapsed: 671.941s
train_loss: 0.269
item embeds: 100%|██████████| 33/33 [00:02<00:00, 11.06it/s]
Traceback (most recent call last):
  ......
    pinsage_dgl.fit(
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/bases/embed_base.py", line 137, in fit
    self.trainer.run(
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/training/torch_trainer.py", line 128, in run
    print_metrics(
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/evaluation/evaluate.py", line 183, in print_metrics
    eval_metrics = metrics_fn(data=eval_data, metrics=metrics)
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/evaluation/evaluate.py", line 105, in evaluate
    data = build_eval_transformed_data(model, data, neg_sampling, seed)
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/evaluation/computation.py", line 20, in build_eval_transformed_data
    data.build_negatives(model.n_items, num_neg, seed=seed)
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/data/transformed.py", line 142, in build_negatives
    items_neg = self._sample_neg_items(
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/data/transformed.py", line 156, in _sample_neg_items
    return negatives_from_unconsumed(
  File "/data/anaconda3/envs/dgl/lib/python3.9/site-packages/libreco/sampling/negatives.py", line 73, in negatives_from_unconsumed
    if n != i and n not in u_negs and n not in user_consumed_set[u]:
KeyError: 0

massquantity commented 9 months ago

Hi, please provide the code on how you split the data. And what are the values in the "label" column of the original data?

zihaozhu93 commented 9 months ago

> Hi, please provide the code on how you split the data. And what are the values in the "label" column of the original data?

Thanks for the reply.

I followed the example in feat_ranking_example.py:

if __name__ == "__main__":
    start_time = time.perf_counter()
    data = pd.read_csv("data/expose_post/detail_post.csv", sep=",", header=0)
    train_data, eval_data = split_by_ratio_chrono(data, test_size=0.2)

The label column contains only 0 and 1.

massquantity commented 9 months ago

Samples with label 0 in the original data are not included in user_consumed_set.
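
To illustrate why this produces `KeyError: 0`, here is a simplified sketch. It is not LibRecommender's actual code; `build_user_consumed` and the toy data are hypothetical. It assumes the consumed-items mapping is a plain dict built only from label-1 rows, so a user whose rows are all label 0 never becomes a key.

```python
from collections import defaultdict


def build_user_consumed(interactions):
    """Map each user to the items they interacted with positively (label == 1).

    Hypothetical helper for illustration; rows with label 0 are skipped.
    """
    consumed = defaultdict(list)
    for user, item, label in interactions:
        if label == 1:
            consumed[user].append(item)
    # Convert to a plain dict so that missing users raise KeyError on lookup.
    return dict(consumed)


# Toy (user, item, label) rows: user 0 has only label-0 rows.
interactions = [
    (0, 10, 0),
    (1, 10, 1),
    (1, 11, 0),
]
user_consumed = build_user_consumed(interactions)


def lookup_fails(user):
    """Return True if looking up the user's consumed items raises KeyError."""
    try:
        user_consumed[user]
    except KeyError:
        return True
    return False
```

In this sketch, looking up user 0 fails in the same way the traceback above does, while user 1 is found.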

Since you are using negative sampling (neg_sampling=True), try setting all the labels to 1.
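
A minimal sketch of that relabeling with pandas, assuming the column is named "label" as in this thread. The helper name `set_all_labels_positive` and the toy frame are hypothetical, not part of LibRecommender.

```python
import pandas as pd


def set_all_labels_positive(data, label_col="label"):
    """Return a copy of the data with every label set to 1.

    With neg_sampling=True the negatives are sampled during training and
    evaluation, so every observed row is treated as a positive interaction.
    """
    out = data.copy()
    out[label_col] = 1
    return out


# Toy data standing in for the real CSV (column names are assumptions).
toy = pd.DataFrame(
    {"user": [0, 1, 1], "item": [10, 10, 11], "label": [0, 1, 0]}
)
relabeled = set_all_labels_positive(toy)
```

This would be applied to the full DataFrame before splitting it into train and eval sets, so both parts see the same labels.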

zihaozhu93 commented 9 months ago

Thanks a lot. It works!

massquantity / LibRecommender

sample_neg_items report keyError #425