reczoo / RecZoo

A curated model zoo for recommendation tasks
Apache License 2.0

CPU/GPU index RuntimeError #28

Closed: siddhantpathakk closed this issue 1 year ago

siddhantpathakk commented 1 year ago

I am trying to reproduce the UltraGCN paper with this repository. I train the model on a remote Linux server with NVIDIA GPUs, submitting jobs through the SLURM scheduler. However, when I run the file, the runtime error below occurs (for all datasets).

I am assuming the default GPU index should be 0, right?

Traceback (most recent call last):
    train(ultragcn, optimizer, train_loader, test_loader, mask, test_ground_truth_list, interacted_items, params)
    loss = model(users, pos_items, neg_items)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
in forward
    omega_weight = self.get_omegas(users, pos_items, neg_items)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
in get_omegas
    pos_weight = torch.mul(self.constraint_mat['beta_uD'][users], self.constraint_mat['beta_iD'][pos_items]).to(device)
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
siddhantpathakk commented 1 year ago

Fixed the issue by changing the following lines! The index tensors are moved to the CPU before indexing, because the indexed tensor is on the CPU (per the error message).

In the get_omegas function, change the lines to this:

    pos_weight = torch.mul(self.constraint_mat['beta_uD'][users.cpu()], self.constraint_mat['beta_iD'][pos_items.cpu()]).to(device)
    neg_weight = torch.mul(torch.repeat_interleave(self.constraint_mat['beta_uD'][users.cpu()], neg_items.size(1)), self.constraint_mat['beta_iD'][neg_items.cpu().flatten()]).to(device)

In the cal_loss_I function, change the lines to this:

    neighbor_embeds = self.item_embeds(self.ii_neighbor_mat[pos_items.cpu()].to(device))  # len(pos_items) * num_neighbors * dim
    sim_scores = self.ii_constraint_mat[pos_items.cpu()].to(device)  # len(pos_items) * num_neighbors

And lastly, in the test function:

    rating += mask[batch_users.cpu()]
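
For anyone hitting the same error, here is a minimal sketch of the same idea as a reusable helper. It is not code from the repository: `gather_rows` is a hypothetical name, and the small `beta_uD` / `beta_iD` tensors are made-up stand-ins for the entries of `constraint_mat`. The helper moves the indices to whichever device the lookup table lives on, so it should work whether the table stays on the CPU or is moved to the GPU later.

```python
import torch


def gather_rows(table: torch.Tensor, idx: torch.Tensor, out_device: torch.device) -> torch.Tensor:
    """Index `table` with `idx` even if the two tensors live on different devices."""
    # Move only the index tensor to the table's device, then move the
    # gathered rows to the device the model computes on.
    return table[idx.to(table.device)].to(out_device)


# Falls back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for constraint_mat['beta_uD'] and constraint_mat['beta_iD'],
# kept on the CPU as in the traceback above.
beta_uD = torch.tensor([0.1, 0.2, 0.3, 0.4])
beta_iD = torch.tensor([1.0, 2.0, 3.0])

# Batch indices live on the training device, as they do inside the model.
users = torch.tensor([0, 3], device=device)
pos_items = torch.tensor([2, 1], device=device)

pos_weight = torch.mul(
    gather_rows(beta_uD, users, device),
    gather_rows(beta_iD, pos_items, device),
)
```

An alternative I have not tested on this repository would be to move the constraint tensors to the GPU once at start-up, which would avoid copying the indices back to the CPU on every batch.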