Open denadai2 opened 3 months ago
PS: I wonder whether this works https://github.com/pyg-team/pyg-lib/blob/95aeaaaccebc2317d2a6de4cbdf903c15a3541a8/pyg_lib/csrc/sampler/cpu/dist_merge_outputs_kernel.cpp#L90 when we have empty lists https://github.com/pyg-team/pytorch_geometric/blob/f7ed25ded654bc89f3bfc649b6caffead5b49a6b/torch_geometric/distributed/dist_neighbor_sampler.py#L830 @rusty1s
I converted it into a non-optimized, plain Python version and partially fixed it (not unit tested):
```python
from typing import List, Optional, Tuple

import torch


def merge_outputs(
    node_ids: List[torch.Tensor],
    edge_ids: List[torch.Tensor],
    cumsum_neighbors_per_node: List[List[int]],
    partition_ids: List[int],
    partition_orders: List[int],
    num_partitions: int,
    num_neighbors: int,
    batch: Optional[torch.Tensor] = None,
    disjoint: bool = False,
) -> Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor], List[int]]:
    if num_neighbors < 0:
        # Find the maximum population across all partitions:
        population = [[] for _ in range(num_partitions)]
        max_populations = [0] * num_partitions
        for p_id in range(num_partitions):
            cumsum1 = cumsum_neighbors_per_node[p_id][1:]
            cumsum2 = cumsum_neighbors_per_node[p_id][:-1]
            population[p_id] = [abs(a - b) for a, b in zip(cumsum1, cumsum2)]
            # `default=0` guards against partitions that returned nothing:
            max_populations[p_id] = max(population[p_id], default=0)
        offset = max(max_populations)
    else:
        offset = num_neighbors

    p_size = len(partition_ids)
    sampled_neighbors_per_node = [0] * p_size

    # Pre-allocate with -1 as a filler value that is stripped at the end:
    sampled_node_ids = torch.full((p_size * offset, ), -1,
                                  dtype=node_ids[0].dtype)
    sampled_edge_ids = torch.full((p_size * offset, ), -1,
                                  dtype=edge_ids[0].dtype)
    sampled_batch = (torch.full((p_size * offset, ), -1, dtype=batch.dtype)
                     if disjoint else None)

    sampled_node_ids_vec = [n.tolist() for n in node_ids]
    sampled_edge_ids_vec = [e.tolist() for e in edge_ids]

    for j in range(p_size):
        p_id = partition_ids[j]
        p_order = partition_orders[j]

        # Partial fix: skip partitions whose cumsum list is empty or too
        # short (the C++ kernel indexes into it unconditionally):
        if (not cumsum_neighbors_per_node[p_id]
                or len(cumsum_neighbors_per_node[p_id]) <= p_order + 1):
            continue

        begin_node = cumsum_neighbors_per_node[p_id][p_order]
        begin_edge = begin_node - cumsum_neighbors_per_node[p_id][0]
        end_node = cumsum_neighbors_per_node[p_id][p_order + 1]
        end_edge = end_node - cumsum_neighbors_per_node[p_id][0]

        sampled_node_ids[j * offset:j * offset + end_node - begin_node] = \
            torch.tensor(sampled_node_ids_vec[p_id][begin_node:end_node])
        sampled_edge_ids[j * offset:j * offset + end_edge - begin_edge] = \
            torch.tensor(sampled_edge_ids_vec[p_id][begin_edge:end_edge])
        if disjoint:
            sampled_batch[j * offset:j * offset + end_node -
                          begin_node] = batch[j]

        sampled_neighbors_per_node[j] = end_node - begin_node

    # Remove the auxiliary -1 fillers:
    valid_node_mask = sampled_node_ids != -1
    out_node_id = sampled_node_ids[valid_node_mask]
    valid_edge_mask = sampled_edge_ids != -1
    out_edge_id = sampled_edge_ids[valid_edge_mask]
    out_batch = sampled_batch[valid_node_mask] if disjoint else None

    return out_node_id, out_edge_id, out_batch, sampled_neighbors_per_node
```
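To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical data, not the pyg-lib kernel) of why indexing `cumsum[p_order + 1]` needs a bounds check when a partition returns no neighbors:

```python
# Hypothetical per-partition cumsum lists; partition 1 returned nothing:
cumsum_neighbors_per_node = [[0, 2, 5], []]
partition_ids = [0, 0, 1]
partition_orders = [0, 1, 0]

counts = []
for p_id, p_order in zip(partition_ids, partition_orders):
    cumsum = cumsum_neighbors_per_node[p_id]
    # Without this guard, cumsum[p_order + 1] raises an IndexError for the
    # empty partition (the situation discussed in this issue):
    if not cumsum or len(cumsum) <= p_order + 1:
        counts.append(0)
        continue
    counts.append(cumsum[p_order + 1] - cumsum[p_order])

print(counts)  # -> [2, 3, 0]
```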
Sorry for the late reply. This does indeed look wrong. Can you share the inputs that make it crash? Also pinging @kgajdamo for visibility.
No worries! I used MovieLens with 4 partitions and the code as released, with no modifications.
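As background for the reproduction: the merge relies on a pad-then-compact pattern (write each ragged neighbor list into a fixed-size slot padded with -1, then drop the fillers). A plain-Python sketch with illustrative values, mirroring what the port above does with `torch.full` and boolean masking:

```python
offset = 3  # hypothetical fixed slot size per seed node
chunks = [[10, 11], [20]]  # ragged neighbor lists per seed node

# Pad every seed node's slot to `offset` entries with the filler -1:
padded = [-1] * (len(chunks) * offset)
for j, chunk in enumerate(chunks):
    padded[j * offset:j * offset + len(chunk)] = chunk

# Compact by dropping the fillers:
compact = [v for v in padded if v != -1]

print(padded)   # -> [10, 11, -1, 20, -1, -1]
print(compact)  # -> [10, 11, 20]
```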
🐛 Describe the bug
Dear pyg-lib team,
I encountered an error when I call:

The error is:

Do you have a suggestion on how to debug this?

Thanks!
Environment

- pyg-lib version:
- How you installed pyg-lib (conda, pip, source):