masadcv / FastGeodis

Fast Implementation of Generalised Geodesic Distance Transform for CPU (OpenMP) and GPU (CUDA)
https://fastgeodis.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Non-deterministic behaviour from cuda-version generalised_geodesic3d #54

Open monaxu1 opened 10 months ago

monaxu1 commented 10 months ago

Is your feature request related to a problem? Please describe.

The CUDA version of generalised_geodesic3d shows non-deterministic behaviour. Given the same inputs, it returns a different output (geodesic map) on different runs, which hurts reproducibility. The following code reproduces the output I observed.

"""
Demo of randomness from cuda-version generalised_geodesic3d.
"""
from typing import Tuple
import numpy as np

import torch
import FastGeodis

def torch_seed(seed: int = 42):
    # Pytorch seeding:
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.cuda.manual_seed(seed)

def demo_geodesic_distance3d(
    input_image: torch.Tensor,
    seed_map: torch.Tensor,
    spacing: Tuple[float, float, float],
    device: str,
):
    """
    Demo of 3d geodesic distance.
    """
    # Compute the geodesic map with the CUDA version of FastGeodis.
    # The caller passes device="cuda"; the argument is used as given.
    input_image_pt = input_image.to(device)
    seed_image_pt = (1 - seed_map).to(device)

    fastgeodis_output = np.squeeze(
        FastGeodis.generalised_geodesic3d(
            input_image_pt, seed_image_pt, spacing, 1e10, 1.0
        )
        .cpu()
        .numpy()
    )
    print(f"Sum of fastGeodis output: {np.sum(fastgeodis_output)}")

if __name__ == "__main__":
    torch_seed()
    image: torch.Tensor = torch.randint(
        low=0, high=256, size=(1, 1, 150, 150, 150)
    ).to(torch.float32)
    spacing: Tuple[float, float, float] = (1.0, 1.0, 1.0)
    seed_map: torch.Tensor = torch.full_like(image, 0)
    seed_map[0, 0, 4, 100, 50] = 1
    device: str = "cuda"

    # Compute the geodesic map several times using the same inputs
    for i in range(3):
        demo_geodesic_distance3d(
            input_image=image,
            spacing=spacing,
            seed_map=seed_map,
            device=device,
        )

Output: generalised_geodesic3d returns a different result on each of the three runs:

Sum of fastGeodis output: 2537535232.0
Sum of fastGeodis output: 2542801664.0
Sum of fastGeodis output: 2540248320.0
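Comparing sums only shows that the outputs differ somewhere. A minimal sketch of an element-wise check is given below; the helper name `max_run_to_run_difference` and the two toy functions are hypothetical and only stand in for a call such as `FastGeodis.generalised_geodesic3d(...)` followed by `.cpu().numpy()`.

```python
from typing import Callable

import numpy as np


def max_run_to_run_difference(
    compute: Callable[[], np.ndarray], runs: int = 3
) -> float:
    """Call `compute` `runs` times and return the largest absolute
    element-wise difference between the first result and any later one.
    A return value of 0.0 means every run matched the first exactly."""
    reference = np.asarray(compute(), dtype=np.float64)
    worst = 0.0
    for _ in range(runs - 1):
        current = np.asarray(compute(), dtype=np.float64)
        worst = max(worst, float(np.max(np.abs(current - reference))))
    return worst


# Toy stand-ins (hypothetical): one reproducible, one that drifts per call.
def stable_output() -> np.ndarray:
    return np.arange(6.0).reshape(2, 3)


_calls = {"n": 0}


def drifting_output() -> np.ndarray:
    _calls["n"] += 1
    return np.full((2, 3), float(_calls["n"]))


stable_diff = max_run_to_run_difference(stable_output)
drifting_diff = max_run_to_run_difference(drifting_output)
print(stable_diff, drifting_diff)
```

To apply it to the script above, `compute` would be a zero-argument function that runs the geodesic transform on the same inputs and returns the NumPy array instead of printing its sum.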

Describe the solution you'd like

I'm not sure where the randomness comes from, but it would be nice to be able to seed everything in the CUDA implementation.

Additional context

FastGeodis==1.0.3

masadcv commented 10 months ago

Hi @monaxu1

Many thanks for reporting this. I will have a look at it over the weekend. From my previous understanding, the non-deterministic behaviour comes from the CUDA kernels. I need to dig deeper, but there isn't any randomness apart from the order in which the threads execute.

In the meantime, could you try the CPU variants? They should not have this behaviour. Thanks!
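As a rough illustration of how execution order alone can change a result, here is a toy sketch in pure Python. It is not the FastGeodis kernel, and the assumption that the CUDA code updates distances in place while neighbouring threads read them has not been confirmed in this thread. The function `relax_once` and the 1D setup are hypothetical.

```python
import math
from typing import List, Sequence


def relax_once(dist: Sequence[float], cost: float, order: Sequence[int]) -> List[float]:
    """Run one in-place relaxation pass over a 1D line of cells.

    Each visited cell takes the minimum of its own distance and
    (neighbour distance + cost). Because updates are in place, a cell
    visited later can already see values written earlier in the same pass.
    """
    out = list(dist)
    n = len(out)
    for i in order:
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                out[i] = min(out[i], out[j] + cost)
    return out


# Seed at index 0, all other cells start at infinity.
start = [0.0, math.inf, math.inf, math.inf]

# Same data, same single pass, two different visiting orders.
forward = relax_once(start, 1.0, order=[0, 1, 2, 3])
backward = relax_once(start, 1.0, order=[3, 2, 1, 0])

print(forward)   # distance propagates along the whole line
print(backward)  # distance reaches only the cell next to the seed
```

Both orders reach the same answer if the pass is repeated until nothing changes, so an effect like this matters only when the number of passes is fixed.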

monaxu1 commented 10 months ago

Thank you. I've already tried the CPU variants, and they don't have any randomness.