huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Equivalent of torch.nonzero ? #2109

Open sbucaille opened 3 weeks ago

sbucaille commented 3 weeks ago

Hi,

I'm a relatively new Rust learner and wanted to use Candle as a way of learning more about the language. I'm trying to implement in Rust the SuperPoint model that I added to 🤗 transformers a couple of weeks ago. The problem is that I'm stuck at the step where the model switches from dense to sparse tensors. Basically, the model takes an image as input and detects keypoints. After some convolutions, an NMS is applied, and keypoints with a non-zero score are extracted and passed through the rest of the forward method. So originally we have a tensor of shape (height, width); we apply torch.nonzero to get the 2D (y and x coordinate) indices of the keypoints, which gives a tensor of shape (num_keypoints, 2), and then use these indices to pull the corresponding scores out of the tensor.

The problem comes from the first step: I can't figure out how to check individual values and replicate the torch.nonzero behavior without looping through the whole tensor, like the following:

fn non_zero(t: &Tensor) -> Result<Tensor> {
    let (height, width) = t.dims2()?;

    let mut indices = vec![];
    let mut num = 0;
    for i in 0..height {
        for j in 0..width {
            // One scalar extraction per element: every to_scalar call is a
            // separate device read, which is what makes this so slow.
            let val: Tensor = t.i((i, j))?;
            let scalar: f32 = val.to_scalar()?;
            if scalar != 0. {
                indices.push(i as u32);
                indices.push(j as u32);
                num += 1;
            }
        }
    }

    Tensor::from_vec(indices, (num, 2), t.device())
}

While this is not a problem for a small tensor, it takes a very long time for a 640x480 image, for example. Is there another way of doing this, or are there plans to implement this PyTorch function in Candle?

I managed to implement the second step, the 2D indexing search, with the following code:

fn two_dim_index(t: &Tensor, coordinates: &Tensor) -> Result<Tensor> {
    // Transpose (num_keypoints, 2) -> (2, num_keypoints) so each row holds one axis.
    let coordinates_t = coordinates.t()?;
    let coordinates_x = coordinates_t.i(0)?;
    let coordinates_y = coordinates_t.i(1)?;

    // Select the rows of interest first.
    let t_x = t.index_select(&coordinates_x, 0)?;

    let (_, t_shape_minus_1) = t.dims2()?;
    let coordinates_shape_0 = coordinates_x.dims1()?;

    // Broadcast the column indices so gather can pick one column per selected row.
    let b_coordinates_y = coordinates_y
        .unsqueeze(D::Minus1)?
        .broadcast_as((coordinates_shape_0, t_shape_minus_1))?;
    let gathered_t_x = t_x.gather(&b_coordinates_y.contiguous()?, D::Minus1)?;
    gathered_t_x.max(D::Minus1)
}

Does that seem like a correct implementation, or is there a better way of doing this step? I don't have a major problem with computation time here; I'm just asking for my own knowledge.
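For reference, what I'm trying to express is just a flattened, row-major lookup: the score at (y, x) lives at flat index y * width + x. I imagine the candle version would be something like flatten_all plus a single index_select (that part is an assumption on my side), but this plain-Rust sketch with a hypothetical gather_scores helper shows the intent:

```rust
// Host-side sketch of the 2D lookup: the value at (y, x) of a row-major
// (height, width) tensor sits at flat index y * width + x.
// `gather_scores` is a hypothetical helper name, not a candle API.
fn gather_scores(flat: &[f32], width: usize, coords: &[[u32; 2]]) -> Vec<f32> {
    coords
        .iter()
        .map(|&[y, x]| flat[y as usize * width + x as usize])
        .collect()
}

fn main() {
    // A 2x3 score map stored row-major.
    let t = [0., 0.5, 0., 0.9, 0., 0.7];
    let scores = gather_scores(&t, 3, &[[0, 1], [1, 0], [1, 2]]);
    println!("{:?}", scores); // [0.5, 0.9, 0.7]
}
```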

Thanks !

LaurentMazare commented 3 weeks ago

Interesting, two suggestions:

The first suggestion is probably the easiest, and hopefully it's close to the point where you want to get the data back to the CPU anyway, so you won't really need to convert things back to a tensor. You can check how we do it for the segment anything example, though it's certainly a bit convoluted.
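A minimal sketch of that first idea: do a single device-to-host transfer (in candle that would be something like t.to_vec2::<f32>()), then scan the plain Vec on the CPU. The non_zero_host helper below is hypothetical, not a candle API; the point is one transfer instead of height * width of them.

```rust
// Sketch of the "bring the data back to the CPU" suggestion: after one
// device-to-host transfer (e.g. candle's to_vec2::<f32>(), assumed here),
// the nonzero scan is plain Rust with no per-element device reads.
// `non_zero_host` is a hypothetical helper name, not a candle API.
fn non_zero_host(rows: &[Vec<f32>]) -> Vec<[u32; 2]> {
    let mut indices = Vec::new();
    for (i, row) in rows.iter().enumerate() {
        for (j, &v) in row.iter().enumerate() {
            if v != 0. {
                // Record the (row, col) coordinates of each non-zero entry.
                indices.push([i as u32, j as u32]);
            }
        }
    }
    indices
}

fn main() {
    // A 2x3 score map with three non-zero entries.
    let scores = vec![vec![0., 0.5, 0.], vec![0.9, 0., 0.7]];
    let idx = non_zero_host(&scores);
    println!("{:?}", idx); // [[0, 1], [1, 0], [1, 2]]
}
```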