huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Equivalent of torch.nonzero ? #2109

Open sbucaille opened 3 weeks ago

sbucaille commented 3 weeks ago

Hi,

I'm a relatively new Rust learner and wanted to use Candle as a way of learning more about the language. I'm trying to implement in Rust the SuperPoint model that I added to 🤗 transformers a couple of weeks ago. The problem is that I'm stuck at the step where the model switches from dense to sparse tensors. Basically, the model takes an image as input and detects keypoints. After some convolutions, an NMS is applied, and keypoints with a non-zero score are extracted and passed through the rest of the forward method. So originally we have a tensor of shape (height, width); we apply torch.nonzero to get the 2D (y and x coordinate) indices of the keypoints, which gives a tensor of shape (num_keypoints, 2), and then use these indices to pull the corresponding scores out of the tensor.

The problem comes from the first step: I can't figure out how to check individual values and replicate the torch.nonzero behavior without looping through the whole tensor, like the following:

fn non_zero(t: &Tensor) -> Result<Tensor> {
    let (height, width) = t.dims2()?;

    let mut indices = vec![];
    let mut num = 0;
    for i in 0..height {
        for j in 0..width {
            // One scalar extraction per element: every to_scalar call is a
            // separate device read, which is what makes this so slow.
            let val: Tensor = t.i((i, j))?;
            let scalar: f32 = val.to_scalar()?;
            if scalar != 0. {
                indices.push(i as u32);
                indices.push(j as u32);
                num += 1;
            }
        }
    }

    Tensor::from_vec(indices, (num, 2), t.device())
}

While this is not a problem for a small tensor, it takes a very long time for a 640x480 image, for example. Is there another way of doing this, or are there plans to implement this PyTorch function in Candle?

I managed to implement the second step, the 2D indexing search, with the following code:

fn two_dim_index(t: &Tensor, coordinates: &Tensor) -> Result<Tensor> {
    // Transpose (num_keypoints, 2) -> (2, num_keypoints) so each row holds one axis.
    let coordinates_t = coordinates.t()?;
    let coordinates_x = coordinates_t.i(0)?;
    let coordinates_y = coordinates_t.i(1)?;

    // Select the rows of interest first.
    let t_x = t.index_select(&coordinates_x, 0)?;

    let (_, t_shape_minus_1) = t.dims2()?;
    let coordinates_shape_0 = coordinates_x.dims1()?;

    // Broadcast the column indices so gather can pick one column per selected row.
    let b_coordinates_y = coordinates_y
        .unsqueeze(D::Minus1)?
        .broadcast_as((coordinates_shape_0, t_shape_minus_1))?;
    let gathered_t_x = t_x.gather(&b_coordinates_y.contiguous()?, D::Minus1)?;
    gathered_t_x.max(D::Minus1)
}

Does that seem like a correct implementation, or is there a better way of doing this step? I don't have a major problem with computation time here; I'm just asking for my own knowledge.
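For reference, what I'm trying to express is just a flattened, row-major lookup: the score at (y, x) lives at flat index y * width + x. I imagine the candle version would be something like flatten_all plus a single index_select (that part is an assumption on my side), but this plain-Rust sketch with a hypothetical gather_scores helper shows the intent:

```rust
// Host-side sketch of the 2D lookup: the value at (y, x) of a row-major
// (height, width) tensor sits at flat index y * width + x.
// `gather_scores` is a hypothetical helper name, not a candle API.
fn gather_scores(flat: &[f32], width: usize, coords: &[[u32; 2]]) -> Vec<f32> {
    coords
        .iter()
        .map(|&[y, x]| flat[y as usize * width + x as usize])
        .collect()
}

fn main() {
    // A 2x3 score map stored row-major.
    let t = [0., 0.5, 0., 0.9, 0., 0.7];
    let scores = gather_scores(&t, 3, &[[0, 1], [1, 0], [1, 2]]);
    println!("{:?}", scores); // [0.5, 0.9, 0.7]
}
```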

Thanks !

LaurentMazare commented 3 weeks ago

Interesting, two suggestions:

The first suggestion is probably the easiest, and hopefully it's close to the point where you want to get the data back to the CPU anyway, so you won't really need to convert things back to a tensor. You can check how we do it for the segment anything example, though it's certainly a bit convoluted.
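A minimal sketch of that first idea: do a single device-to-host transfer (in candle that would be something like t.to_vec2::<f32>()), then scan the plain Vec on the CPU. The non_zero_host helper below is hypothetical, not a candle API; the point is one transfer instead of height * width of them.

```rust
// Sketch of the "bring the data back to the CPU" suggestion: after one
// device-to-host transfer (e.g. candle's to_vec2::<f32>(), assumed here),
// the nonzero scan is plain Rust with no per-element device reads.
// `non_zero_host` is a hypothetical helper name, not a candle API.
fn non_zero_host(rows: &[Vec<f32>]) -> Vec<[u32; 2]> {
    let mut indices = Vec::new();
    for (i, row) in rows.iter().enumerate() {
        for (j, &v) in row.iter().enumerate() {
            if v != 0. {
                // Record the (row, col) coordinates of each non-zero entry.
                indices.push([i as u32, j as u32]);
            }
        }
    }
    indices
}

fn main() {
    // A 2x3 score map with three non-zero entries.
    let scores = vec![vec![0., 0.5, 0.], vec![0.9, 0., 0.7]];
    let idx = non_zero_host(&scores);
    println!("{:?}", idx); // [[0, 1], [1, 0], [1, 2]]
}
```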