mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
BSD 3-Clause "New" or "Revised" License

Unable to use torch tensors as Point or Matrix in differentiable rendering setting #129

Closed. ShnitzelKiller closed this issue 1 year ago

ShnitzelKiller commented 1 year ago

Summary

I would like to use a torch tensor to control the transforms of objects in the scene. Instantiating Mitsuba vector/matrix types from PyTorch tensors exhibits inconsistent behavior (it only seems to work for batches larger than 1), and I get errors whenever I attempt to construct Matrix4f or Point3f objects from TensorXf or torch tensors in any other way.

System configuration

OS: Windows-10
  CPU: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
  GPU: NVIDIA GeForce RTX 2060 SUPER
  Python: 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:14:58) [MSC v.1929 64 bit (AMD64)]
  NVidia driver: 528.49
  CUDA: 11.6.124
  LLVM: 15.-1.-1

  Dr.Jit: 0.4.1
  Mitsuba: 3.2.1
     Is custom build? False
     Compiled with: MSVC 19.34.31942.0
     Variants:
        scalar_rgb
        scalar_spectral
        cuda_ad_rgb
        llvm_ad_rgb

Description

I cannot instantiate a Matrix4f from a corresponding 4x4 torch tensor. I have tried all manner of input shapes, and it appears that ndim must be 3. I can create a batched set of Matrix4f as long as the input tensor has shape (N, 4, 4) with N greater than 1. If N equals 1, I instead end up with 16 4x4 matrices, each filled with a single repeated value (a different value per matrix).

import mitsuba as mi
mi.set_variant('cuda_ad_rgb')
import torch

t = torch.arange(16).reshape((1, 4, 4)).float().cuda()
mi.Matrix4f(t)

The result:

[[[0.0, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0]],
 [[1.0, 1.0, 1.0, 1.0],
  [1.0, 1.0, 1.0, 1.0],
  [1.0, 1.0, 1.0, 1.0],
  [1.0, 1.0, 1.0, 1.0]],
 [[2.0, 2.0, 2.0, 2.0],
  [2.0, 2.0, 2.0, 2.0],
  [2.0, 2.0, 2.0, 2.0],
  [2.0, 2.0, 2.0, 2.0]],
 [[3.0, 3.0, 3.0, 3.0],
  [3.0, 3.0, 3.0, 3.0],
  [3.0, 3.0, 3.0, 3.0],
  [3.0, 3.0, 3.0, 3.0]],
 [[4.0, 4.0, 4.0, 4.0],
  [4.0, 4.0, 4.0, 4.0],
  [4.0, 4.0, 4.0, 4.0],
  [4.0, 4.0, 4.0, 4.0]],
 [[5.0, 5.0, 5.0, 5.0],
  [5.0, 5.0, 5.0, 5.0],
  [5.0, 5.0, 5.0, 5.0],
  [5.0, 5.0, 5.0, 5.0]],
 [[6.0, 6.0, 6.0, 6.0],
  [6.0, 6.0, 6.0, 6.0],
  [6.0, 6.0, 6.0, 6.0],
  [6.0, 6.0, 6.0, 6.0]],
 [[7.0, 7.0, 7.0, 7.0],
  [7.0, 7.0, 7.0, 7.0],
  [7.0, 7.0, 7.0, 7.0],
  [7.0, 7.0, 7.0, 7.0]],
 [[8.0, 8.0, 8.0, 8.0],
  [8.0, 8.0, 8.0, 8.0],
  [8.0, 8.0, 8.0, 8.0],
  [8.0, 8.0, 8.0, 8.0]],
 [[9.0, 9.0, 9.0, 9.0],
  [9.0, 9.0, 9.0, 9.0],
  [9.0, 9.0, 9.0, 9.0],
  [9.0, 9.0, 9.0, 9.0]],
 [[10.0, 10.0, 10.0, 10.0],
  [10.0, 10.0, 10.0, 10.0],
  [10.0, 10.0, 10.0, 10.0],
  [10.0, 10.0, 10.0, 10.0]],
 [[11.0, 11.0, 11.0, 11.0],
  [11.0, 11.0, 11.0, 11.0],
  [11.0, 11.0, 11.0, 11.0],
  [11.0, 11.0, 11.0, 11.0]],
 [[12.0, 12.0, 12.0, 12.0],
  [12.0, 12.0, 12.0, 12.0],
  [12.0, 12.0, 12.0, 12.0],
  [12.0, 12.0, 12.0, 12.0]],
 [[13.0, 13.0, 13.0, 13.0],
  [13.0, 13.0, 13.0, 13.0],
  [13.0, 13.0, 13.0, 13.0],
  [13.0, 13.0, 13.0, 13.0]],
 [[14.0, 14.0, 14.0, 14.0],
  [14.0, 14.0, 14.0, 14.0],
  [14.0, 14.0, 14.0, 14.0],
  [14.0, 14.0, 14.0, 14.0]],
 [[15.0, 15.0, 15.0, 15.0],
  [15.0, 15.0, 15.0, 15.0],
  [15.0, 15.0, 15.0, 15.0],
  [15.0, 15.0, 15.0, 15.0]]]

Meanwhile, if I create a (2, 4, 4) tensor, I will get 2 Matrix4f as expected:

t = torch.arange(32).reshape((2, 4, 4)).float().cuda()
mi.Matrix4f(t)

The result:

[[[0.0, 1.0, 2.0, 3.0],
  [4.0, 5.0, 6.0, 7.0],
  [8.0, 9.0, 10.0, 11.0],
  [12.0, 13.0, 14.0, 15.0]],
 [[16.0, 17.0, 18.0, 19.0],
  [20.0, 21.0, 22.0, 23.0],
  [24.0, 25.0, 26.0, 27.0],
  [28.0, 29.0, 30.0, 31.0]]]

The same thing happens with Point3f objects: I need an (N, 3) tensor with N > 1, or I run into the same unpredictable behavior (see the sketch below). Am I misunderstanding how vectorization works? I also cannot pass a tensor with only 2 dimensions, as that immediately raises an error.
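For example, the Point3f analogue of the Matrix4f snippets above (output values omitted, but the single-element batch gets broadcast in the same unexpected way):

p = torch.arange(3).reshape((1, 3)).float().cuda()
mi.Point3f(p)   # batch of 1: unexpected element-by-element broadcast

p = torch.arange(6).reshape((2, 3)).float().cuda()
mi.Point3f(p)   # batch of 2: two points, as expected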

What I am trying to do is use a torch tensor (the output of a neural network) to control the transform of an object in a scene. Is this the right approach at all? I have even tried setting individual elements of the matrix to elements of the tensor, to no avail (I get the error "Refusing to do an extremely inefficient element-by-element array conversion from type <class 'drjit.cuda.ad.TensorXf'> to <class 'drjit.cuda.ad.Float'>"). Basically, I have not seen any example code that uses torch tensors in place of matrices/points in Mitsuba code; the only example I found is the one here, which replaces a TensorXf texture with a torch tensor.
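For context, this is roughly the workflow I have in mind, written as a minimal sketch: scene, net, inputs, and the parameter key 'shape.to_world' are placeholders for my actual setup, and whether a given shape exposes its to_world transform through mi.traverse depends on the plugin.

params = mi.traverse(scene)                   # differentiable scene parameters

t = net(inputs)                               # torch tensor of shape (1, 4, 4) on the GPU
m = mi.Matrix4f(t)                            # the conversion that misbehaves above
params['shape.to_world'] = mi.Transform4f(m)  # placeholder key for the object's transform
params.update()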

njroussel commented 1 year ago

Hi @ShnitzelKiller

I believe this is a bug. With NumPy inputs, the behaviour for vectorized types with a batch dimension of 1 is what you expected here. Let me transfer this issue to the appropriate repository.
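For comparison, a minimal sketch of the NumPy path with the same (1, 4, 4) shape as your first snippet, which behaves as you expected:

import numpy as np

t_np = np.arange(16, dtype=np.float32).reshape((1, 4, 4))
mi.Matrix4f(t_np)   # interpreted as a single 4x4 matrix, as expected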

njroussel commented 1 year ago

I have pushed a fix: https://github.com/mitsuba-renderer/drjit/commit/16b388292b5cb1e532b43a8800f1cca95a17c513

The DLPack protocol is surprisingly "brittle": it leaves a lot of room for interpretation, and practically every framework's implementation differs ever so slightly from the others. I would not be surprised if there are still some conversions that we do not handle correctly.

Anyway, this particular fix will be available in our next release.