huang-yh / GaussianFormer

[ECCV 2024] Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Gaussian Lifter Coordinate System? #31

Closed seamie6 closed 1 week ago

seamie6 commented 1 month ago

Hello, I am looking at the piece of code where you initialise the random 3D positions of the Gaussians (gaussian_lifter.py):

        xyz = torch.rand(num_anchor, 3, dtype=torch.float)
        if phi_activation == 'sigmoid':
            xyz = safe_inverse_sigmoid(xyz)
        elif phi_activation == 'loop':
            xyz[:, :2] = safe_inverse_sigmoid(xyz[:, :2])
        else:
            raise NotImplementedError

I understand the voxel grid is 200×200×16, as per the ground truth, with the metric range being [-50 m, 50 m] for X and Y and [-5 m, 3 m] for Z. For your initialised Gaussians, you sample the XYZ values between 0 and 1, so we can think of the grid as being centered on (0.5, 0.5, 0.5). I assume this normalisation is done for stability, if I am not mistaken? I understand we then apply an inverse sigmoid; is that for normalisation too?

My question is: what is this grid defined relative to? Is it, say, the LiDAR sensor? Also, is my assumption correct that this is a normalised version of the occupancy grid (regarding dimensions)?

Thank you.
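One way to read the inverse sigmoid step: if a sigmoid is applied to the stored values later on, then storing the logit of a uniform sample means the decoded value is the original sample in (0, 1). The sketch below checks that round trip in plain Python. The body of `safe_inverse_sigmoid` is an assumption (a clamped logit with a hypothetical `eps`), not copied from the repository, and the real code works on tensors rather than floats.

```python
import math
import random

def safe_inverse_sigmoid(p, eps=1e-6):
    # Assumed behaviour: clamp away from 0 and 1, then take the logit.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
samples = [random.random() for _ in range(5)]         # uniform in [0, 1)
logits = [safe_inverse_sigmoid(p) for p in samples]   # unbounded values that get stored
decoded = [sigmoid(x) for x in logits]                # back in (0, 1)

# Largest round-trip error between the original samples and the decoded values.
max_err = max(abs(a - b) for a, b in zip(samples, decoded))
```

Under this reading, the logit is less a normalisation than a change of parameterisation: the stored value is unbounded, while its sigmoid always stays inside the unit interval.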

seamie6 commented 1 week ago

After some testing, I think I can conclude that it is in reference to the LiDAR sensor coordinate system for the current frame, and that it is normalised to a 1×1×1 grid centered on (0.5, 0.5, 0.5).
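For anyone checking the same thing, here is a minimal sketch of how a normalised position would map back to metres and to a voxel index. It assumes a plain linear mapping over the ranges quoted in this thread; `PC_RANGE`, `GRID`, `to_metres` and `to_voxel` are hypothetical names for illustration, not identifiers from the repository.

```python
# Ranges quoted in this thread: X and Y in [-50 m, 50 m], Z in [-5 m, 3 m].
PC_RANGE = (-50.0, -50.0, -5.0, 50.0, 50.0, 3.0)  # (x_min, y_min, z_min, x_max, y_max, z_max)
GRID = (200, 200, 16)                             # voxels along X, Y, Z

def to_metres(u):
    # Assumed linear mapping from the unit cube to the metric range.
    lo, hi = PC_RANGE[:3], PC_RANGE[3:]
    return tuple(l + c * (h - l) for c, l, h in zip(u, lo, hi))

def to_voxel(u):
    # Index of the voxel containing the point; the upper boundary is clamped into the grid.
    return tuple(min(int(c * n), n - 1) for c, n in zip(u, GRID))

centre_m = to_metres((0.5, 0.5, 0.5))
centre_idx = to_voxel((0.5, 0.5, 0.5))
voxel_size = tuple((h - l) / n for l, h, n in zip(PC_RANGE[:3], PC_RANGE[3:], GRID))
```

With these numbers the voxels are 0.5 m on each side, and the centre of the unit cube lands at (0, 0, -1) m because the Z range is not symmetric. If the frame origin is the sensor, it would sit at a normalised Z of 0.625 rather than 0.5.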