saliteta / SA-GS-CODE


Questions about some details in the paper #1

Closed JIANG-CX closed 1 week ago

JIANG-CX commented 1 week ago

I have some questions about the details in this paper:

  1. Before Eq. 3, it states: "Therefore, we can obtain the expected Gaussian Splats number by multiply the edge number with a constant. This constant is usually determined by the overlapping ratio of the input images." I'm confused about the meaning of "overlapping ratio." Could you provide more details about how to choose this constant, as I believe this is a key parameter?
  2. Before Eq. 5, it mentions: "Then we assign the expected shape to each Gaussian online. In this way, we circumvent the problem of inconsistency." If one Gaussian is assigned different semantic labels in two images, how does this strategy overcome the inconsistency?
  3. In Eq. 3, why is k1 > k2?
  4. In Eq. 3, what is the definition of Pi in 1/Pi = k2a2? Is it the same as the Pi in 1/Pi = k1a1?

Thanks.

saliteta commented 1 week ago

Regarding your first question: the overlapping ratio reflects how many images cover the same object. For one object, we might have only 3 images around it in one dataset, while in another dataset we might have 300 images of the same object. One dataset then has a larger sum of edges, but the distribution of complexity should be the same. In our implementation, we divide the edge count by the number of pixels to calculate an average complexity. Another approach could be to count the overlapping regions directly.
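The normalization described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: the gradient-magnitude edge detector, the `threshold`, and the scaling constant `c` are all assumptions standing in for whatever edge measure and constant the paper actually uses.

```python
import numpy as np

def average_complexity(image, threshold=0.1, c=1.0):
    """Sketch of the edge-based complexity estimate described above.

    `threshold` and `c` are hypothetical parameters: `c` stands in for
    the constant that scales the edge measure into an expected splat
    count, and is not a value from the paper.
    """
    # Grayscale image in [0, 1]; approximate edges by thresholding the
    # gradient magnitude.
    gy, gx = np.gradient(image.astype(np.float64))
    edges = np.hypot(gx, gy) > threshold

    # Divide the edge count by the pixel count, so datasets with very
    # different numbers of views of the same object stay comparable.
    return c * edges.sum() / edges.size

# Usage: a flat image has no edges, so its average complexity is 0.
print(average_complexity(np.zeros((8, 8))))  # 0.0
```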

Your question is crucial. Our strategy does not resolve conflicts between labels. Since training rasterizes one image at a time, there are no conflicts within a single iteration; however, we cannot resolve inconsistencies between labels across different images. Whereas training different labels separately in different Gaussian Splats could result in an all-black scene, our method is more tolerant: some splats may receive different optimization goals in different iterations, but the training process can still proceed smoothly.

k1 corresponds to the shorter axis, a1, while k2 corresponds to the longer axis, a2.

p_i represents the perplexity. Given the perplexity, we aim to determine a1 and a2; since k1 is larger, a1 is the shorter axis and a2 is the longer axis.
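The relations quoted in the question, 1/p_i = k1·a1 and 1/p_i = k2·a2, can be solved directly for the two axis lengths, which makes the ordering explicit. The numeric values below are made up for illustration only:

```python
def axes_from_perplexity(p_i, k1, k2):
    """Solve 1/p_i = k1*a1 and 1/p_i = k2*a2 for the two axis lengths.

    k1 and k2 are the constants from Eq. 3; the values passed below are
    hypothetical.
    """
    a1 = 1.0 / (k1 * p_i)  # shorter axis, since k1 > k2
    a2 = 1.0 / (k2 * p_i)  # longer axis
    return a1, a2

a1, a2 = axes_from_perplexity(p_i=2.0, k1=4.0, k2=1.0)
print(a1 < a2)  # True: the larger constant yields the shorter axis
```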

JIANG-CX commented 1 week ago

Thanks for your reply.

Since sx, sy, and sz are defined as the scale factors along the x, y, and z axes respectively, the x, y, and z coordinates do not have a direct relationship with the longer and shorter axes. So it is unclear why the parameter k1 corresponds to the shorter axis and k2 corresponds to the longer axis.

Additionally, I found the symbol definitions to be somewhat confusing in the paper:

(1) There are three different 'p' variables used: one in Eq. 2 and two in Eq. 3. In your previous reply you stated that 'p' represents the perplexity, but could you please provide the precise definition of each of these three 'p' variables? I think they should have different meanings, since you use different symbols.

(2) In Eq. 4, what is the definition of the symbol σ?

Thanks.

saliteta commented 1 week ago

Regarding your question that "x, y, and z coordinates do not have a direct relationship with the longer and shorter axes": we did consider this problem. In the implementation, however, we found we can simply regard x as the shortest axis, y as the middle, and z as the longest. Since x, y, and z do not represent orientation (the quaternion represents the orientation), assuming the x, y, and z axes are ranked in sequence does not make our method lose its generality.
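The convention above amounts to sorting each Gaussian's scale factors. This is a sketch of that convention only; in a full implementation the quaternion's axes would be permuted consistently, which is omitted here:

```python
import numpy as np

def rank_scales(scales):
    """Reorder per-Gaussian scale factors so that, per the convention
    above, x holds the shortest axis, y the middle, and z the longest.

    Orientation is carried by the quaternion, so permuting the scale
    slots (together with the corresponding rotation axes in a full
    implementation) does not restrict the covariance being represented.
    """
    return np.sort(np.asarray(scales, dtype=np.float64))

print(rank_scales([0.5, 0.1, 0.3]))  # [0.1 0.3 0.5]
```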

For the question about the 'p' variables: indeed, the different p's have different meanings. The capital P in Eq. 2 denotes the overall perplexity, and the lowercase p_j in Eq. 3 denotes the unit perplexity. As for the two different P's within Eq. 3, that is actually a typo, which we will correct soon.

For your last question, σ denotes the sigmoid function; we use it to bound the loss.
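The bounding step can be sketched in a few lines. Only the sigmoid squashing follows the author's description; how the raw loss term is formed is not specified here and is left abstract:

```python
import math

def bounded_loss(raw_loss):
    """Pass a raw (possibly unbounded) loss through the sigmoid
    sigma(x) = 1 / (1 + e^(-x)), so the result stays in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_loss))

print(bounded_loss(0.0))  # 0.5
```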

JIANG-CX commented 1 week ago

Thanks for your reply. I have no further questions. Great work!