xiaoyao3302 / PoinTramba

PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis
https://arxiv.org/abs/2405.15463
Apache License 2.0

Learning of importance score #2

Closed charmeleonz closed 2 months ago

charmeleonz commented 2 months ago

Dear authors,

In Section 3.3 of the paper, I am confused by the statement that "calculating Sg for each group embedding is not feasible as it requires a known global feature in the ordering stage, which is impractical". Could you please explain why this is infeasible/impractical?

Thank you.

xiaoyao3302 commented 2 months ago

Hi, thanks for your interest in our work.

Sg is calculated as the cosine similarity between the mapped embeddings and the mapped global feature, so computing Sg requires both the embeddings and the global feature. But how can we get the global feature during inference? The global feature is produced by sorting the embeddings according to Sg and feeding the reordered embeddings into the Mamba encoder, which presumes the cosine similarity is already known. However, computing that cosine similarity in turn requires a known global feature. This is a circular dependency: during inference, after we feed the point cloud into the Intra-group Transformer Encoder and obtain the embeddings, we cannot calculate Sg because the global feature is still unknown, and consequently we cannot obtain the global feature either.
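To make the dependency concrete, here is a minimal sketch of the Sg computation in PyTorch. The shapes and the two projections `phi`/`psi` are hypothetical illustrations, not the exact code in this repo; the point is simply that the last line cannot run without `global_feature`:

```python
import torch
import torch.nn.functional as F

# hypothetical shapes: B point clouds, G groups, C channels
B, G, C = 2, 64, 384
embeddings = torch.randn(B, G, C)   # group embeddings from the Intra-group Transformer Encoder
global_feature = torch.randn(B, C)  # the global feature -- unknown during inference

phi = torch.nn.Linear(C, C)  # hypothetical mapping for the embeddings
psi = torch.nn.Linear(C, C)  # hypothetical mapping for the global feature

# Sg: cosine similarity between each mapped embedding and the mapped global feature
Sg = F.cosine_similarity(phi(embeddings), psi(global_feature).unsqueeze(1), dim=-1)  # (B, G)
order = Sg.argsort(dim=-1, descending=True)  # the ordering needed before the Mamba encoder
```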

We tackle this issue with an importance score prediction module: a small network reasons over the embeddings and predicts an importance score for each one, and we then sort the embeddings according to the predicted scores.
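A minimal sketch of what such a module could look like follows; the layer sizes, activation, and sorting details are assumptions for illustration, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

class ImportanceScorePredictor(nn.Module):
    """Hypothetical small network: one importance score per group embedding."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.GELU(),
            nn.Linear(dim // 2, 1),
        )

    def forward(self, embeddings):                 # embeddings: (B, G, C)
        scores = self.mlp(embeddings).squeeze(-1)  # predicted importance: (B, G)
        order = scores.argsort(dim=-1, descending=True)
        # reorder the embeddings by predicted importance before the Mamba encoder
        sorted_emb = embeddings.gather(1, order.unsqueeze(-1).expand_as(embeddings))
        return sorted_emb, scores
```

During training the global feature can be computed, so the true Sg is available and the predicted scores can in principle be supervised against it; at inference, the predictor needs only the embeddings, which breaks the circular dependency.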

To discuss this further: one could instead run an iterative process, i.e., feed the randomly ordered embeddings to obtain an initial global feature f_0, compute the cosine similarity Sg_0, sort the embeddings according to Sg_0 to obtain a new global feature f_1, then compute Sg_1 and f_2, and so on. Through this iteration we can approach an optimal pair of Sg and f, but it is time-consuming. Our importance score prediction module can therefore be treated as an approximate solution: slightly less accurate, but efficient. A sketch of the iteration is given below.
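Here is a rough sketch of that fixed-point iteration; the function name, the `mamba_encoder` interface, and the projections `phi`/`psi` are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def iterative_ordering(embeddings, mamba_encoder, phi, psi, num_iters=3):
    """Hypothetical iteration: alternate between computing a global feature
    from the current ordering and re-sorting the embeddings by Sg.
    `mamba_encoder` is assumed to map ordered (B, G, C) embeddings to a
    (B, C) global feature (e.g. Mamba blocks followed by pooling)."""
    B, G, C = embeddings.shape
    # random initial ordering, which yields f_0
    order = torch.stack([torch.randperm(G, device=embeddings.device) for _ in range(B)])
    for _ in range(num_iters):
        ordered = embeddings.gather(1, order.unsqueeze(-1).expand_as(embeddings))
        f = mamba_encoder(ordered)                                              # f_t
        Sg = F.cosine_similarity(phi(embeddings), psi(f).unsqueeze(1), dim=-1)  # Sg_t
        order = Sg.argsort(dim=-1, descending=True)  # ordering used for f_{t+1}
    return order, f
```

Each iteration requires a full pass through the Mamba encoder, which is why the single-pass score predictor is the more efficient choice.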

I hope this explanation resolves your confusion.

charmeleonz commented 2 months ago

Hi, thank you for the quick response. It's very clear and helpful.