tsunghan-wu / RandLA-Net-pytorch

:four_leaf_clover: Pytorch Implementation of RandLA-Net (https://arxiv.org/abs/1911.11236)

Discussion about implementation and model #5

Closed · nlgranger closed this issue 3 years ago

nlgranger commented 3 years ago

Firstly, thank you for your work; an up-to-date PyTorch implementation of RandLA-Net is really nice to have. This is not a bug report but rather a series of questions I had when I started implementing RandLA-Net before I found this repository.

tsunghan-wu commented 3 years ago

For the questions above:

  1. As in the original implementation, I think the author also uses weights that sum to one.
  2. After thinking about it and browsing the web for a while, I believe a 1x1 convolution layer is equivalent to a fully connected layer (I hadn't thought about it before, thanks for the great finding XD~). So I think it would also work if you substituted it with Linear layers (see the sketch after this list).
  3. Well ... I'm not sure whether it contains the feature concatenation you mentioned above (?)
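
For reference, a minimal sketch of the equivalence mentioned in point 2 (not taken from this repository; the layer sizes are arbitrary): a 1x1 Conv1d and a Linear layer with identical parameters produce the same point-wise output.

    # Minimal sketch (not from this repo): a 1x1 Conv1d and a Linear layer
    # with identical parameters give the same point-wise output.
    import torch
    import torch.nn as nn

    conv = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=1)
    fc = nn.Linear(in_features=8, out_features=16)

    # Copy the convolution parameters into the Linear layer.
    with torch.no_grad():
        fc.weight.copy_(conv.weight.squeeze(-1))   # (16, 8, 1) -> (16, 8)
        fc.bias.copy_(conv.bias)

    x = torch.randn(2, 8, 1024)                     # (batch, channels, points)
    out_conv = conv(x)                              # (2, 16, 1024)
    out_fc = fc(x.transpose(1, 2)).transpose(1, 2)  # apply Linear per point
    print(torch.allclose(out_conv, out_fc, atol=1e-5))  # True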
nlgranger commented 3 years ago

Sorry for the delay and thank you for looking into this.

  1. I misinterpreted the PyTorch docs about the weight normalization, but I believe there is still a discrepancy. I think what the original implementation does is:
    loss = F.cross_entropy(predictions, targets, reduction='none')
    weights = class_weights[targets]
    loss *= weights
    return loss.sum() / loss.shape[0]

Whereas yours is equivalent to

    loss = F.cross_entropy(predictions, targets, reduction='none')
    weights = class_weights[targets]
    loss *= weights
    return loss.sum() / weights.sum()
  2. This is what I mean: (see the attached image)
tsunghan-wu commented 3 years ago

Hi,

I'm really sorry for the late reply due to some personal matters. After reading your latest reply, I understand your concern, so I looked into the original repository to figure out whether I made a mistake. I found the original loss function implementation here:

def get_loss(self, logits, labels, pre_cal_weights):
    # calculate the weighted cross entropy according to the inverse frequency
    class_weights = tf.convert_to_tensor(pre_cal_weights, dtype=tf.float32)
    one_hot_labels = tf.one_hot(labels, depth=self.config.num_classes)
    weights = tf.reduce_sum(class_weights * one_hot_labels, axis=1)
    unweighted_losses = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_labels)
    weighted_losses = unweighted_losses * weights
    output_loss = tf.reduce_mean(weighted_losses)
    return output_loss

I think you are correct. It first calculates the unweighted losses (same as F.cross_entropy(predictions, targets, reduction='none')), then weights them using the pre-computed class weights (same as weights = class_weights[targets]; loss *= weights). However, in the final step, the original implementation applies tf.reduce_mean(weighted_losses), while my implementation uses a different denominator, loss.sum() / weights.sum().

Thanks a lot for the great catch. I will modify and push the fix when I'm less busy. However, I think this is not a big issue because loss.shape[0] and weights.sum() are just constant scaling terms: the former is the same as the batch size and the latter is a pre-computed value.
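
Here is a minimal sketch contrasting the two reductions (not taken from either repository; the class weights and the toy batch below are made up for illustration). It also shows that PyTorch's built-in weighted 'mean' reduction corresponds to the loss.sum() / weights.sum() variant:

    # Minimal sketch (not from either repo, toy values only) contrasting the
    # two reductions discussed above.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    num_classes, num_points = 4, 10
    class_weights = torch.tensor([0.1, 0.5, 1.0, 2.0])
    predictions = torch.randn(num_points, num_classes)
    targets = torch.randint(num_classes, (num_points,))

    loss = F.cross_entropy(predictions, targets, reduction='none')
    weights = class_weights[targets]
    weighted = loss * weights

    tf_style = weighted.sum() / loss.shape[0]    # tf.reduce_mean(weighted_losses)
    this_repo = weighted.sum() / weights.sum()   # division by the summed weights
    print(tf_style.item(), this_repo.item())     # the two values differ in general

    # For reference, F.cross_entropy with weight= and reduction='mean'
    # normalizes by the summed weights, i.e. it matches the second variant.
    builtin = F.cross_entropy(predictions, targets, weight=class_weights)
    print(torch.allclose(builtin, this_repo))    # True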

tsunghan-wu commented 3 years ago

Closing the issue due to long inactivity.