kushalj001 / pytorch-question-answering

Important paper implementations for Question Answering using PyTorch

Why did you comment out the softmax in `2. BiDAF.ipynb`? #4

Closed: shahhaard47 closed this issue 4 years ago

shahhaard47 commented 4 years ago

This is at the end of `BiDAF.forward`:

        p1 = p1.squeeze()
        # [bs, ctx_len]

        #p1 = F.softmax(p1, dim=-1)

        # END PREDICTION

        p2 = self.output_end(torch.cat([G, M2], dim=2)).squeeze()
        # [bs, ctx_len, 1] => [bs, ctx_len]

        #p2 = F.softmax(p2, dim=-1)

shahhaard47 commented 4 years ago

Also, I probably missed something in the paper, but why don't you include bias in any Linear layers? You do:

nn.Linear(..., bias=False)
kushalj001 commented 4 years ago

Also, I probably missed something in the paper, but why don't you include bias in any Linear layers? You do: `nn.Linear(..., bias=False)`

If you read the paper closely, you'll see that we usually only need a weight matrix, not a linear layer per se. A linear layer with `bias=False` acts as a plain weight matrix. Alternatively, you could use `nn.Parameter()` to initialize the weight matrix directly, so that it gets added to the list of model parameters; I think using `nn.Linear` just reduces some boilerplate code.
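
For illustration, here is a minimal sketch of the two interchangeable options (this code is mine, not from the notebook; the sizes 10, 1, and 4 are arbitrary):

    import torch
    import torch.nn as nn

    # Option 1: a linear layer with bias=False computes y = x @ W.T,
    # i.e. it is nothing more than a learnable weight matrix.
    linear = nn.Linear(10, 1, bias=False)

    # Option 2: register the weight matrix yourself with nn.Parameter,
    # so it still shows up in model.parameters() for the optimizer.
    class WeightOnly(nn.Module):
        def __init__(self):
            super().__init__()
            self.W = nn.Parameter(torch.randn(1, 10))

        def forward(self, x):
            return x @ self.W.t()

    x = torch.randn(4, 10)
    print(linear(x).shape)        # torch.Size([4, 1])
    print(WeightOnly()(x).shape)  # torch.Size([4, 1])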

kushalj001 commented 4 years ago

This is at the end of `BiDAF.forward`

I don't apply softmax in the model because I use `nn.CrossEntropyLoss` to compute the loss, and it applies log-softmax internally, so it expects raw logits.
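
As a minimal illustration (again my own sketch, not from the notebook; the shapes are hypothetical), the raw start logits go straight into the loss, and softmax is only needed if you want probabilities at inference time:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    bs, ctx_len = 4, 100                           # hypothetical sizes
    p1 = torch.randn(bs, ctx_len)                  # raw start logits from the model
    start_idx = torch.randint(0, ctx_len, (bs,))   # gold start positions

    # CrossEntropyLoss = log-softmax + negative log-likelihood,
    # so it must be fed unnormalized logits, not probabilities.
    criterion = nn.CrossEntropyLoss()
    loss = criterion(p1, start_idx)

    # Softmax is only needed at inference time, to read off probabilities.
    start_probs = F.softmax(p1, dim=-1)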

shahhaard47 commented 4 years ago

Thank you so much for the response!