ymym3412 / textcnn-conv-deconv-pytorch

text convolution-deconvolution auto-encoder model in PyTorch
Apache License 2.0
56 stars 14 forks

Issues running your model #1

Open opletayev opened 6 years ago

opletayev commented 6 years ago

Hello,

First of all, let me thank you for putting this together. I was very curious about the paper, but their TF implementation is rather poor and very hard to understand. Yours is very clean and makes a lot more sense!

I ran your model with default parameters in reconstruction mode on the Hotel dataset on a single Tesla K80 machine. It took 20+ hours to train for 10 epochs, and the model didn't converge (see below). The loss never dropped below 22,000.

I have a few questions:

1) Is there something that I am doing wrong? Are there any parameters that need to be specified to make the model work? I checked the defaults for the parameters and they looked in line with the paper.

2) You use log softmax as the loss function for the deconvolutional model and I assume that's why the model is taking so long to train. I know that's what the paper recommends, but have you tried using adaptive softmax instead?

3) What are your thoughts on seeding the embedding matrix with pre-learned embeddings? I am curious whether using L2-normalized GloVe embeddings would speed up the training.

4) I also tried to train jointly with a classifier using the AG News dataset, but the MLP classifier is unhappy about the dimensions it gets.

h = encoder(feature)
print(h.shape)
prob = decoder(h)
log_prob = mlp(h.squeeze())

h.shape is torch.Size([64, 500, 5, 1]). The last dimension gets squeezed, but the resulting [64, 500, 5] tensor is not compatible with the 500x300 FC layer:

RuntimeError: size mismatch, m1: [32000 x 5], m2: [500 x 300] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1518385717421/work/torch/lib/TH/generic/THTensorMath.c:1434
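For what it's worth, the numbers in the error line up with the shapes above: 32000 = 64 x 500, so the linear layer is being applied to the trailing dimension of size 5 instead of the 500 channels. Presumably the AG News inputs leave 5 positions after the last convolution where the classifier expects 1. Below is a minimal sketch of two possible workarounds, not the author's intended fix. The tensor `h` and the layers `fc` and `fc_wide` are hypothetical stand-ins, not names from this repository.

```python
import torch
import torch.nn as nn

# Stand-in for the encoder output reported above: [batch, channels, length, 1].
h = torch.randn(64, 500, 5, 1)

# Hypothetical stand-in for the first layer of the MLP classifier (500 -> 300).
fc = nn.Linear(500, 300)

# Option A: max-pool over the leftover length dimension, giving [64, 500].
# squeeze(-1) only drops the last axis, so a batch of size 1 stays intact.
h_pooled = h.squeeze(-1).max(dim=2).values
out_a = fc(h_pooled)

# Option B: keep all positions and widen the layer's input to 500 * 5 = 2500.
fc_wide = nn.Linear(500 * 5, 300)
out_b = fc_wide(h.flatten(start_dim=1))

print(out_a.shape, out_b.shape)
```

Option A keeps the 500x300 layer unchanged, while Option B ties the classifier to one fixed input length. A third route would be to pad or truncate the inputs so the encoder output has length 1.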

I would greatly appreciate any guidance you could give me on these!
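On question 2, PyTorch ships `nn.AdaptiveLogSoftmaxWithLoss`, which could be tried as a drop-in experiment. The sketch below only shows how the module is called; the vocabulary size, hidden size, cutoffs, and the random `decoder_out` and `targets` tensors are all made-up placeholders, and how it would plug into this repository's decoder is not shown. It also assumes token ids are sorted by frequency, which the module requires.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size = 10000   # hypothetical vocabulary size
hidden_dim = 300     # hypothetical size of the decoder's per-token output

# Token ids are assumed to be sorted by frequency (id 0 = most frequent),
# which is what the cutoffs rely on.
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[1000, 5000],
)

# Stand-ins for decoder outputs and targets, flattened to [batch * seq_len, ...].
decoder_out = torch.randn(64 * 20, hidden_dim)
targets = torch.randint(0, vocab_size, (64 * 20,))

result = adaptive(decoder_out, targets)
loss = result.loss            # mean negative log-likelihood over the batch
loss.backward()

# Full log-probabilities are still available when needed, e.g. for decoding.
log_probs = adaptive.log_prob(decoder_out[:4])
print(loss.item(), log_probs.shape)
```

Whether this actually speeds up training here would need to be measured; the gain depends on the vocabulary size and on how skewed the token frequencies are.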

======= RESULTS ==========

Input Sentence: stayed two nights in this hotel for our 20th anniversary . the location is fantastic , near great shopping , restaraunts and entertainment . the staff was great . the bed was the most comfortable i have ever slept in . i wanted to take it home with me ! the rooms and halls were quiet and peaceful . the bathroom was incredible , sparkling marble , huge space , impecably clean . the only down side was how expensive it was to park our car . yikes ! over all we could not have asked for a better hotel and we will definately stay here again . it was worth every penny . END_TOKEN

Output Sentence: ricca raggiungibile duur raggiungibile uhr tasse toujours nuestro nogal tren dava frequentato bagno altre uhr krijg salir krap toujours krijg uhr deve l'albergo misma uhr frequentato quand atencion standaard frequentato uhr avere uhr cambiare arredamento precios preso gevraagd bekommt dotate interessante parken l'albergo z'n uhr accanto uhr raggiungibile uhr stanze uhr krijg uhr spazi aeropuerto kwamen uhr mocht ruido frequentato uhr avere bekommt all'arrivo salir totalmente uhr zentral bekommt spettacolare l'albergo llegamos dava frequentato servizio pesar bekommt metropolitana serviable stanze salir relativamente jahre relativamente bekommt arrivati passa z'n uhr trova naechte necesario suis raam l'albergo necesario l'albergo z'n servizio hemos l'albergo enkel aeropuerto citta foi zoek nostro estar salir avere l'albergo heerlijk verkennen andando salir particolarmente trova pagamento trova trovate trova acondicionado trova frigorifero trova trovate trova trovate trova acondicionado trova frigorifero trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova strasse trova trovate trova trovate trova trovate trova strasse trova trovate trova trovate trova trovate trova trova Epoch: 10

Epoch: 10 Steps: 108920 Loss: 22058.16015625
Eval
Evaluation - loss: 683.1286144549368 Rouge1: 1.5889867148342671e-06 Rouge2: 0.0
Finish!!!
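Regarding question 3 above, one way to seed an embedding layer with L2-normalized pretrained vectors is `nn.Embedding.from_pretrained`. This is only a sketch: the random `pretrained` matrix stands in for vectors that would be looked up from a GloVe file for each vocabulary word, and the sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for a [vocab_size, dim] matrix of pretrained vectors; in practice
# this would be built by looking up each vocabulary word in the GloVe file.
pretrained = torch.randn(1000, 300)

# L2-normalize each row so every word vector has unit length.
pretrained = F.normalize(pretrained, p=2, dim=1)

# freeze=False keeps the embeddings trainable after initialization.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

token_ids = torch.tensor([[1, 5, 42]])
vectors = embedding(token_ids)
print(vectors.shape, vectors.norm(dim=-1))
```

With `freeze=False` the vectors are only normalized at initialization; if the model relies on unit-length embeddings throughout training, they would have to be re-normalized in the forward pass.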

fcampagne commented 5 years ago

I would check that the gradient is actually being calculated, e.g. by printing loss.grad after the backward pass. It's easy to end up using variables that don't require gradients, in which case the loss oscillates but never gets optimized.
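A sketch of that kind of check follows. One caveat: the loss is a non-leaf tensor, so its own `.grad` is normally `None` even when backpropagation works, and looking at the parameters' gradients tends to be more informative. The tiny `model` below is a hypothetical stand-in with one weight frozen on purpose; the same loop could be run over the encoder and decoder after `loss.backward()`.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the same loop applies to the encoder/decoder modules.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
model[0].weight.requires_grad_(False)   # deliberately frozen to show detection

x = torch.randn(16, 8)
target = torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()

# Report, per parameter, whether a gradient was produced and how large it is.
report = {}
for name, p in model.named_parameters():
    report[name] = None if p.grad is None else p.grad.norm().item()
    print(name, "requires_grad =", p.requires_grad, "grad norm =", report[name])
```

A parameter that shows `None` here never receives updates. It is also worth confirming that every parameter that should be trained was actually passed to the optimizer.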