Nan loss after few epochs

yassienshaalan commented 6 years ago

Hey,

I got the code a while ago and have been trying to run it. After setting everything up, I kept getting a NaN loss after the first epoch. I tried many things: changing optimizers, checking the inputs for null values, and changing learning rates. Things only improved after your latest change that hard-codes batches_per_epoch to 1000, but the loss still becomes NaN from epoch 6 onward. What could be the problem?

Also, I couldn't reproduce the precision and recall values from the paper. The best I could get, which is after only 5 epochs, on the restaurant dataset is:

           precision    recall  f1-score   support

     Food      0.786     0.183     0.296       887
    Staff      0.527     0.281     0.367       352
 Ambience      0.327     0.131     0.188       251

ruidan commented 6 years ago

Hi,

I am not sure why you got the NaN loss, since I didn't encounter this issue with the current code or with the code before batches_per_epoch was changed to 1000. So far, nobody else has reported this issue. Maybe you could try running it on another machine?

As for evaluation, did you manually assign the cluster_map yourself in evaluation.py? The cluster_map I provided in evaluation.py only applies to the uploaded trained model. If you train a model again, you need to assign the mapping manually before evaluation.

You can try evaluating the uploaded trained restaurant model by running evaluation.py directly. This will give you results similar to those reported in the paper. You can also see how I assigned an aspect label to each cluster by looking at the cluster_map and the aspect.log in pre_trained_model/restaurant/.
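
The manual mapping step described above can be sketched roughly as follows. This is only an illustration and not the code in evaluation.py: the cluster indices, the labels assigned to them, the "Miscellaneous" fallback, and the helper name map_clusters_to_aspects are all hypothetical. The real mapping has to be chosen by reading the top words of each cluster in the aspect.log of your own retrained model.

```python
# Hypothetical mapping from inferred cluster index to gold aspect label.
# These assignments are made up for illustration; after retraining, choose
# them by inspecting the top words of each cluster in your own aspect.log.
cluster_map = {
    0: "Food",
    1: "Staff",
    2: "Ambience",
    3: "Food",
}


def map_clusters_to_aspects(cluster_ids, cluster_map, default="Miscellaneous"):
    """Translate predicted cluster indices into aspect labels.

    Clusters missing from the mapping fall back to `default`
    (an assumed catch-all label).
    """
    return [cluster_map.get(c, default) for c in cluster_ids]


# Example: four sentences assigned to clusters 0, 3, 2 and 7.
predicted = map_clusters_to_aspects([0, 3, 2, 7], cluster_map)
```

If the mapping from a different training run is reused, clusters can end up labelled with the wrong aspect, which may be enough to produce low recall figures even when the model itself trained fine.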

yassienshaalan commented 6 years ago

Thanks, I will look into the evaluation part. However, there still seems to be some correlation between the gradient explosion (which causes the NaN loss) and the number of batches per epoch; maybe a certain subset of the training data causes the problem. I will keep looking into it and update you when I find something new.

Regards
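
One way to narrow down the suspicion above is to record the loss of every batch and find the first one that is no longer finite, and to cap the gradient norm so that a single bad batch cannot blow up the weights. The sketch below is framework-independent plain Python under stated assumptions: the helper names first_nonfinite and clip_by_global_norm and the loss values are made up, and none of it comes from the repository.

```python
import math


def first_nonfinite(losses):
    """Return (index, value) of the first NaN/inf loss, or None if all are finite."""
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i, loss
    return None


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# Made-up per-batch losses: the batch at index 3 is the first one that blows up.
batch_losses = [2.31, 1.87, 1.52, float("nan"), float("nan")]
bad = first_nonfinite(batch_losses)

# A gradient of norm 5 is scaled down to norm 1; the direction is unchanged.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Knowing the index of the first bad batch makes it possible to inspect exactly which sentences were in it, which is a more direct test than varying batches_per_epoch.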


SericWong commented 5 years ago

I had the same problem, since I used a large dataset (3 million). Have you solved the problem yet?

agarnitin86 commented 5 years ago

I am also getting NaN values:

Aspect 0: welsh:nan staff:nan bitches:nan sick":nan stolen:nan christmas:nan edward:nan genius:nan selena:nan emily:nan socks:nan 21st:nan kings:nan roof:nan incredibly:nan walmart:nan bein:nan ga:nan luckily:nan gud:nan cricket:nan reunion:nan accidentally:nan kobe:nan steak:nan fridays:nan disneyland:nan snap:nan involved:nan carry:nan security:nan delivery:nan police:nan theatre:nan prince:nan iranelection:nan sounded:nan

Aspect 1: welsh:nan staff:nan bitches:nan sick":nan stolen:nan christmas:nan edward:nan genius:nan selena:nan emily:nan socks:nan 21st:nan kings:nan roof:nan incredibly:nan walmart:nan bein:nan ga:nan luckily:nan gud:nan cricket:nan reunion:nan accidentally:nan kobe:nan steak:nan fridays:nan disneyland:nan snap:nan involved:nan carry:nan security:nan delivery:nan police:nan theatre:nan prince:nan iranelection:nan sounded:nan

and so on....

pbabvey commented 5 years ago

I encountered this problem for some data. Actually, the vector representations of the aspects are exactly the same, so the ortho_reg loss is NaN. Here is an edited version of the original code with some fixes and modifications; I tried it and it worked.
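
To make the ortho_reg term concrete, here is a minimal plain-Python sketch. It assumes the term is an orthogonality penalty of the form ||T_n T_n^T - I||^2 over row-normalized aspect vectors. The function names, the EPS constant, and the example matrices are hypothetical, and this is not the repository's implementation. In this simplified version, identical rows give a large but finite penalty, so it does not reproduce the NaN by itself. It only shows what the term measures and that a single NaN entry in the aspect matrix makes the whole penalty NaN.

```python
import math

EPS = 1e-7  # assumed small constant guarding against division by zero


def normalize_rows(matrix, eps=EPS):
    """Scale each row to (approximately) unit L2 norm."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / (norm + eps) for x in row])
    return out


def ortho_penalty(matrix, eps=EPS):
    """Sum of squared entries of (T_n T_n^T - I) for row-normalized T_n."""
    t = normalize_rows(matrix, eps)
    total = 0.0
    for i, ri in enumerate(t):
        for j, rj in enumerate(t):
            dot = sum(a * b for a, b in zip(ri, rj))
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return total


orthogonal = [[1.0, 0.0], [0.0, 1.0]]
identical = [[1.0, 2.0], [1.0, 2.0]]          # collapsed aspects: every row the same
poisoned = [[float("nan"), 1.0], [0.0, 1.0]]  # one NaN entry in the matrix

p_orth = ortho_penalty(orthogonal)   # close to 0
p_same = ortho_penalty(identical)    # close to 2: both off-diagonal entries are ~1
p_nan = ortho_penalty(poisoned)      # NaN propagates through the whole sum
```

A check like this on the learned aspect matrix after each epoch can show whether the aspect vectors have collapsed onto each other before the loss itself turns into NaN.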

ruidan / Unsupervised-Aspect-Extraction

Nan loss after few epochs #9