johnb30 / py_crepe

Keras implementation of the Crepe character-level convolutional neural net.
74 stars 33 forks

Accuracy for the ag_news set stays at ~25%? #8

Closed qichaozhao closed 7 years ago

qichaozhao commented 7 years ago

Hey there,

I've been looking at the Text Understanding from Scratch paper and am attempting to re-implement it using Keras. I stumbled across your GitHub during my research.

I cloned the repository and tried to run your code for the AG News dataset (downloaded from here: https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M)

However, the accuracy seems to have plateaued at around 25%, which is chance level for the four AG News classes.

  Step: 100
    Loss: 1.38648176193. Accuracy: 0.254765625
  Step: 200
    Loss: 1.38646401644. Accuracy: 0.25125
  Step: 300
    Loss: 1.38643943111. Accuracy: 0.251510416667
  Step: 400
    Loss: 1.38645074219. Accuracy: 0.2502734375
  Step: 500
    Loss: 1.38647514749. Accuracy: 0.25084375
  Step: 600
    Loss: 1.38645878375. Accuracy: 0.251015625
  Step: 700
    Loss: 1.38649008172. Accuracy: 0.251238839286
  Step: 800
    Loss: 1.38650537863. Accuracy: 0.250751953125
  Step: 900
    Loss: 1.38651740736. Accuracy: 0.250251736111
Epoch 0. Loss: 1.38653714458. Accuracy: 0.250217014054
Epoch time: 0:24:32.763016. Total time: 0:24:35.527443

Epoch: 1
  Step: 100
    Loss: 1.38628291965. Accuracy: 0.255078125
  Step: 200
    Loss: 1.38648118854. Accuracy: 0.2527734375
  Step: 300
    Loss: 1.38653985461. Accuracy: 0.252526041667
  Step: 400
    Loss: 1.38657643467. Accuracy: 0.25287109375
  Step: 500
    Loss: 1.38657280207. Accuracy: 0.252375
  Step: 600
    Loss: 1.38655447821. Accuracy: 0.251770833333
  Step: 700
    Loss: 1.38658778497. Accuracy: 0.25125
  Step: 800
    Loss: 1.38658027261. Accuracy: 0.250205078125
  Step: 900
    Loss: 1.38657693717. Accuracy: 0.250052083333
Epoch 1. Loss: 1.3863426288. Accuracy: 0.249565972139
Epoch time: 0:26:23.999443. Total time: 0:51:02.722974

Epoch: 2
  Step: 100
    Loss: 1.38650754809. Accuracy: 0.25046875
  Step: 200
    Loss: 1.38648163974. Accuracy: 0.2524609375
  Step: 300
    Loss: 1.38651079893. Accuracy: 0.2515625
  Step: 400
    Loss: 1.38658412933. Accuracy: 0.2498828125
  Step: 500
    Loss: 1.38661288977. Accuracy: 0.248421875
  Step: 600
    Loss: 1.38656823456. Accuracy: 0.2479296875
  Step: 700
    Loss: 1.38656216434. Accuracy: 0.248359375
  Step: 800
    Loss: 1.38656521723. Accuracy: 0.24884765625
  Step: 900
    Loss: 1.38658299618. Accuracy: 0.248802083333
Epoch 2. Loss: 1.38656448523. Accuracy: 0.250651041667
Epoch time: 0:24:38.453619. Total time: 1:15:44.417696

Epoch: 3
  Step: 100
    Loss: 1.38659875035. Accuracy: 0.253203125
  Step: 200
    Loss: 1.38658913493. Accuracy: 0.250703125
  Step: 300
    Loss: 1.38660971284. Accuracy: 0.249505208333
  Step: 400
    Loss: 1.38659906924. Accuracy: 0.250078125
  Step: 500
    Loss: 1.38659796071. Accuracy: 0.25034375

I am using a newer version of Keras (2.0.3) with the TensorFlow backend, but since I didn't modify your code in any other way (apart from the path names for the training/test data), I am unclear as to why it's doing this.

I will do some further testing by running with the Theano backend and an older version of Keras to see if I can replicate your results.

However, in the meantime, I am curious whether this is something you encountered at all when you were writing this code?

At least it's consistent with my own implementation, which also converges to 25% accuracy and stays there. :|

qichaozhao commented 7 years ago

Update: running it on the Theano backend with an older version of Keras brings much better results... I guess I will have to keep digging to find where the differences are coming from!
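For anyone who wants to reproduce this comparison: the backend can be switched without touching the code, using Keras's standard configuration mechanism (the entry-point script name here is just a placeholder for however you launch training):

```shell
# Select the backend per-invocation via an environment variable,
# which overrides the setting in ~/.keras/keras.json:
KERAS_BACKEND=theano python train.py    # "train.py" is a placeholder

# Or set it permanently by editing the "backend" field
# in ~/.keras/keras.json, e.g.:
#   { "backend": "theano", ... }
```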

liufuyang commented 6 years ago

I seem to be hitting the same problem: I used basically the same code and fixed a few minor places to be able to run on Python 3, but the loss is not decreasing and accuracy stays at 0.25.

liufuyang commented 6 years ago

I saw a discussion in another issue post: basically, you seem to have to use RandomNormal, as described here https://keras.io/initializers/, instead of the default glorot_uniform. Just add it to all the convolution layers like this:

from keras.initializers import RandomNormal

initializer = RandomNormal(mean=0.0, stddev=0.05, seed=None)
...
conv = Convolution1D(filters=nb_filter, kernel_size=filter_kernels[0],
                     kernel_initializer=initializer,
                     padding='valid', activation='relu',
                     input_shape=(maxlen, vocab_size))(inputs)

Initialization matters!!!
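To see why the scale matters, here is a quick back-of-the-envelope sketch in plain Python. It assumes the shape of the first conv layer from the Crepe setup (kernel size 7, an alphabet of about 70 characters, 256 filters; adjust if your config differs). glorot_uniform draws from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), and the standard deviation of U(-a, a) is a / sqrt(3):

```python
import math

# Assumed first-conv-layer shape (from the Crepe paper's setup):
kernel_size = 7
in_channels = 70   # size of the character alphabet
filters = 256

# For Conv1D, fan_in/fan_out are kernel_size * channels
fan_in = kernel_size * in_channels   # 490
fan_out = kernel_size * filters      # 1792

# glorot_uniform samples from U(-limit, limit)
limit = math.sqrt(6.0 / (fan_in + fan_out))
# std of U(-a, a) is a / sqrt(3)
glorot_std = limit / math.sqrt(3)

print("glorot_uniform std ~= %.4f" % glorot_std)  # roughly 0.03
print("RandomNormal std    = 0.0500")
```

So the suggested Gaussian initialization is noticeably larger in scale than the Glorot default for this layer. The original character-level ConvNet paper also initialized weights from a Gaussian (mean 0, std 0.05 for the large model), which matches the RandomNormal settings above.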

ayrtondenner commented 6 years ago

I finished running the neural network on the ag_news set using Keras 2.1.5 and Python 3.6.5 (I couldn't check the Theano version; it doesn't seem to be installed). Without any initializer in the convolution layers, my network managed to get between 87% and 92% accuracy:

Epoch: 8
  Step: 100
        Loss: 0.2301556259393692. Accuracy: 0.9221250146627427
  Step: 200
        Loss: 0.23537617575377226. Accuracy: 0.9201875126361847
  Step: 300
        Loss: 0.23534504453341165. Accuracy: 0.9205833458900452
  Step: 400
        Loss: 0.23604939280077816. Accuracy: 0.9202500122785569
  Step: 500
        Loss: 0.23330221936106682. Accuracy: 0.9211000119447708
  Step: 600
        Loss: 0.2331778311356902. Accuracy: 0.9213541778922081
  Step: 700
        Loss: 0.2310873696208. Accuracy: 0.9223928680590221
  Step: 800
        Loss: 0.23309540571644902. Accuracy: 0.9215625112503767
  Step: 900
        Loss: 0.23280121087200112. Accuracy: 0.9213472335868411
  Step: 1000
        Loss: 0.234288040317595. Accuracy: 0.9208625118732452
  Step: 1100
        Loss: 0.23586858349090273. Accuracy: 0.9205227390744469
  Step: 1200
        Loss: 0.2381015682592988. Accuracy: 0.9195833456019561
  Step: 1300
        Loss: 0.2377806937465301. Accuracy: 0.9194711661338806
  Step: 1400
        Loss: 0.2373836720788053. Accuracy: 0.9197500126702445
  Step: 1500
        Loss: 0.23760153172910214. Accuracy: 0.9196000128587087
Epoch 8. Loss: 0.3596184071741606. Accuracy: 0.8811842416462146
Epoch time: 0:07:20.014854. Total time: 1:08:38.125534

Epoch: 9
  Step: 100
        Loss: 0.21075385928153992. Accuracy: 0.9301250070333481
  Step: 200
        Loss: 0.21289151340723036. Accuracy: 0.9299375078082085
  Step: 300
        Loss: 0.21473157433172066. Accuracy: 0.9292916737000148
  Step: 400
        Loss: 0.21344165759161116. Accuracy: 0.9293750070035458
  Step: 500
        Loss: 0.21545475232601166. Accuracy: 0.9288500069379807
  Step: 600
        Loss: 0.21487088636805615. Accuracy: 0.9284791740775108
  Step: 700
        Loss: 0.21380332414593015. Accuracy: 0.9287678642783846
  Step: 800
        Loss: 0.21491042390465737. Accuracy: 0.9276875076442956
  Step: 900
        Loss: 0.21594439257350234. Accuracy: 0.9273194525639216
  Step: 1000
        Loss: 0.21687575853615998. Accuracy: 0.9271750084161758
  Step: 1100
        Loss: 0.21711503727869555. Accuracy: 0.9269545538858934
  Step: 1200
        Loss: 0.21854913759355743. Accuracy: 0.9265833417574565
  Step: 1300
        Loss: 0.21921507142484187. Accuracy: 0.9262211626768112
  Step: 1400
        Loss: 0.21991488710578. Accuracy: 0.9260446515253612
  Step: 1500
        Loss: 0.21934504335621993. Accuracy: 0.9262916754086812
Epoch 9. Loss: 0.3565275182849483. Accuracy: 0.8784210826221265
Epoch time: 0:07:19.129080. Total time: 1:16:13.310350