Randnx opened this issue 7 years ago
Yeah, I misread the architecture lol. I haven't gotten around to fixing it yet, but the code still worked, albeit with reduced accuracy.
I'm also a little confused by this comment: "Assuming a batch size of 20, about one fourth of the dataset consisted of images that were too large to fit on a Titan X GPU." Can a smaller batch size make the full dataset fit on a Titan X GPU?
Yes. Since we're dealing with recurrent nets, longer equations result in more timesteps per equation, and the training process needs more VRAM to hold the data for each timestep. What I found was that if you sort the equations by length (essentially character count), the top 25% of equations by length were too "large" to train on a Titan X with a batch size of 20. If you reduce the batch size, having fewer training examples per batch leaves more VRAM per example, which lets the longer equations fit in the available VRAM.
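To make the idea concrete, here is a minimal sketch (not the repo's actual data pipeline) of sorting formulas by character count and using a smaller batch size for the longest quartile so those batches still fit in GPU memory. The function name `make_batches` and the batch-size numbers are illustrative assumptions, not values from the code:

```python
def make_batches(formulas, normal_batch_size=20, long_batch_size=8):
    """Group formulas into batches, shrinking the batch size for long ones."""
    # Sort by length so each batch contains similarly sized sequences.
    formulas = sorted(formulas, key=len)

    # Treat the longest 25% as "large" examples that need a smaller batch.
    cutoff = int(len(formulas) * 0.75)
    short, long = formulas[:cutoff], formulas[cutoff:]

    batches = []
    for group, size in ((short, normal_batch_size), (long, long_batch_size)):
        for i in range(0, len(group), size):
            batches.append(group[i:i + size])
    return batches


if __name__ == "__main__":
    # Fake formulas of length 1..100: short ones stay at batch size 20,
    # the longest quartile is split into batches of 8.
    dummy = ["x" * n for n in range(1, 101)]
    for batch in make_batches(dummy)[:3]:
        print(len(batch), max(len(f) for f in batch))
```

The same effect can also be had by simply lowering the global batch size, as described above; bucketing by length just avoids paying that cost for the short equations too.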
Thanks, that is a great help (:
Hi, I am reading your excellent code, but I find that the CNN layer order is reversed from the paper author's code (https://github.com/harvardnlp/im2markup/blob/master/src/model/cnn.lua). Is this a problem?