chrisjbaik opened 8 years ago
Great! It all sounds good. Right now we're still working on getting all of the source data, and much of our data lives in bucket 0... but at the moment I have 2546 different functions for which we have both source and valgrind output. There are roughly double that number for which we have valgrind output but no source.
For the representation, I tokenized our source and found about 50,000 distinct tokens in the files. I'm happy to play around with further reducing the token count... right now I have a number of C/C++ language identifiers that I count as special tokens, and every other word counts as a separate token. There are a bunch of ways we could restrict this down; let's talk more about it tomorrow.
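A minimal sketch of that style of tokenization; the keyword set here is a tiny assumed subset and the regex is my own stand-in, not our actual script:

```python
import re

# Assumed sample of C/C++ keywords treated as special tokens (not the real list).
CPP_KEYWORDS = {"int", "char", "if", "else", "for", "while", "return", "void", "struct"}

def tokenize(source):
    """Split C/C++ source into identifier/keyword tokens and single punctuation chars."""
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", source)

def build_vocab(functions):
    """Assign a distinct integer id to every token seen across all functions;
    counting distinct ids like this is how a ~50,000-token vocabulary falls out."""
    vocab = {}
    for src in functions:
        for tok in tokenize(src):
            vocab.setdefault(tok, len(vocab))
    return vocab

vocab = build_vocab(["int main() { return 0; }", "void f(int x) { return; }"])
# keywords keep their own ids, so they can be flagged as special tokens
specials = {tok for tok in vocab if tok in CPP_KEYWORDS}
```

Everything that isn't a keyword (identifiers, literals, punctuation) just becomes its own token, which is what blows up the vocabulary size.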
I don't have a great answer to the execution time issue... I think we'll have to take it in stride and see how it goes? 1000s of hours obviously doesn't work, but I have no idea about using EC2. If we can get the job done on there, then I don't see an issue chipping in my fair share of 2.5 cents :P
Cool! I'll try to get the rest of the functions in before our meeting tomorrow. I'm also fine with running it on EC2 to get it done :+1:
Status update: trying to get it running on EC2, but I ran into a roadblock. I was following this script to download everything, but NVIDIA cuDNN requires a special developer account; I registered for one, but I'll need to wait a couple of days to see if they approve it.
Status update: got it running on an EC2 GPU after receiving the developer account. I ran the script, though, and it's no faster on a GPU than on a CPU. It probably has something to do with the implementation: it might need some work to run faster or to parallelize properly on the GPU. I'm not even sure whether that's possible or what we'd need to do; documentation online and on Stack Overflow is sparse.
Also created an AMI on us-east for AWS with our repo and GPU installation set up so we can quickly get it running if needed: ami-a7561fcd
Tried running a new configuration with our "mod 10" toy dataset, this time with the following parameters:
Results: I end up around 40% accuracy on the training data after 15 epochs, where each epoch takes around 2000 s (~33 min), which means over 8 hours for the entire thing on my own machine.
Hey sorry, I was going to merge in stuff for you to run tonight, but I got caught up in getting new data on unoptimized code. I'm hoping this will give us a better bucket distribution... but it does make valgrind take substantially longer to run. So, I won't have results merged and pushed until tomorrow morning... but they will be pretty accurate at that time.
Okay. Any updates @arquinn ?
Updates from my end:
[num_functions, num_tokens_per_function]
[num_functions, num_result_buckets]
[num_functions, num_result_buckets]
[num_functions, 1] (results)

Also, I tried running on an AWS c4.large instance. Comparison of per-epoch execution:
Again, GPU execution doesn't significantly improve from my MacBook Pro, but a faster CPU helps.
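For concreteness, the four shapes listed above could be mocked up as plain nested lists; the sizes here are placeholders I made up, not our real dataset dimensions:

```python
num_functions = 4              # placeholder; the real set is much larger
num_tokens_per_function = 200  # matches the max_tokens default
num_result_buckets = 10

# [num_functions, num_tokens_per_function]: token ids per function
inputs = [[0] * num_tokens_per_function for _ in range(num_functions)]
# [num_functions, num_result_buckets]: one-hot target buckets
labels = [[0] * num_result_buckets for _ in range(num_functions)]
# [num_functions, num_result_buckets]: predicted bucket distributions
predictions = [[0.0] * num_result_buckets for _ in range(num_functions)]
# [num_functions, 1]: final predicted bucket per function
predicted_bucket = [[0] for _ in range(num_functions)]
```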
Yep, sorry. The master branch has a reasonable test set. Two other things are going on:
Okay, a few questions for y'all:
Turns out there are two copies of each of these files with case variations (some chars uppercase, some lowercase). Judging by this link, we need to either delete the duplicate versions or rename them. Are they the same files, or completely different? If they're different, can we rename them to entirely distinct names, not just case-distinguished ones?
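If it helps, here's a quick sketch for listing the case-colliding names (the helper is hypothetical, not something in our repo):

```python
def case_collisions(filenames):
    """Group filenames that differ only by letter case, which Git on a
    case-insensitive filesystem (macOS, Windows) would conflate."""
    groups = {}
    for name in filenames:
        groups.setdefault(name.lower(), []).append(name)
    # keep only groups with more than one spelling
    return {k: v for k, v in groups.items() if len(v) > 1}

dupes = case_collisions(["Parse.c", "parse.c", "main.c"])
```

Running it over the output of `git ls-files` should list exactly the pairs we need to rename or delete.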
I recommend a token length of 100-200. Anything beyond that gets substantially slower. :confused:
So I just made a push. Here is where we're at:
There are two different data_modules now. Both take a constructor parameter, `max_tokens`, which specifies the maximum number of tokens a function can have, defaulting to 200; we drop all other functions. The new data_module operates on function tokens that only include load/store operations and the like. It's a subset of the original tokens.
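The `max_tokens` cutoff amounts to something like this (a sketch; the function name is made up, but the drop-if-too-long behavior matches the data_module):

```python
def filter_functions(tokenized_functions, max_tokens=200):
    """Keep only functions whose token count is within max_tokens,
    mirroring the data_module constructor parameter; longer ones are dropped."""
    return [toks for toks in tokenized_functions if len(toks) <= max_tokens]

kept = filter_functions([["int", "f"] * 10, ["x"] * 300], max_tokens=200)
```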
Hope this helps! We are up to about a hundred thousand functions at this point...
okay, it's up and running right now. I really hope it works. It takes around 3000-5000s per epoch (50-80 min). I really hope that there's no errors involved because that would stink. I'm shooting for 40 epochs, which should take around 53 hours total. So... I REALLY HOPE THERE'S NO ERRORS.
In any case, the baseline gets around 50% on the first epoch. Do you guys have any analytics on what the data looks like as to why that might be?
For some reason each epoch slows down. I'm not sure if it's a resource-allocation issue on Amazon VMs, or if the algorithm itself is slowing down or consuming too much memory... Hmm.
yep, check the presentation. There are some numbers on what the dataset looks like under the results section somewhere
so... I hate to be the bearer of bad news, but our data is currently totally screwed up. The C++ stuff messed up a bunch of our parsing scripts, so I had to roll some stuff back. I'm down to only 4k functions, but I know that they're actually real results (whereas I'm convinced the data under master is messed up). I don't know how we didn't catch this until now... my fault, I think.
pushed accurate sources to master. sorry all! yikes
I was speaking with Chris a couple of minutes ago, and I guess he's asleep now. I just pulled the data and started the execution on my local machine.
Epoch: 1 Learning rate: 0.001 step: 0, accuracy: 0, distance: 2.4, xent: 46.9381
okay rerunning as well
Had a little issue with data loading; re-running again. I got to the end of 40 epochs, but with the small size of the training data, that's NOT enough for the model to converge. So instead, I'm running 1000 epochs. I'm not sure this will help, but let's see what happens.
Yeah.. We might just wind up with an 'ideas' paper...
I think that's fine. I'll be going through the slides today and finishing up the dataset subsection in the evaluation section.
I can also run the thing locally with a different setting; just give me the setting and I'll start running it.
Vaspol
Still getting the Value 0.0 error within Epoch 1 post Step 140, can't figure it out. @chrisjbaik any clue?
I have no problem running the stuff on an EC2 box. Wondering if it's a Mac OS X issue?
I'm currently on Epoch 272, training accuracy is around 90.5%. Might've been smart to include a validation set to have an unskewed number for the accuracy improvements, but oh well.
I get that those numbers are likely skewed, but 90% seems insanely good! Can you run the accuracy on the test set periodically, or does it have to be all the way after we've trained the network?
On Epoch 370, it's at 91.2%. This is not run on the test set, I will do that after it's all trained; that's why I mentioned it'd be good to have a validation set to double-check per epoch. oops.
So we will be able to run with the test set after 1000 epochs are completed?
Um, more like it'll run automatically when I run 1000 epochs. Perhaps I should have just stopped it at 350. It is actually faster at this point for me to restart and run it to 350 epochs. Maybe I will do this, lol.
- Chris
I will launch a second machine instead and do both. :smile:
I've been trying to run it locally, but it keeps giving the error at the 140th step; probably some issue running on Mac, because it's the same updated code and I don't see what's going wrong.
Chris, if you're doing that, maybe you can add a validation set as well, if that's not too complicated for you to do.
Didn't get a chance to add the validation.. running again though! :)
Sigh. Had a little bug at the end of it. I re-tested with 1 epoch and should be working now. Also added a validation set. Will re-run one last time (hopefully)
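For the record, the validation split is roughly this (a simplified sketch; the fraction and seed are placeholders, not the actual settings):

```python
import random

def train_val_split(examples, val_fraction=0.1, seed=0):
    """Shuffle once and hold out a fraction of examples so we can track
    an unskewed accuracy number every epoch."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_val_split(list(range(100)))
```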
For some weird reason it died by itself with just a notice that says "Killed" on Epoch 224. Either way, results don't look too promising. I get Training Accuracy at 87.6% and Validation Accuracy at the same point is 29%.
I reran it again, but we should start coming up with a contingency plan for how to present and assess the evaluation.
Can we run with a low epoch count that we know will finish, like 200? I know the accuracy will suck, but at least we'd get the numbers out there.
For the presentation, we can say we were trying out an idea, but it didn't work out as we expected. We can probably try to explain why it didn't work out, too.
Well, I think we still need to tease out how it failed. Metrics like average distance will help explain our issue, especially since the buckets are relatively close together. 29% potentially isn't actually THAT bad if you look at how far apart the buckets are... not sure.
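A sketch of the average-distance metric, assuming the buckets are ordinal indices (so a miss to an adjacent bucket counts as distance 1):

```python
def mean_bucket_distance(predicted, actual):
    """Mean absolute difference between predicted and true bucket indices;
    0 means perfect, and small values mean the misses are near-misses."""
    assert len(predicted) == len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

d = mean_bucket_distance([3, 5, 7], [3, 4, 9])  # one exact hit, two near misses -> 1.0
```

That would let us argue a low exact-match accuracy is still useful if most predictions land one or two buckets away.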
Okay I'll get back to y'all later tonight with some of the data and results. Sorry about the troubles.
On Sun, Dec 13, 2015 at 4:34 PM arquinn notifications@github.com wrote:
Well, I think we still need to tease out how it failed. Metrics like average distance etc. will help to explain our issue, especially since the buckets are relatively close together. Potentially 29% isn't actually THAT bad if you look at how far apart the buckets are... Not sure.
Sent from my iPhone
On Dec 13, 2015, at 4:10 PM, Vaspol Ruamviboonsuk < notifications@github.com> wrote:
Can we run with a low epoch that we know will finish like 200? I know the accuracy will suck but at least to get the number out there?
For the presentation, we can present something like we were trying out an idea, but it didn't work out like what we expected. We can probably try to explain why it didn't work out too.
On Sun, Dec 13, 2015 at 3:49 PM, Chris Baik notifications@github.com wrote:
For some weird reason it died by itself with just a notice that says "Killed" on Epoch 224. Either way, results don't look too promising. I get Training Accuracy at 87.6% and Validation Accuracy at the same point is 29%.
I reran it again, but we should start coming up with a contingency plan for how to present and assess the evaluation.
— Reply to this email directly or view it on GitHub <https://github.com/paivaspol/EECS583/issues/6#issuecomment-164296797 .
— Reply to this email directly or view it on GitHub.
— Reply to this email directly or view it on GitHub https://github.com/paivaspol/EECS583/issues/6#issuecomment-164300109.
- Chris
@paivaspol yeah, I'm rerunning with 200 epochs now. Sorry, cutting it tight... I can't get the data at the moment. It kept on dying at around 223 epochs for an unknown reason...
I guess it was dying because of memory overload, as per this link. Sigh.
Final results after 200 epoch execution:
Test accuracy: 27.3%
Test distance: 2.646
Test cross-entropy per observation: 3.816 (this actually doesn't mean much)
https://dl.dropboxusercontent.com/u/20010067/nn_results.tar.gz
Here's the results for the data:
@paivaspol @zainahamid @arquinn
Current Status
Okay, so there are a few different changes we made to the neural network to get it running better on our toy example. At first, we had been getting 0.1 accuracy consistently with no change in cross-entropy, which is bad: with 10 result buckets, 0.1 accuracy means you're performing no better than randomly selecting a result.
We had been generating a new random [batch_size, words_per_function] matrix for every iteration, but this per-iteration random generation was adding a lot of noise. Instead, we selected our batch_size as 16 and created a training dataset of 100 functions. Every iteration, we randomly select a new sample batch of 16 from the training set. This is actually the appropriate way to do it: you want to run multiple iterations over different samples from a fixed training set, but what I was mistakenly doing was recreating a new training set every time, making it impossible to train on it. This SO question helped with understanding what I needed.

Items to consider moving forward
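The corrected batching can be sketched like this, with integers standing in for tokenized functions (a simplified sketch, not our actual training loop):

```python
import random

def minibatches(train_set, batch_size=16, iterations=3, seed=0):
    """Each iteration samples a fresh batch from a FIXED training set,
    rather than regenerating the training data itself (the earlier bug)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        yield rng.sample(train_set, batch_size)

train_set = list(range(100))   # stands in for 100 tokenized functions
batches = list(minibatches(train_set))
```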
Additional Reading on Neural Networks
http://neuralnetworksanddeeplearning.com/
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://deeplearning.net/tutorial/lstm.html