sanyam5 / arc-pytorch

The first public PyTorch implementation of Attentive Recurrent Comparators
MIT License
148 stars 33 forks

other python 2.7 #2

Open phobrain opened 7 years ago

phobrain commented 7 years ago

Lots of changes similar to those in download issue #1 (which I closed with a fix), plus this pattern:

-- train.py

< def get_pct_accuracy(pred: Variable, target) :
---
> def get_pct_accuracy(pred, target) :
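
For other 2.7 users: the `pred: Variable` annotation is Python 3-only syntax, so 2.7 rejects the file at parse time. One option that keeps the type information is to move it into a type comment. The sketch below shows the pattern; the function body is a hypothetical stand-in working on plain lists, not the code from train.py.

```python
def get_pct_accuracy(pred, target):
    # type: (list, list) -> int
    """Return the percentage of predictions that match the 0/1 targets.

    Hypothetical body for illustration only: predictions above 0.5
    count as 1, everything else as 0.
    """
    hard_pred = [1 if p > 0.5 else 0 for p in pred]
    correct = sum(1 for p, t in zip(hard_pred, target) if p == t)
    return int(100.0 * correct / len(target))
```

The type comment is ignored at runtime, so the same file parses under both 2.7 and 3.x.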

phobrain commented 7 years ago

train.py ~ line 54 (doing this stuff for the 1st time in python)

# make directory for storing models.
models_path = os.path.join("saved_models", opt.name)
try:
    os.stat(models_path)
except OSError:
    os.makedirs(models_path)
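
A slightly tighter variant of the same idea, as a sketch: call `os.makedirs` directly and ignore only the "already exists" error, so other failures such as permission problems still surface. `ensure_dir` is a hypothetical helper name, not something in the repo.

```python
import errno
import os


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Swallow only "already exists as a directory"; re-raise anything else.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

On Python 3 the same effect is available in one call with `os.makedirs(path, exist_ok=True)`; the version above runs on both 2.7 and 3.x.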
phobrain commented 7 years ago

models.py, instances of super() need args I haven't figured out yet. Current attempt:

super(type(ArcBinaryClassifier), self).__init__()

TypeError: super(type, obj): obj must be an instance or subtype of type
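
The error comes from `type(ArcBinaryClassifier)`: that expression is the metaclass `type`, and `self` is not an instance of `type`. In Python 2.7 the two-argument form takes the class itself. A minimal sketch, with a hypothetical `Module` base standing in for `torch.nn.Module` so it runs without PyTorch:

```python
class Module(object):
    """Hypothetical stand-in for torch.nn.Module (assumption for this sketch)."""

    def __init__(self):
        self.initialized = True


class ArcBinaryClassifier(Module):
    def __init__(self):
        # Python 2.7 form: pass the class itself, not type(ArcBinaryClassifier).
        super(ArcBinaryClassifier, self).__init__()
```

This form also works unchanged on Python 3, where the bare `super()` is just shorthand for it.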

phobrain commented 7 years ago

Bit the bullet and installed the self-reviling

https://pypi.python.org/pypi/magicsuper/

and am chugging away nicely on an old Macbook:

Iteration: 170 Train: Acc=46%, Loss=0.693698883057 Validation: Acc=54%, Loss=0.692266881466
Iteration: 5390 Train: Acc=72%, Loss=0.537120938301 Validation: Acc=79%, Loss=0.423486799002
Significantly improved validation loss from 0.435477942228 --> 0.423486799002. Saving...
Iteration: 15550 Train: Acc=81%, Loss=0.502185702324 Validation: Acc=93%, Loss=0.185158133507
Significantly improved validation loss from 0.210910066962 --> 0.185158133507. Saving...

(I reinstalled pytorch after installing torch, to handle some problem.) (I don't suppose there's a way to multithread it? Tensorflow on keras/inceptionv3 makes the fan run and goes over 300% of 2 cores, while this is getting 110% with no fan, so I wonder if there might be some flag that could be added. Don't force me to buy a heater! ;-] Maybe multithreading could be done for a ConvARC [hint ;-], if ARCs are inherently sequential?)

sanyam5 commented 7 years ago

Hey @phobrain, sorry again. The syntax errors you are getting are all artifacts of Python3 features.

Good to know that it started training! You reached a pretty good accuracy. I am curious as to what hyperparameters you used.

While it is true that parts of ARC are inherently sequential (you see the next glimpse based on the information gathered from the previous glimpse), there is definitely some parallelization possible (in matrix multiplication, etc.). I may be wrong, but I thought that PyTorch did that automatically under the hood. Not sure why it is stuck at 110%.

phobrain commented 7 years ago

With these bugs filed, 2.7'ers have hacks at least. I don't know what speed I'm sacrificing with that super(). I just ran the default params, don't see offhand where to set them. Will investigate thread count some more now you've given me hope; matrix ops was my hope for parallel.

Iteration: 44660 Train: Acc=85%, Loss=0.349155157804 Validation: Acc=89%, Loss=0.245932474732

Algorithmically, it would be interesting to experiment with multiple training threads asynchronously updating shared weights, ideally one per GPU.

phobrain commented 7 years ago

I'm waiting to see if it ever stops, or until I have adapted a version to try on 299x299.

Iteration: 59290 Train: Acc=81%, Loss=0.414705693722 Validation: Acc=89%, Loss=0.253573656082
Iteration: 59300 Train: Acc=87%, Loss=0.329242378473 Validation: Acc=94%, Loss=0.160938769579

phobrain commented 7 years ago

It looks like only explicit multithreading is supported in pytorch/torch - at least I couldn't find any setting/flag to turn it on for low-level ops, instead finding examples of how to code parallel. Maybe at some point I'll try to translate it to keras if tensorflow is better-optimized, tho stuck now with an adapted keras siamese/inceptionv3 net, with a problem that I don't understand.
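
For reference, PyTorch does expose a knob for the number of CPU threads used inside individual ops: `torch.set_num_threads` and `torch.get_num_threads`. A sketch of how it could be called before training starts; `configure_cpu_threads` is a hypothetical helper, and the import is guarded only so the snippet runs where PyTorch is not installed.

```python
import multiprocessing

try:
    import torch  # guarded so the sketch still runs where PyTorch is absent
except ImportError:
    torch = None


def configure_cpu_threads(requested=None):
    """Ask PyTorch to use `requested` intra-op CPU threads.

    Returns the thread count PyTorch reports afterwards, or None when
    PyTorch is not installed. Defaults to the number of CPU cores.
    """
    if requested is None:
        requested = multiprocessing.cpu_count()
    if torch is None:
        return None
    torch.set_num_threads(requested)
    return torch.get_num_threads()
```

Whether this raises CPU utilization is not guaranteed: it depends on how the installed build was compiled and on the sizes of the matrices involved, since small ops may not be worth splitting across threads.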

Iteration: 78620 Train: Acc=86%, Loss=0.37217849493 Validation: Acc=90%, Loss=0.218237221241

Iteration: 79430 Train: Acc=84%, Loss=0.348631471395 Validation: Acc=90%, Loss=0.240455701947

+11 hours:

Iteration: 207930 Train: Acc=92%, Loss=0.22611771524 Validation: Acc=92%, Loss=0.208926916122
Iteration: 207940 Train: Acc=87%, Loss=0.318686276674 Validation: Acc=96%, Loss=0.113623209298

Later..

Iteration: 340620 Train: Acc=92%, Loss=0.187863066792 Validation: Acc=92%, Loss=0.185851037502
Iteration: 340630 Train: Acc=88%, Loss=0.23383910954 Validation: Acc=92%, Loss=0.183825999498

sanyam5 commented 7 years ago

I have not implemented early stopping yet.
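
Until then, a patience-based check on the validation loss is one common way to do it. This is only a sketch under assumptions: `EarlyStopper`, `patience`, and `min_delta` are hypothetical names, nothing here exists in the repo, and it would be called once per validation pass from the training loop.

```python
class EarlyStopper(object):
    """Signal a stop after `patience` validation checks without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience    # checks to tolerate without improvement
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        """Record one validation loss; return True when training should end."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

Because the validation loss in the logs above bounces around from check to check, a generous `patience` (or a smoothed loss) would probably be needed to avoid stopping too early.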

phobrain commented 7 years ago

Realizing it hadn't moved much in a day, I killed it after the above, since it finally started some continuous low-level fan action.

Could you provide a simple script that would load weights and tell if two images were 'the same' or to what degree?