phobrain opened this issue 7 years ago
train.py, ~ line 54 (doing this stuff for the 1st time in python):

```python
# make directory for storing models.
models_path = os.path.join("saved_models", opt.name)
try:
    os.stat(models_path)
except OSError:
    os.makedirs(models_path)
```
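For what it's worth, a sketch of a variant that avoids the stat-then-create race and runs on both Python 2.7 and 3 (`os.makedirs(..., exist_ok=True)` is 3.2+ only). The `tempfile` base and the directory name are stand-ins for the repo's `os.path.join("saved_models", opt.name)`:

```python
# Race-free "make directory if missing", Python 2.7 and 3 compatible.
import errno
import os
import tempfile

def ensure_dir(path):
    """Create `path` (and any parents) unless it already exists."""
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:  # only swallow "already exists"
            raise

# Hypothetical stand-in for os.path.join("saved_models", opt.name):
models_path = os.path.join(tempfile.mkdtemp(), "saved_models", "demo")
ensure_dir(models_path)
ensure_dir(models_path)  # second call is a no-op, not an error
print(os.path.isdir(models_path))  # -> True
```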
models.py: the instances of super() need args I haven't figured out yet. Current attempt:

```python
super(type(ArcBinaryClassifier), self).__init__()
```

which raises:

```
TypeError: super(type, obj): obj must be an instance or subtype of type
```
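For the record, Python 2's two-argument form wants the class itself, not `type(...)` of it: `type(ArcBinaryClassifier)` is the metaclass `type`, and `self` is not an instance of `type`, hence the error. A minimal sketch with plain classes standing in for the repo's real `nn.Module` subclass:

```python
class Base(object):
    def __init__(self):
        self.initialized = True

class ArcBinaryClassifier(Base):  # stand-in, not the repo's actual class
    def __init__(self):
        # Python 2 spelling: pass the class itself, not type(...) of it.
        # super(type(ArcBinaryClassifier), self) would pass the metaclass
        # `type`, triggering the TypeError quoted above.
        # In Python 3, bare super().__init__() also works.
        super(ArcBinaryClassifier, self).__init__()

clf = ArcBinaryClassifier()
print(clf.initialized)  # -> True
```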
Bit the bullet and installed the self-reviling magicsuper (https://pypi.python.org/pypi/magicsuper/) and am chugging away nicely on an old MacBook:
```
Iteration: 170    Train: Acc=46%, Loss=0.693698883057   Validation: Acc=54%, Loss=0.692266881466
Iteration: 5390   Train: Acc=72%, Loss=0.537120938301   Validation: Acc=79%, Loss=0.423486799002
Significantly improved validation loss from 0.435477942228 --> 0.423486799002. Saving...
Iteration: 15550  Train: Acc=81%, Loss=0.502185702324   Validation: Acc=93%, Loss=0.185158133507
Significantly improved validation loss from 0.210910066962 --> 0.185158133507. Saving...
```
(I reinstalled PyTorch after installing torch, to handle some problem.)

(I don't suppose there's a way to multithread it? TensorFlow with Keras/InceptionV3 makes the fan run and goes over 300% of 2 cores, while this is getting 110% with no fan, so I wonder if there might be some flag that could be added. Don't force me to buy a heater! ;-] Maybe multithreading could be done for a ConvARC [hint ;-], if ARCs are inherently sequential?)
Hey @phobrain, sorry again. The errors you are getting are all artifacts of Python 3-only features.

Good to know that it started training! You reached a pretty good accuracy. I am curious what hyperparameters you used.
While it is true that parts of ARC are inherently sequential (you see the next glimpse based on the information gathered from the previous glimpse), there is definitely some parallelization possible (in the matrix multiplications, etc.). I may be wrong, but I thought that PyTorch did that automatically under the hood. Not sure why it is stuck at 110%.
With these bugs filed, 2.7'ers have hacks at least. I don't know what speed I'm sacrificing with that super(). I just ran the default params; I don't see offhand where to set them. I'll investigate thread count some more now that you've given me hope; matrix ops were my hope for parallelism.
```
Iteration: 44660  Train: Acc=85%, Loss=0.349155157804   Validation: Acc=89%, Loss=0.245932474732
```
Algorithmically, it would be interesting to experiment with multiple training threads asynchronously updating shared weights, ideally one per GPU.
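A minimal pure-Python sketch of that idea (Hogwild-style lock-free updates to shared weights by several workers). In PyTorch this is usually done with `torch.multiprocessing` and `model.share_memory()`; this toy example uses plain threads and a single scalar parameter instead, so none of it is the repo's actual training loop:

```python
import threading

# Toy Hogwild-style training: workers update shared weights with no locks.
# The "model" is one parameter w minimizing (w - 3)^2; real ARC training
# would replace `grad` with backprop through the network.
weights = [0.0]          # shared parameter, updated in place by all workers
LR, STEPS = 0.01, 2000

def worker():
    for _ in range(STEPS):
        w = weights[0]
        grad = 2.0 * (w - 3.0)        # d/dw of (w - 3)^2
        weights[0] = w - LR * grad    # unsynchronized (Hogwild-style) write

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(weights[0])  # should end up close to 3.0 despite the races
```

The point of the sketch: because each update is a contraction toward the optimum, occasional stale reads don't prevent convergence, which is why lock-free updates can work at all.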
I'm waiting to see if it ever stops, or until I have adapted a version to try on 299x299.
```
Iteration: 59290  Train: Acc=81%, Loss=0.414705693722   Validation: Acc=89%, Loss=0.253573656082
Iteration: 59300  Train: Acc=87%, Loss=0.329242378473   Validation: Acc=94%, Loss=0.160938769579
```
It looks like only explicit multithreading is supported in pytorch/torch; at least I couldn't find any setting/flag to turn it on for low-level ops, only examples of how to code parallelism explicitly. Maybe at some point I'll try to translate it to Keras if TensorFlow is better optimized, though I'm stuck for now with an adapted Keras Siamese/InceptionV3 net, with a problem I don't understand.
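For what it's worth, PyTorch does expose a CPU thread-count knob, `torch.set_num_threads()`, and its BLAS backends honor the `OMP_NUM_THREADS`/`MKL_NUM_THREADS` environment variables, which must be set before `torch` is first imported. A hedged sketch (the torch lines are shown as comments so the snippet stands alone; whether more threads actually help depends on the op sizes):

```python
import os

# These must be set before `torch` is first imported:
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"

# Then, after `import torch`:
#   torch.set_num_threads(4)       # intra-op CPU thread pool
#   print(torch.get_num_threads())

print(os.environ["OMP_NUM_THREADS"])  # -> 4
```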
```
Iteration: 78620  Train: Acc=86%, Loss=0.37217849493    Validation: Acc=90%, Loss=0.218237221241
Iteration: 79430  Train: Acc=84%, Loss=0.348631471395   Validation: Acc=90%, Loss=0.240455701947
```
+11 hours:
```
Iteration: 207930 Train: Acc=92%, Loss=0.22611771524    Validation: Acc=92%, Loss=0.208926916122
Iteration: 207940 Train: Acc=87%, Loss=0.318686276674   Validation: Acc=96%, Loss=0.113623209298
```
Later...

```
Iteration: 340620 Train: Acc=92%, Loss=0.187863066792   Validation: Acc=92%, Loss=0.185851037502
Iteration: 340630 Train: Acc=88%, Loss=0.23383910954    Validation: Acc=92%, Loss=0.183825999498
```
I have not implemented early stopping yet.
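For reference, a minimal patience-based early-stopping helper of the usual sort; the class and names are hypothetical, not from this repo:

```python
class EarlyStopping(object):
    """Stop when val loss hasn't improved by min_delta for `patience` checks."""

    def __init__(self, patience=10, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        """Record one validation result; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

stopper = EarlyStopping(patience=3)
losses = [0.69, 0.54, 0.42, 0.43, 0.42, 0.43]  # plateaus after 0.42
for i, loss in enumerate(losses):
    if stopper.step(loss):
        print("stopping at check", i)  # -> stopping at check 5
        break
```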
Realizing it hadn't moved much in a day, I killed it after the above, since it finally started some continuous low-level fan action.
Could you provide a simple script that would load weights and tell if two images were 'the same' or to what degree?
Lots of changes similar to those in download issue #1, which I closed with a fix, plus this pattern:
```
-- train.py
< def get_pct_accuracy(pred: Variable, target) :
```
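The `pred: Variable` annotation is Python 3-only syntax; the usual 2.7 workaround is to drop the annotation (optionally keeping it as a comment). A sketch of the 2.7-compatible signature, with a guessed body since the repo's implementation isn't shown here:

```python
def get_pct_accuracy(pred, target):
    # type: (list, list) -> int   # annotation kept as a type comment
    # Guessed intent (not the repo's actual body): percent of predictions
    # that match the binary targets after thresholding at 0.5.
    hard = [1 if p > 0.5 else 0 for p in pred]
    correct = sum(1 for h, t in zip(hard, target) if h == t)
    return int(100.0 * correct / len(target))

print(get_pct_accuracy([0.9, 0.2, 0.7, 0.4], [1, 0, 0, 0]))  # -> 75
```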