Closed AuroraRAS closed 7 years ago
@merceyz Thoughts?
[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (272173 ms total)
[ PASSED ] 144 tests.
[ FAILED ] 15 tests, listed below:
[ FAILED ] testupdateweights.backprop_instance3_smaller2
[ FAILED ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh
[ FAILED ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu
[ FAILED ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear
[ FAILED ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
[ FAILED ] testsimpleconvolvenet.imagesize1_n2_2layers_biased
[ FAILED ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3
[ FAILED ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6
[ FAILED ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18
[ FAILED ] testlogicaloperators.Convolve_2layers_relu_Xor
[ FAILED ] testbackward.checknumerically_imagesize5_filter3_relu
[ FAILED ] testsinglebatch.imagesize5_filtersize3_batchsize2
[ FAILED ] testsinglebatch.imagesize28
[ FAILED ] testsinglebatch.imagesize28_filtersize5
[ FAILED ] EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters
15 FAILED TESTS
YOU HAVE 2 DISABLED TESTS
I don't suppose it's possible to have it say why it failed?
deepcl_unittests > result.txt
cat result.txt |grep Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:78: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:276: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:356: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:432: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:494: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:767: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:884: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:1057: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:1061: Failure
/home/ml/Projects/DeepCL/test/testlogicaloperators.cpp:290: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure
Part of the test results:
loss, E, 0.493295
loss, E, 0.493295
accuracy: 6/6 100%
accuracy: 6/6
loss, E, 0.493295
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:767: Failure
Expected: (0.00001f) >= (loss), actual: 1e-05 vs 0.493295
[ FAILED ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6 (10248 ms)
The reason it seems to fail on "all of them" is that the loss isn't what the test expects, even though the number of correct predictions is.
I'm guessing the new kernels don't give the same loss as the other ones.
Why that is, I don't know. It's probably safe to ignore.
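To illustrate how accuracy can match while the loss does not, here is a minimal sketch. It assumes a squared-error loss against a one-hot target, which may not be what DeepCL computes in these tests; the function names and the sample outputs are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Index of the largest output, i.e. the predicted class.
std::size_t argmax(const std::vector<float> &v) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] > v[best]) best = i;
    }
    return best;
}

// Sum of squared differences between the output and a one-hot target.
// Assumption: this stands in for whatever loss the real tests use.
float squaredError(const std::vector<float> &output, std::size_t label) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < output.size(); ++i) {
        const float target = (i == label) ? 1.0f : 0.0f;
        sum += (output[i] - target) * (output[i] - target);
    }
    return sum;
}
```

With outputs {0.99, 0.01} and {0.6, 0.4} for label 0, argmax picks class 0 in both cases, so accuracy is identical, but the squared error is small for the first and large for the second. A test that checks both accuracy and a loss threshold would pass the first check and fail the second.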
Could you post the OpenCL details of your device?
R9 270x
Number of platforms 2
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 17.1.7
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 2.0 pocl 0.14, LLVM 4.0.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix POCL
Platform Name Clover
Number of devices 1
Device Name AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 17.1.7
Driver Version 17.1.7
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Max compute units 20
Max clock frequency 1100MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Compiler Available Yes
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 0 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 2147483648 (2GiB)
Error Correction support No
Max memory allocation 1503238553 (1.4GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max constant buffer size 1503238553 (1.4GiB)
Max number of constant args 16
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64
Platform Name Portable Computing Language
Number of devices 1
Device Name pthread-Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
Device Vendor GenuineIntel
Device Vendor ID 0x8086
Device Version OpenCL 2.0 pocl HSTR: pthread-x86_64-unknown-linux-gnu-haswell
Driver Version 0.14
Device OpenCL C Version OpenCL C 2.0
Device Type CPU, Default
Device Available Yes
Device Profile FULL_PROFILE
Max compute units 8
Max clock frequency 3700MHz
Device Partition (core)
Max number of sub-devices 8
Supported partition types equally, by counts
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Compiler Available Yes
Linker Available Yes
Preferred work group size multiple 8
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 18887524352 (17.59GiB)
Error Correction support No
Max memory allocation 18887524352 (17.59GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 0
Preferred total size of global vars 0
Global Memory cache type Read/Write
Global Memory cache size 32768 (32KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 1180470272 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 32768x32768 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Global
Local memory size 18887524352 (17.59GiB)
Max constant buffer size 18887524352 (17.59GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 16384 (16KiB)
Max size 262144 (256KiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir cl_khr_int64 cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
clCreateContext(NULL, ...) [default] Success [MESA]
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Clover
Device Name AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Clover
Device Name AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Clover
Device Name AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
@merceyz Generally speaking, changing the kernels shouldn't change the underlying maths, and results should be near identical. If they're not near identical, it usually indicates a bug that will manifest later, often as poor convergence/accuracy.
I agree. Sadly, I don't know what's wrong. I looked into it but couldn't figure it out. Unless someone else can look into it, I suggest reverting the changes.
Cool. Can we maybe push it to a branch like dev? Then, you can carry on using the dev branch, new people who want the cutting-edge can use dev too, and sometime, someone might figure out how to fix the R9 270X issue (or show that it's actually probably working already perhaps).
I have an R9 280X and I'm also not passing the tests. I disabled the new kernels and there were still some that didn't pass, probably because of other changes. I've used the new kernels in my local branch for a while without any issues.
I don't have cog, so if you don't mind running it to check that the OpenCL code is the correct one, that would be great.
But that sounds like a plan, branch off from current master then revert them if you'd like.
Yes. Please push these changes to a dev branch. Or modify the tests so they pass. Either is ok.
I made a branch based on master but can't revert the commits without making a PR, could you revert them directly?
When you say 'revert them directly', you mean 'force push'? Not really keen on force pushing to master :) . Anyone who's already pulled/cloned from master will get an extra commit.
(and so I think that creating a PR, with a revert commit on it, and merging that, is an excellent approach)
(reverted in a80e611c390d14b6bc7f7265d8f8fd735135cb57 and cbbf51f3ef515ee8e717dc732f48203645d03b89 )
(In hindsight, I should have reverted them in the reverse of the order in which they were applied. Anyway, I shall know for next time :) )
It looks like a failed test still exists in the current master branch (cbbf51f).
If you need any details, please tell me.
[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (120638 ms total)
[ PASSED ] 158 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] testupdateweights.backprop_instance3_smaller2
1 FAILED TEST
YOU HAVE 2 DISABLED TESTS
some details:
[ RUN ] testupdateweights.backprop_instance3_smaller2
Using Mesa , OpenCL platform: Clover
Using OpenCL device: AMD PITCAIRN (DRM 2.50.0 / 4.12.9-300.fc26.x86_64, LLVM 4.0.0)
numweights: 36
Didnt initialize clBLAS
unknown file: Failure
Unknown C++ exception thrown in the test body.
[ FAILED ] testupdateweights.backprop_instance3_smaller2 (511 ms)
[----------] 23 tests from testupdateweights (19379 ms total)
The second failure is marked EXCLUDED_. In theory, it shouldn't be running at all. The first failure is odd; it seems like something fairly fundamental is broken in that test case?
@MiniLight How are you running the test cases? By the way, I'd prefer that you use https://gist.github.com to dump the entire output of the tests, including the command and so on. Then it's easier to see what is happening.
I rebooted my computer just now, and the second failure disappeared. I think it was a problem with my environment.
But the first failure still exists, and it's not present in ad1ab61.
@hughperkins I tried to paste the test results into a Gist, but I couldn't paste them completely. Maybe the file is too long?
This is my test script:
#!/bin/sh
# Usage: ./deeptest.sh <commit>
cd DeepCL/
rm -rf dist/ build/
git fetch origin
git reset --hard "$1"
git merge origin/clover-compatibility
mkdir build
cd build
ccmake ..
make -j7 install
cd ../dist
. bin/activate.sh
deepcl_unittests > "../../result_$1.txt"
and the commands:
./deeptest.sh ad1ab61
./deeptest.sh cbbf51f
Hmmm, I've never seen gist fail, but ok.
What is origin/clover-compatibility?
Assuming it's this branch https://github.com/hughperkins/DeepCL/tree/clover-compatibility
Yes, I found it there:
https://github.com/hughperkins/DeepCL#hardwaredriver-specific-issues
Ah. Clover is not supported. Do you have some way of running DeepCL, without using Clover?
(update: I've just now tweaked the README to point out that Clover is not actually supported)
I have a "Portable Computing Language" device in clinfo, but I don't know whether it's usable; my E3-1230 CPU has no graphics core.
And I don't know how to choose a CL device in deepcl_unittests; it looks like it automatically selects the "Clover" device.
Clover has a large user base, and it looked to work well in ad1ab61. I hope it can keep working.
obtained 99.5% test accuracy on MNIST ad1ab61 Clover Benchmarking:
after epoch 20 132906 ms
training loss: 752567
train accuracy: 6110/60000 10.1833%
test accuracy: 9949/10000 99.49%
after tests 2154 ms
record epoch=20
wrote weights to file, filesize 173KB
https://github.com/hughperkins/DeepCL/blob/master/doc/Benchmarking.md
The meaning of int idx changed, but nothing changed where instanceSpecific is called. I think this is unusual, but I can't be sure.
Using the git diff ad1ab61 cbbf51f command, I found this:
diff --git a/src/conv/BackpropWeights.cpp b/src/conv/BackpropWeights.cpp
index 691a133..4e0cc7e 100644
--- a/src/conv/BackpropWeights.cpp
+++ b/src/conv/BackpropWeights.cpp
@@ -65,18 +65,15 @@ STATIC BackpropWeights *BackpropWeights::instanceSpecific(int idx, EasyCL *cl, L
return new BackpropWeightsAuto(cl, layerDimensions);
}
if(idx == 0) {
- return new BackpropWeightsCpu(cl, layerDimensions);
- }
- if(idx == 1) {
return new BackpropWeightsNaive(cl, layerDimensions);
}
- if(idx == 2) {
+ if(idx == 1) {
return new BackpropWeightsScratch(cl, layerDimensions);
}
- if(idx == 3) {
+ if(idx == 2) {
return new BackpropWeightsScratchLarge(cl, layerDimensions);
}
- if(idx == 4) {
+ if(idx == 3) {
return new BackpropWeightsIm2Col(cl, layerDimensions);
}
throw std::runtime_error("BackpropWeights::instanceSpecific doesnt handle idx " + toString(idx));
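To make the effect of that renumbering concrete, here is a small stand-alone sketch. It is not DeepCL code: the two tables and the helper names are hypothetical, and only the kernel names and their order are taken from the diff above. The Auto case visible at the top of the hunk is left out because its condition is not shown.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// Kernel order before and after the change, as read from the diff.
static const std::array<const char *, 5> kBefore = {
    "Cpu", "Naive", "Scratch", "ScratchLarge", "Im2Col"};
static const std::array<const char *, 4> kAfter = {
    "Naive", "Scratch", "ScratchLarge", "Im2Col"};

// Looks up idx in a table and throws for an unhandled idx,
// loosely mirroring the final throw in instanceSpecific.
template <std::size_t N>
std::string lookup(const std::array<const char *, N> &table, int idx) {
    if (idx < 0 || static_cast<std::size_t>(idx) >= N) {
        throw std::runtime_error(
            "instanceSpecific doesnt handle idx " + std::to_string(idx));
    }
    return table[static_cast<std::size_t>(idx)];
}

std::string kernelBefore(int idx) { return lookup(kBefore, idx); }
std::string kernelAfter(int idx) { return lookup(kAfter, idx); }
```

Under these assumptions, a caller that keeps passing 3 gets ScratchLarge before the change and Im2Col after it, and a caller that keeps passing 4 now reaches the throw. Whether that is what the failing test runs into is not something this sketch can confirm.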
Good spot. Fixed in 3edb3c6. How's it look now?
[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (124509 ms total)
[ PASSED ] 159 tests.
Great! :)
obtained 99.5% test accuracy on MNIST
Your train accuracy is 10% so something isn't right
@merceyz Do you mean, on commit 3edb3c6 ?
@merceyz I think it's OK. I found another result, maybe same reason: https://github.com/hughperkins/DeepCL/issues/66#issuecomment-223230826
@hughperkins From https://github.com/hughperkins/DeepCL/issues/133#issuecomment-326956374 he states
obtained 99.5% test accuracy on MNIST ad1ab61 Clover Benchmarking:
@MiniLight He was implementing a new kernel (the one that started this issue) and there was something wrong with it. After he fixed it the train accuracy went up to 97%
@merceyz Where are you seeing the '10%' bit?
Oh, for ad1ab61. I see. I don't think I need to consider that, right, because that is the pre-revert commit?
Odd that the training accuracy is junk, but the test accuracy is good...
because that is the pre-revert commit?
It's before the kernel changes were added at all. But it's in Clover so who knows
ah, ok.
@merceyz Changing the subject a bit, for the tests prior to the revert... I didn't look at the failing results carefully. If it's something like a loss that differs by 1e-6, it could be sufficient just to tweak the tolerance on the test. Do you have an example of the output for one of the failing tests?
In https://github.com/hughperkins/DeepCL/issues/133#issuecomment-325557797 you can see the values of one of the tests
Oh. 0.49 vs 1e-5 is quite a big difference :(
Though that was on Clover so might need to test it somewhere else
Yes. Clover is out of scope. I mean, it's nice if Clover works, but I'm not officially supporting it. (Last time I looked, it was missing loads of things, like e.g. tanh, though that might have changed since then)
Did the tests all pass for you, Chris?
3edb3c6857cc54f58de1e6450c7e1e6ceab39892: All passed
b2562203cd72112b18e9fa511a4f0d3506375200: 3 failed
[----------] Global test environment tear-down
[==========] 158 tests from 29 test cases ran. (108265 ms total)
[ PASSED ] 155 tests.
[ FAILED ] 3 tests, listed below:
[ FAILED ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
[ FAILED ] testsinglebatch.imagesize5_filtersize3_batchsize2
[ FAILED ] EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters
3 FAILED TESTS
DeepCL\test\testsinglebatch.cpp(221): error: Value of: allOk
Actual: false
Expected: true
[ FAILED ] EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters (4796 ms)
DeepCL\test\testsinglebatch.cpp(221): error: Value of: allOk
Actual: false
Expected: true
[ FAILED ] testsinglebatch.imagesize5_filtersize3_batchsize2 (4472 ms)
DeepCL\test\testsimpleconvolvenet.cpp(494): error: Expected: (0.0001f) >= (loss), actual: 0.0001 vs 0.000213079
[ FAILED ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased (2599 ms)
We can ignore the EXCLUDED_ one. The other one, we can probably just tweak the threshold/tolerance from 0.0001 to 0.0003, in this line https://github.com/hughperkins/DeepCL/blob/master/test/testsimpleconvolvenet.cpp#L494
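As a quick check of the numbers reported in this thread, here is a minimal sketch of that comparison. The helper name is hypothetical; it only mirrors the shape of the reported assertion, "Expected: (tolerance) >= (loss)".

```cpp
#include <cassert>

// Returns true when the loss is at or below the tolerance,
// the same direction as the failing expectation in the test output.
bool lossWithinTolerance(float loss, float tolerance) {
    assert(tolerance > 0.0f);  // a non-positive tolerance would be a test bug
    return tolerance >= loss;
}
```

The loss of 0.000213079 fails against 0.0001 but passes against 0.0003. The 0.493295 loss reported earlier on Clover fails against either value, so a tolerance tweak of this size would not cover that case.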
I'm not sure why we are seeing EXCLUDED_ tests. They should be excluded by virtue of .. oh, hmmm.... well, in theory this line https://github.com/hughperkins/DeepCL/blob/master/test/gtest_main.cpp#L46 , but there's no EXCLUDED there. Perhaps we should either 1. add EXCLUDED to this line, or 2. change the test name from EXCLUDED_something to DISABLED_something, which is apparently the official naming convention https://stackoverflow.com/questions/7208070/googletest-how-to-skip-a-test/7208119#7208119
(Basically, it's ok-ish to tweak tests so they pass, based on our best judgement. We don't want other people to see failing tests and have to somehow make a judgement on whether they can ignore them or not.)
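For reference, googletest skips a test by default when its test name or its suite name starts with DISABLED_, and it has no built-in meaning for EXCLUDED_. The sketch below is not googletest code; it is a hypothetical helper that applies the same prefix rule to a "Suite.Test" name, just to show why the EXCLUDED_ test still runs.

```cpp
#include <string>

// True when either the suite part or the test part of "Suite.Test"
// starts with the given prefix.
bool hasPrefixedPart(const std::string &fullName, const std::string &prefix) {
    const std::string::size_type dot = fullName.find('.');
    const std::string suite = fullName.substr(0, dot);
    const std::string test =
        (dot == std::string::npos) ? std::string() : fullName.substr(dot + 1);
    return suite.compare(0, prefix.size(), prefix) == 0 ||
           test.compare(0, prefix.size(), prefix) == 0;
}
```

Applied to EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters, the helper matches the EXCLUDED_ prefix but not DISABLED_, so under the default rule nothing marks that test as skipped.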
ad1ab61
result:
b256220
result: