hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

All tests passed at #ad1ab61, but not now (#b256220) #133

Closed AuroraRAS closed 7 years ago

AuroraRAS commented 7 years ago
clinfo

Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 17.1.7
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 2.0 pocl 0.14, LLVM 4.0.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

ad1ab61

git reset --hard ad1ab61
git merge origin/clover-compatibility
 ...
deepcl_unittests

result:

[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (114973 ms total)
[  PASSED  ] 159 tests.

  YOU HAVE 2 DISABLED TESTS

b256220

git reset --hard origin/master
git merge origin/clover-compatibility
 ...
deepcl_unittests

result:

[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (253455 ms total)
[  PASSED  ] 145 tests.
[  FAILED  ] 14 tests, listed below:
[  FAILED  ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh
[  FAILED  ] testsimpleconvolvenet.imagesize1_2planes_filtersize1
[  FAILED  ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
[  FAILED  ] testsimpleconvolvenet.imagesize1_n2_2layers_biased
[  FAILED  ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3
[  FAILED  ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6
[  FAILED  ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n6
[  FAILED  ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18
[  FAILED  ] testlogicaloperators.Convolve_2layers_relu_Xor
[  FAILED  ] testbackward.checknumerically_imagesize5_filter3_relu
[  FAILED  ] testsinglebatch.imagesize5_filtersize3_batchsize2
[  FAILED  ] testsinglebatch.imagesize28
[  FAILED  ] testsinglebatch.imagesize28_filtersize5
[  FAILED  ] EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters

14 FAILED TESTS
  YOU HAVE 2 DISABLED TESTS

hughperkins commented 7 years ago

@merceyz Thoughts?

AuroraRAS commented 7 years ago

da3f96b

[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (272173 ms total)
[  PASSED  ] 144 tests.
[  FAILED  ] 15 tests, listed below:
[  FAILED  ] testupdateweights.backprop_instance3_smaller2
[  FAILED  ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh
[  FAILED  ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu
[  FAILED  ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear
[  FAILED  ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
[  FAILED  ] testsimpleconvolvenet.imagesize1_n2_2layers_biased
[  FAILED  ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3
[  FAILED  ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6
[  FAILED  ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18
[  FAILED  ] testlogicaloperators.Convolve_2layers_relu_Xor
[  FAILED  ] testbackward.checknumerically_imagesize5_filter3_relu
[  FAILED  ] testsinglebatch.imagesize5_filtersize3_batchsize2
[  FAILED  ] testsinglebatch.imagesize28
[  FAILED  ] testsinglebatch.imagesize28_filtersize5
[  FAILED  ] EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters

15 FAILED TESTS
  YOU HAVE 2 DISABLED TESTS

merceyz commented 7 years ago

I don't suppose it's possible to have it say why it failed?

AuroraRAS commented 7 years ago
deepcl_unittests > result.txt
cat result.txt |grep Failure

/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:78: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:276: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:356: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:432: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:494: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:767: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:884: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:1057: Failure
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:1061: Failure
/home/ml/Projects/DeepCL/test/testlogicaloperators.cpp:290: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:580: Failure
/home/ml/Projects/DeepCL/test/testbackward.cpp:581: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure
/home/ml/Projects/DeepCL/test/testsinglebatch.cpp:221: Failure

Part of the test results:

loss, E, 0.493295
loss, E, 0.493295
 accuracy: 6/6 100%
accuracy: 6/6
loss, E, 0.493295
/home/ml/Projects/DeepCL/test/testsimpleconvolvenet.cpp:767: Failure
Expected: (0.00001f) >= (loss), actual: 1e-05 vs 0.493295
[  FAILED  ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6 (10248 ms)

merceyz commented 7 years ago

The reason it seems to fail on "all of them" is that the loss isn't what the test expects, although the number of correct predictions is.

I'm guessing that the new kernels don't give the same loss as the old ones.

Why that is, I don't know. It's probably safe to ignore.

Could you post the OpenCL details of your device?

AuroraRAS commented 7 years ago

R9 270x

Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 17.1.7
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 2.0 pocl 0.14, LLVM 4.0.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 17.1.7
  Driver Version                                  17.1.7
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               20
  Max clock frequency                             1100MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Compiler Available                              Yes
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              2147483648 (2GiB)
  Error Correction support                        No
  Max memory allocation                           1503238553 (1.4GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        1503238553 (1.4GiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0 pocl HSTR: pthread-x86_64-unknown-linux-gnu-haswell
  Driver Version                                  0.14
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     CPU, Default
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               8
  Max clock frequency                             3700MHz
  Device Partition                                (core)
    Max number of sub-devices                     8
    Supported partition types                     equally, by counts
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              18887524352 (17.59GiB)
  Error Correction support                        No
  Max memory allocation                           18887524352 (17.59GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768 (32KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            1180470272 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               18887524352 (17.59GiB)
  Max constant buffer size                        18887524352 (17.59GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                16384 (16KiB)
    Max size                                      262144 (256KiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir cl_khr_int64 cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD PITCAIRN (DRM 2.50.0 / 4.12.8-300.fc26.x86_64, LLVM 4.0.0)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

hughperkins commented 7 years ago

@merceyz Generally speaking, changing the kernels shouldn't change the underlying maths equations, and the results should be near identical. If they're not near identical, it usually indicates a bug that will manifest later, often as poor convergence/accuracy.
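One way to make "near identical" concrete is an element-wise comparison of the outputs of the old and new kernels against a tolerance. The following is only a minimal sketch: `nearlyIdentical` and its tolerance parameter are hypothetical and are not part of DeepCL or its test suite.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DeepCL): returns true when two result
// buffers have the same length and every pair of elements differs by no
// more than `tolerance`.
bool nearlyIdentical(const std::vector<float> &reference,
                     const std::vector<float> &candidate,
                     float tolerance) {
    if (reference.size() != candidate.size()) {
        return false;
    }
    for (std::size_t i = 0; i < reference.size(); i++) {
        // Written as !(diff <= tolerance) so that a NaN difference also fails.
        if (!(std::fabs(reference[i] - candidate[i]) <= tolerance)) {
            return false;
        }
    }
    return true;
}
```

Under this kind of check, a loss of 0.493295 where 1e-05 is expected (as in the log above) is far outside any rounding-level tolerance.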

merceyz commented 7 years ago

I agree. Sadly, I don't know what's wrong. I looked into it but couldn't figure it out. Unless someone else can look into it, I suggest reverting the changes.

hughperkins commented 7 years ago

Cool. Can we maybe push it to a branch like dev? Then, you can carry on using the dev branch, new people who want the cutting-edge can use dev too, and sometime, someone might figure out how to fix the R9 270X issue (or show that it's actually probably working already perhaps).

merceyz commented 7 years ago

I have an R9 280X and I'm also not passing the tests. I disabled the new kernels and there were still some tests that I didn't pass, probably because of other changes. I've used the new kernels in my local branch for a while without any issues.

I don't have cog, so if you don't mind running it to check that the OpenCL code is the correct one, that would be great.

But that sounds like a plan, branch off from current master then revert them if you'd like.

hughperkins commented 7 years ago

> But that sounds like a plan, branch off from current master then revert them if you'd like.

Yes. Please push these changes to a dev branch. Or modify the tests so they pass. Either ok.

merceyz commented 7 years ago

I made a branch based on master but can't revert the commits without making a PR, could you revert them directly?

hughperkins commented 7 years ago

> could you revert them directly?

When you say 'revert them directly', you mean 'force push'? Not really keen on force pushing to master :) . Anyone who's already pulled/cloned from master will get an extra commit.

hughperkins commented 7 years ago

(and so I think that creating a PR, with a revert commit on it, and merging that, is an excellent approach)

hughperkins commented 7 years ago

(reverted in a80e611c390d14b6bc7f7265d8f8fd735135cb57 and cbbf51f3ef515ee8e717dc732f48203645d03b89 )

hughperkins commented 7 years ago

(in hindsight, I should have reverted them in the reverse of the order in which they were applied. anyway, I shall know for next time :) )

AuroraRAS commented 7 years ago

It looks like one failing test still exists on the current master branch (cbbf51f).

If you need any details, please tell me.

[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (120638 ms total)
[  PASSED  ] 158 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] testupdateweights.backprop_instance3_smaller2

 1 FAILED TEST
  YOU HAVE 2 DISABLED TESTS

some details:

[ RUN      ] testupdateweights.backprop_instance3_smaller2
Using Mesa , OpenCL platform: Clover
Using OpenCL device: AMD PITCAIRN (DRM 2.50.0 / 4.12.9-300.fc26.x86_64, LLVM 4.0.0)
numweights: 36
Didnt initialize clBLAS
unknown file: Failure
Unknown C++ exception thrown in the test body.
[  FAILED  ] testupdateweights.backprop_instance3_smaller2 (511 ms)
[----------] 23 tests from testupdateweights (19379 ms total)

hughperkins commented 7 years ago

The second failure is marked EXCLUDED_. In theory, it shouldn't be running at all. The first failure is odd; it seems like something fairly fundamental is broken in that test case?

hughperkins commented 7 years ago

@MiniLight How are you running the test cases? By the way, I'd prefer that you use https://gist.github.com to dump the entire output of the tests, including the command and so on. Then it's easier to see what is happening.

AuroraRAS commented 7 years ago

I rebooted my computer just now, and the second failure disappeared. I think it was a problem with my environment.

But the first failure still exists, and it does not occur at ad1ab61.

AuroraRAS commented 7 years ago

@hughperkins I tried to paste the test results into a Gist, but I could not paste them completely. Maybe the file is too long?

This is my test script:

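#!/bin/sh
# Usage: ./deeptest.sh <commit>
# Rebuilds DeepCL at the given commit and saves the unit test output.

cd DeepCL/ || exit 1
rm -rf dist/ build/
git fetch origin
git reset --hard "$1"
git merge origin/clover-compatibility
mkdir build
cd build || exit 1
ccmake ..
make -j7 install
cd ../dist || exit 1
. bin/activate.sh
deepcl_unittests > "../../result_$1.txt"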

and the commands:

./deeptest.sh ad1ab61
./deeptest.sh cbbf51f

hughperkins commented 7 years ago

Hmmm, I've never seen gist fail, but ok.

what is origin/clover-compatibility?

merceyz commented 7 years ago

> what is origin/clover-compatibility?

Assuming it's this branch https://github.com/hughperkins/DeepCL/tree/clover-compatibility

AuroraRAS commented 7 years ago

Yes, I found it here:

https://github.com/hughperkins/DeepCL#hardwaredriver-specific-issues

hughperkins commented 7 years ago

Ah. Clover is not supported. Do you have some way of running DeepCL, without using Clover?

hughperkins commented 7 years ago

(update: I've just now tweaked the README to point out that Clover is not actually supported)

AuroraRAS commented 7 years ago

I have a "Portable Computing Language" device in clinfo, but I don't know whether it's usable; my E3-1230 CPU has no graphics core.

And I don't know how to choose a CL device in deepcl_unittests; it looks like it automatically selects the "Clover" device.

Clover has a large user base, and it looked like it worked well at ad1ab61. I hope that can continue.

obtained 99.5% test accuracy on MNIST ad1ab61 Clover Benchmarking:

after epoch 20 132906 ms
 training loss: 752567
 train accuracy: 6110/60000 10.1833%
test accuracy: 9949/10000 99.49%
after tests 2154 ms
record epoch=20
wrote weights to file, filesize 173KB

https://github.com/hughperkins/DeepCL/blob/master/doc/Benchmarking.md

AuroraRAS commented 7 years ago

The meaning of int idx has changed, but nothing has changed where instanceSpecific is called. I think this is unusual, but I can't be sure. I found this with the command git diff ad1ab61 cbbf51f:

diff --git a/src/conv/BackpropWeights.cpp b/src/conv/BackpropWeights.cpp
index 691a133..4e0cc7e 100644
--- a/src/conv/BackpropWeights.cpp
+++ b/src/conv/BackpropWeights.cpp
@@ -65,18 +65,15 @@ STATIC BackpropWeights *BackpropWeights::instanceSpecific(int idx, EasyCL *cl, L
         return new BackpropWeightsAuto(cl, layerDimensions);
     }
     if(idx == 0) {
-        return new BackpropWeightsCpu(cl, layerDimensions);
-    }
-    if(idx == 1) {
         return new BackpropWeightsNaive(cl, layerDimensions);
     }
-    if(idx == 2) {
+    if(idx == 1) {
         return new BackpropWeightsScratch(cl, layerDimensions);
     }
-    if(idx == 3) {
+    if(idx == 2) {
         return new BackpropWeightsScratchLarge(cl, layerDimensions);
     }
-    if(idx == 4) {
+    if(idx == 3) {
         return new BackpropWeightsIm2Col(cl, layerDimensions);
     }
     throw std::runtime_error("BackpropWeights::instanceSpecific doesnt handle idx " + toString(idx));
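The effect of the diff above on callers can be sketched by listing the two index orders side by side. This is only an illustration: `selectedClass` and the two tables are hypothetical stand-ins, and they model only the `idx >= 0` branches visible in the diff, not the rest of `instanceSpecific`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Implementation order at ad1ab61 (the '-' side of the diff).
const std::vector<std::string> kOrderAtAd1ab61 = {
    "BackpropWeightsCpu", "BackpropWeightsNaive", "BackpropWeightsScratch",
    "BackpropWeightsScratchLarge", "BackpropWeightsIm2Col"};

// Implementation order at cbbf51f (the '+' side of the diff).
const std::vector<std::string> kOrderAtCbbf51f = {
    "BackpropWeightsNaive", "BackpropWeightsScratch",
    "BackpropWeightsScratchLarge", "BackpropWeightsIm2Col"};

// Hypothetical stand-in for instanceSpecific: returns the name of the class
// selected for a non-negative idx, and throws for an idx that is not handled.
std::string selectedClass(const std::vector<std::string> &order, int idx) {
    if (idx < 0 || idx >= static_cast<int>(order.size())) {
        throw std::runtime_error("doesnt handle idx " + std::to_string(idx));
    }
    return order[static_cast<std::size_t>(idx)];
}
```

With these tables, idx 3 selects BackpropWeightsScratchLarge at ad1ab61 but BackpropWeightsIm2Col at cbbf51f, and idx 4 is no longer handled. A test that hard-codes an instance number would therefore exercise a different implementation after the change; that this explains the failure of testupdateweights.backprop_instance3_smaller2 is an inference, not something confirmed at this point in the thread.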
hughperkins commented 7 years ago

Good spot. Fixed in 3edb3c6. How's it look now?

AuroraRAS commented 7 years ago

[----------] Global test environment tear-down
[==========] 159 tests from 30 test cases ran. (124509 ms total)
[  PASSED  ] 159 tests.

hughperkins commented 7 years ago

Great! :)

merceyz commented 7 years ago

> obtained 99.5% test accuracy on MNIST

Your train accuracy is 10% so something isn't right

hughperkins commented 7 years ago

@merceyz Do you mean, on commit 3edb3c6 ?

AuroraRAS commented 7 years ago

@merceyz I think it's OK. I found another result like this, maybe for the same reason: https://github.com/hughperkins/DeepCL/issues/66#issuecomment-223230826

merceyz commented 7 years ago

@hughperkins From https://github.com/hughperkins/DeepCL/issues/133#issuecomment-326956374 he states

> obtained 99.5% test accuracy on MNIST ad1ab61 Clover Benchmarking:

@MiniLight He was implementing a new kernel (the one that started this issue) and there was something wrong with it. After he fixed it the train accuracy went up to 97%

hughperkins commented 7 years ago

@merceyz Where are you seeing the '10%' bit?

hughperkins commented 7 years ago

Oh, for ad1ab61. I see. I don't think I need to consider that, right? Because that is the pre-revert commit?

Odd that the training accuracy is junk, but the test accuracy is good...

merceyz commented 7 years ago

> because that is the pre-revert commit?

It's before the kernel changes were added at all. But it's in Clover so who knows

hughperkins commented 7 years ago

ah, ok.

hughperkins commented 7 years ago

@merceyz Changing the subject a bit, for the tests prior to the revert... I didn't look at the failing results carefully. If it's something like a loss that is different by 1e-6 or so, it could be sufficient just to tweak the tolerance on the test. Do you have an example of the output for one of the failing tests?

merceyz commented 7 years ago

In https://github.com/hughperkins/DeepCL/issues/133#issuecomment-325557797 you can see the values of one of the tests

hughperkins commented 7 years ago

Oh. 0.49 vs 1e-5 is quite a big difference :(

merceyz commented 7 years ago

Though that was on Clover so might need to test it somewhere else

hughperkins commented 7 years ago

Yes. Clover is out of scope. I mean, it's nice if Clover works, but I'm not officially supporting it. (Last time I looked, it was missing loads of things, like eg tanh, though that might have changed since then)

hughperkins commented 7 years ago

Did the tests all pass for you, Chris?

merceyz commented 7 years ago

3edb3c6857cc54f58de1e6450c7e1e6ceab39892: All passed
b2562203cd72112b18e9fa511a4f0d3506375200: 3 failed

[----------] Global test environment tear-down
[==========] 158 tests from 29 test cases ran. (108265 ms total)
[  PASSED  ] 155 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
[  FAILED  ] testsinglebatch.imagesize5_filtersize3_batchsize2
[  FAILED  ] EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters

 3 FAILED TESTS
DeepCL\test\testsinglebatch.cpp(221): error: Value of: allOk
  Actual: false
Expected: true
[  FAILED  ] EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters (4796 ms)
DeepCL\test\testsinglebatch.cpp(221): error: Value of: allOk
  Actual: false
Expected: true
[  FAILED  ] testsinglebatch.imagesize5_filtersize3_batchsize2 (4472 ms)
DeepCL\test\testsimpleconvolvenet.cpp(494): error: Expected: (0.0001f) >= (loss), actual: 0.0001 vs 0.000213079
[  FAILED  ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased (2599 ms)

hughperkins commented 7 years ago

We can ignore the EXCLUDED_ one. The other one, we can probably just tweak the threshold/tolerance from 0.0001 to 0.0003, in this line https://github.com/hughperkins/DeepCL/blob/master/test/testsimpleconvolvenet.cpp#L494

I'm not sure why we are seeing EXCLUDED_ tests. They should be excluded by virtue of .. oh, hmmm.... well, in theory this line https://github.com/hughperkins/DeepCL/blob/master/test/gtest_main.cpp#L46 , but there's no EXCLUDED there. Perhaps we should either 1. add EXCLUDED to this line, or 2. change the test name from EXCLUDED_something to DISABLED_something, which is apparently the official naming convention https://stackoverflow.com/questions/7208070/googletest-how-to-skip-a-test/7208119#7208119
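The difference between the two prefixes can be sketched as a simple naming rule. This is a simplified model of the convention described in the linked Stack Overflow answer, where GoogleTest skips a test by default when its suite name or test name starts with `DISABLED_`. It is not GoogleTest's real code, it does not reflect whatever filter test/gtest_main.cpp sets, and the function names are hypothetical.

```cpp
#include <string>

// True when `text` begins with `prefix`.
bool startsWith(const std::string &text, const std::string &prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Simplified model of the default rule: a test given as "suite.test" is
// skipped when the suite name or the test name starts with "DISABLED_".
bool skippedByDefault(const std::string &fullName) {
    const std::string marker = "DISABLED_";
    const std::string::size_type dot = fullName.find('.');
    const std::string suite = fullName.substr(0, dot);
    const std::string test =
        dot == std::string::npos ? std::string() : fullName.substr(dot + 1);
    return startsWith(suite, marker) || startsWith(test, marker);
}
```

Under this rule, a name such as EXCLUDED_testsinglebatch.imagesize5_filtersize3_batchsize2_10filters is not skipped unless a filter excludes it explicitly, whereas the same name with a DISABLED_ prefix would be.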

hughperkins commented 7 years ago

(basically, it's ok-ish to tweak tests so they pass, based on our best judgement. We don't want other people to see failing tests and have to somehow judge whether they can ignore them or not)