numenta / htm.java

Hierarchical Temporal Memory implementation in Java - an official Community-Driven Java port of the Numenta Platform for Intelligent Computing (NuPIC).
GNU Affero General Public License v3.0

Apply optimize stream refactoring #539

Open khatchad opened 6 years ago

khatchad commented 6 years ago

We are in the process of evaluating a research project associated with an automated refactoring tool that attempts to optimize Java 8 streams for maximum performance. In the case of your project, it converted several streams to parallel. The tool is very conservative: it performs an extensive static analysis to ensure that the transformation is safe, i.e., that the original program's semantics are preserved. However, this also means that the proposed changes may cover only a proper subset of the streams that could be optimized. We would appreciate any feedback you have on the change set.
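For readers unfamiliar with this kind of transformation, here is a minimal, hypothetical sketch (not taken from the tool's actual output) of converting a sequential stream pipeline to a parallel one. The conversion only preserves semantics when the pipeline is stateless and non-interfering and the terminal reduction is associative, which is the kind of property the static analysis checks:

```java
import java.util.stream.IntStream;

public class ParallelStreamSketch {

    // Sequential form: safe but single-threaded.
    static int sumOfSquaresSequential(int n) {
        return IntStream.range(0, n)
                        .map(i -> i * i)
                        .sum();
    }

    // Parallel form: produces the same result here because the map step is
    // stateless and non-interfering, and summation is associative.
    static int sumOfSquaresParallel(int n) {
        return IntStream.range(0, n)
                        .parallel()
                        .map(i -> i * i)
                        .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresSequential(100));
        System.out.println(sumOfSquaresParallel(100));
    }
}
```

Whether the parallel form is actually faster depends on the amount of work per element and the data-set size, which is why the benchmarks below were run with enlarged inputs.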

We found a JMH performance test in your project that does a quick test on a small dataset. To assess the increased parallelism, however, we altered this test to have a larger data set. The results show an overall speedup of ~1.55 (see below):

| Benchmark | Mode | Unit | Score (speedup) |
| --- | --- | --- | --- |
| org.numenta.nupic.benchmarks.SpatialPoolerLocalInhibitionBenchmark.measureAvgCompute_7_Times | avgt | ms/op | 1.131135828 |
| org.numenta.nupic.benchmarks.TemporalMemoryBenchmark.measureAvgCompute_7_Times | avgt | ms/op | 1.973052763 |
| Average | | | 1.552094295 |
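For reference, the "Average" row is the arithmetic mean of the two per-benchmark speedups. A quick sketch checking the arithmetic (values copied from the table above):

```java
public class SpeedupAverage {
    public static void main(String[] args) {
        // Per-benchmark speedups reported in the results table.
        double spatialPoolerLocal = 1.131135828;
        double temporalMemory    = 1.973052763;

        // The overall figure is the arithmetic mean of the two speedups.
        double average = (spatialPoolerLocal + temporalMemory) / 2.0;
        System.out.printf("average speedup = %.9f%n", average);
    }
}
```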

We have also pasted the patch to your JMH test suite that we used to increase the data set. Please let us know if you have any questions and again we appreciate any feedback!

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
index 8a81d171d..4c978d47c 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
@@ -43,11 +43,11 @@ public abstract class AbstractAlgorithmBenchmark {

     @Setup
     public void init() {
-        SDR = new int[2048];
+        SDR = new int[40960];

         //Layer components
         ScalarEncoder.Builder dayBuilder = ScalarEncoder.builder()
-            .n(8)
+            .n(160)
             .w(3)
             .radius(1.0)
             .minVal(1.0)
@@ -76,9 +76,9 @@ public abstract class AbstractAlgorithmBenchmark {
      */
     protected Parameters getParameters() {
         Parameters parameters = Parameters.getAllDefaultParameters();
-        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 8 });
-        parameters.set(KEY.COLUMN_DIMENSIONS, new int[] { 2048 });
-        parameters.set(KEY.CELLS_PER_COLUMN, 32);
+        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 160 });
+        parameters.set(KEY.COLUMN_DIMENSIONS, new int[] { 40960 });
+        parameters.set(KEY.CELLS_PER_COLUMN, 640);

         //SpatialPooler specific
         parameters.set(KEY.POTENTIAL_RADIUS, 12);//3
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
index 08fe1dea5..69e502db3 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
@@ -41,9 +41,9 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     public void init() {
         super.init();

-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }

@@ -55,10 +55,10 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
index 8bbe19698..f333aba40 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
@@ -39,17 +39,17 @@ public class SpatialPoolerLocalInhibitionBenchmark extends AbstractAlgorithmBenc
     public void init() {
         super.init();

-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
index e6d13c4e0..d3440a674 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
@@ -44,31 +44,31 @@ public class TemporalMemoryBenchmark extends AbstractAlgorithmBenchmark {
         super.init();

         //Initialize the encoder's encoded output
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }

-        SDRs = new int[7][];
+        SDRs = new int[140][];

-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
             SDRs[i] = ArrayUtils.where(SDR, ArrayUtils.WHERE_1);
         }

-        for(int j = 0;j < 100;j++) {
-            for(int i = 0;i < 7;i++) {
+        for(int j = 0;j < 2000;j++) {
+            for(int i = 0;i < 140;i++) {
                 temporalMemory.compute(memory, SDRs[i], true);
             }
         }
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public ComputeCycle measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
         ComputeCycle cc = null;
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             cc = temporalMemory.compute(memory, SDRs[i], true);
         }
cogmission commented 6 years ago

Hi @khatchad,

Thank you very much for your interest and effort!

Please have a look at issue #538, where I've identified what I believe to be the source of the problems. I think my comments there describe a more "holistic" solution than increasing the test size. (Although it is very interesting that increasing the test sample size also avoids the issue; kind of scary, actually.)

On another note, conventionally and historically, a column width greater than 2048 for the SDR size is never really used, and I'm not sure how increasing the size to more than 40k columns will give us performance feedback for our "normal" use case. My feeling is that unconventional column widths might not be the best basis for the benchmarks, since they differ from the usual use case.

Please let me know your thoughts on the above. Benchmark composition and evaluation isn't something I have a lot of experience with, so I'm interested in opinions from people with expertise in this area. 👍

On yet another note, your project of auto-optimizing Java streams to look for concurrency opportunities is VERY cool! :-)

cogmission commented 6 years ago

@khatchad Also, to contribute to this project you will need to fill out the Contributor's License and @rhyolight will have to do some administrative processing of it after you've completed that, ok?

khatchad commented 6 years ago

@cogmission,

Also, to contribute to this project you will need to fill out the Contributor's License and @rhyolight will have to do some administrative processing of it after you've completed that, ok?

No problem. I have filled out the paperwork.

khatchad commented 6 years ago

@cogmission:

Thank you very much for your interest and effort!

Thank you for considering the request!

Please have a look at the issue #538 because I've identified what I believe to be the source of the problems. I think my comments include a more "wholistic" solution as opposed to increasing the test size?

Yes, I've replied there as well. The test size increase was not meant as a fix of the problem, so fixing the comparator seems reasonable to me (although I don't have much experience with your project).

On another note, conventionally and historically a column width > 2048 for the SDR size is never really used and I'm not sure how increasing the size to > 40k columns will serve to give us performance feedback for our "normal" use case. My feeling is that unconventional column widths might not be the best application for the benchmarks because it differs from the "usual" use case?

Thanks for this feedback. Admittedly, I blindly increased the test size, perhaps even artificially, due to my unfamiliarity with the project. Normally in a JMH test there is a @Param annotation that can easily be used to scale the test data size, but since I didn't find one, I tried several different approaches. I will repeat the speedup analysis, but this time without increasing the column size, and then report the results again.

khatchad commented 6 years ago

I should also note that this test was run on 8 cores.
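The core count matters here because parallel streams execute on the JVM's common ForkJoinPool, whose default parallelism is derived from the number of available processors (by default, one less than the processor count, with a minimum of one). A small sketch for inspecting both values on a given machine:

```java
import java.util.concurrent.ForkJoinPool;

public class CoreCount {
    public static void main(String[] args) {
        // Number of processors visible to the JVM.
        int cores = Runtime.getRuntime().availableProcessors();

        // Parallelism of the common pool that parallel streams use.
        int poolParallelism = ForkJoinPool.commonPool().getParallelism();

        System.out.println("available processors   = " + cores);
        System.out.println("common pool parallelism = " + poolParallelism);
    }
}
```

The common pool parallelism can also be overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property, which is useful when reproducing speedup measurements on machines with different core counts.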

khatchad commented 6 years ago

OK, I've reverted the changes to the columns, which results in the following patch:

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
index 8a81d171d..74c422dcb 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/AbstractAlgorithmBenchmark.java
@@ -47,7 +47,7 @@ public abstract class AbstractAlgorithmBenchmark {

         //Layer components
         ScalarEncoder.Builder dayBuilder = ScalarEncoder.builder()
-            .n(8)
+            .n(160)
             .w(3)
             .radius(1.0)
             .minVal(1.0)
@@ -76,9 +76,9 @@ public abstract class AbstractAlgorithmBenchmark {
      */
     protected Parameters getParameters() {
         Parameters parameters = Parameters.getAllDefaultParameters();
-        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 8 });
+        parameters.set(KEY.INPUT_DIMENSIONS, new int[] { 160 });
         parameters.set(KEY.COLUMN_DIMENSIONS, new int[] { 2048 });
-        parameters.set(KEY.CELLS_PER_COLUMN, 32);
+        parameters.set(KEY.CELLS_PER_COLUMN, 640);

         //SpatialPooler specific
         parameters.set(KEY.POTENTIAL_RADIUS, 12);//3
diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
index 08fe1dea5..69e502db3 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerGlobalInhibitionBenchmark.java
@@ -41,9 +41,9 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     public void init() {
         super.init();

-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }

@@ -55,10 +55,10 @@ public class SpatialPoolerGlobalInhibitionBenchmark extends AbstractAlgorithmBen
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
index 8bbe19698..f333aba40 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/SpatialPoolerLocalInhibitionBenchmark.java
@@ -39,17 +39,17 @@ public class SpatialPoolerLocalInhibitionBenchmark extends AbstractAlgorithmBenc
     public void init() {
         super.init();

-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public int[] measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
         }

diff --git a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
index e6d13c4e0..d3440a674 100644
--- a/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
+++ b/src/jmh/java/org/numenta/nupic/benchmarks/TemporalMemoryBenchmark.java
@@ -44,31 +44,31 @@ public class TemporalMemoryBenchmark extends AbstractAlgorithmBenchmark {
         super.init();

         //Initialize the encoder's encoded output
-        input = new int[7][8];
-        for(int i = 0;i < 7;i++) {
-            input[i] = encoder.encode((double) i + 1);
+        input = new int[140][160];
+        for(int i = 0;i < 140;i++) {
+            input[i] = encoder.encode((double) ((i % 7) + 1));
         }

-        SDRs = new int[7][];
+        SDRs = new int[140][];

-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             pooler.compute(memory, input[i], SDR, true);
             SDRs[i] = ArrayUtils.where(SDR, ArrayUtils.WHERE_1);
         }

-        for(int j = 0;j < 100;j++) {
-            for(int i = 0;i < 7;i++) {
+        for(int j = 0;j < 2000;j++) {
+            for(int i = 0;i < 140;i++) {
                 temporalMemory.compute(memory, SDRs[i], true);
             }
         }
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public ComputeCycle measureAvgCompute_7_Times(Blackhole bh) throws InterruptedException {
         ComputeCycle cc = null;
-        for(int i = 0;i < 7;i++) {
+        for(int i = 0;i < 140;i++) {
             cc = temporalMemory.compute(memory, SDRs[i], true);
         }

This results in the following speedup analysis:

| Benchmark | Mode | Unit | Score (speedup) |
| --- | --- | --- | --- |
| org.numenta.nupic.benchmarks.SpatialPoolerLocalInhibitionBenchmark.measureAvgCompute_7_Times | avgt | ms/op | 1.131135828 |
| org.numenta.nupic.benchmarks.TemporalMemoryBenchmark.measureAvgCompute_7_Times | avgt | ms/op | 1.973052763 |
| Avg | | | 1.552094295 |

As can be seen, the org.numenta.nupic.benchmarks.SpatialPoolerGlobalInhibitionBenchmark.measureAvgCompute_7_Times benchmark was not executed due to #538.

cogmission commented 6 years ago

Hi @khatchad,

The test size increase was not meant as a fix of the problem

Ah, I see. Hmmm... Well, if that's the case, then why do we want to change the parameters of the test at all (other than the changes necessary to fix the exception being thrown)? I'm not sure why we would increase the test sizes within the project. If you require these settings for your research/experimentation, you can set them locally by changing the required parameters and making a new build, no? I'm unclear on why I'd change the test in the codebase to adhere to your specific needs? :-)

Not that we don't like to keep our users happy!

khatchad commented 6 years ago

Hi @cogmission,

Ah, I see. Hmmm... Well, if that's the case, then why do we want to change the parameters of the test at all (other than the changes necessary to fix the exception being thrown)? I'm not sure why we would increase the test sizes within the project. If you require these settings for your research/experimentation, you can set them locally by changing the required parameters and making a new build, no? I'm unclear on why I'd change the test in the codebase to adhere to your specific needs? :-)

Yes, agreed. If you look at the pull request files, you'll see there are no changes to test code. The patch I put in the comments is just to show how we arrived at the performance results. In fact, we did just what you mention above, i.e., the test code changes are only local.

As for your question about why we changed the test code locally, we wanted the tests to process more data for a better chance of exploiting parallelism. By no means are we suggesting that the test code be changed permanently. As you said, it is specific to our experiment.

Thanks again for your consideration and please let us know if there are any questions!

khatchad commented 6 years ago

Hi all. Does anyone happen to know why GitHub is saying that I haven't signed the contributor license when I have? I believe that this PR should be passing. Thanks!

cogmission commented 6 years ago

@khatchad Hi, It has to be approved by @rhyolight first...

khatchad commented 6 years ago

Thank you @cogmission and @rhyolight! Just to reiterate, the only suggested changes are found in https://github.com/numenta/htm.java/pull/539/files. It is not necessary to change the tests; they were only augmented to exploit more parallelism and to evaluate the approach. Thank you again for your feedback as well as the opportunity to contribute to your project!

cogmission commented 6 years ago

AFK, but will get to it asap

cogmission commented 6 years ago

@khatchad

Hi, I’m on vacation right now. Will be back toward end of August... Please hold tight. Thanks, David

khatchad commented 6 years ago

No problem, @cogmission. Thanks for the consideration and enjoy your vacation!

khatchad commented 6 years ago

FYI, I noticed an error in the "Avg" column for the average speedup in the table listed in https://github.com/numenta/htm.java/pull/539#issuecomment-375390517. It should be 1.552094295. I've updated it accordingly. Sorry about that.