msbrogli / sdm-framework

A Sparse Distributed Memory Framework.
http://sdm-framework.readthedocs.io/
GNU General Public License v2.0
43 stars 7 forks source link

Prove that all hard locations need to be scanned #4

Closed Alex-Linhares closed 5 years ago

Alex-Linhares commented 6 years ago

It would be interesting to prove that it is necessary to scan all hard locations, which is what justifies the code.

If x is a bitstring and HL is the set of hard locations, there can be no algorithm A that scans less than HL hard locations to find all that are active.

Maybe a proof by absurdity?

Here is a weak form of the theorem:

Suppose a set HL U {x} of |HL|+1 hard locations. That is, x actually belongs to this extended set and is itself a Hard Location. Because a scanning of HL alone would not find x, it is necessary to scan |HL| + 1 locations.

Now, a strong form of the theorem would show that any structure or algorithm is impossible.

[No speedup theorem]. There can be no algorithm, no matter which structure it uses, that avoids scanning a hard location p without breaking the assumption of randomness in the data.

msbrogli commented 6 years ago

Let R(x) = { a in HL such as dist(a, x) < r } be the set of all activated hard-locations by bitstring x.

As the bits are randomly chosen with p=0.5, the probability of dist(a, x) =< r equals to [ \sum_{i<=r} C(d, i) ] * 0.5^d, where d is the number of dimensions of the SDM.

On average there are |HL|/2 elements with n-th bit equals 0, and |HL|/2 elements with n-th bit equals 1. When one bit is compared, it may be only equal or different, which would split the HLs into two subset, the one in which the bit is equal (A) and another in which the bit is different (B). Then, the search must go on in both subsets. In subset A, there will be activated the ones with distance r. In subset B, there will be activated the ones with distance r-1.

search(d, n, r) = search(d-1, n/2, r) + search(d-1, n/2, r-1) + O(1) (assuming the subsets are indexed)

In this recursion, the search would only stop when r<0 or n=0 or d<0. I would like to prove that search(d, n, r) = O(n).

I've checked this using the following python script:

In [1]: def search(d, n, r):
   ...:     if r<0 or n==0 or d==0:
   ...:         return 0
   ...:     return 1 + search(d-1, n/2, r-1) + search(d-1, n/2, r)
   ...: 

In [2]: search(1000, 1000000, 451)
Out[2]: 1048575

In [3]: search(1000, 10000000, 451)
Out[3]: 1677721
Alex-Linhares commented 6 years ago

Looks good; do you think we can prove this?