If you only pass it one hypervector there are no bigrams. If you had a tensor T of 3 by d, with 3 hypervectors of d dimensions, then it would be T[0] * P(T[1]) + T[1] * P(T[2]), where P is a permutation. Hope this helps.
On May 13, 2022, linxinyuan wrote:
What's the bigram for torch.tensor([[1,2,3,4,5,6]])?
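A minimal sketch of the bigram computation described in the reply above, assuming the torchhd functional API from around this time (random_hv, bind, permute, bundle); the snippet is illustrative, not the library's ngrams implementation:

import torch
from torchhd import functional

d = 10000
T = functional.random_hv(3, d)  # 3 hypervectors of d dimensions

# T[0] * P(T[1]) + T[1] * P(T[2]), with P a cyclic permutation (permute)
# and * the binding operation.
bigrams = functional.bundle(
    functional.bind(T[0], functional.permute(T[1])),
    functional.bind(T[1], functional.permute(T[2])),
)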
Are distinct_sequence() and ngrams() doing the same thing?
For the tensor torch.randint(-1, 2, (100, 3, 10000)), the unbind approach is slower than the for-loop approach on both CPU and GPU.
CPU:
ngrams: 8.08 ms ± 62.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
distinct_sequence: 9.65 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
GPU:
ngrams: 324 µs ± 5.16 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
distinct_sequence: 357 µs ± 6.74 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
No, a distinct sequence implements a binding-based sequence where, for the same 3-element tensor T, the result should be: P(P(T[0])) * P(T[1]) * T[2]. Here all the elements are permuted by a decreasing amount, such that the last element is not permuted, and all the permuted hypervectors are bound together instead of bundled.
You can make a version of distinct_sequence that uses a for loop instead of unbind and compare their speeds. You can then use that same implementation to see if you can speed up sequence.
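A hedged sketch of what those two variants might look like, for an input of shape (m, d) and assuming functional.permute accepts a shifts keyword; the function names are illustrative, not the library implementations:

import torch
from torchhd import functional

def distinct_sequence_unbind(input):
    # Split the sequence dimension into a tuple of hypervectors first.
    hvs = torch.unbind(input, dim=-2)
    m = len(hvs)
    # Permute by a decreasing amount (last element not permuted),
    # then bind everything: P^(m-1)(T[0]) * ... * P(T[m-2]) * T[m-1].
    out = functional.permute(hvs[0], shifts=m - 1)
    for i in range(1, m):
        out = functional.bind(out, functional.permute(hvs[i], shifts=m - 1 - i))
    return out

def distinct_sequence_loop(input):
    # Index the sequence dimension directly, avoiding unbind.
    m = input.size(-2)
    out = functional.permute(input[..., 0, :], shifts=m - 1)
    for i in range(1, m):
        out = functional.bind(out, functional.permute(input[..., i, :], shifts=m - 1 - i))
    return out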
Can you give me an example where distinct_sequence() and ngrams() return different results? I understand that the formulas of the two functions are different, but they always return the same results when I test them with different shapes.
Here's an example:
>>> from torchhd import functional
>>> x = functional.random_hv(10, 1000)
>>> ngrams = functional.ngrams(x)
>>> distinct_sequence = functional.distinct_sequence(x)
>>> ngrams == distinct_sequence
tensor([False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
They will still return the same result if I change n to 10: ngrams = functional.ngrams(x, n=10).
Are they supposed to return the same result on the same tensor when they have the same n?
Yes, that is correct. The ngrams are like an interpolation between binding-based sequences (distinct sequence) and a multiset. If you set n to the size of the input sequence, there is no difference from the distinct sequence because there is only one ngram of that size. And when you set n to 1, it is equivalent to a multiset.
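If that is right, a quick check in the style of the snippet above would be the following (assuming functional.multiset is the multiset operation; both asserts should pass only if the equivalences are exact):
>>> import torch
>>> from torchhd import functional
>>> x = functional.random_hv(10, 1000)
>>> assert torch.all(functional.ngrams(x, n=1) == functional.multiset(x))
>>> assert torch.all(functional.ngrams(x, n=10) == functional.distinct_sequence(x))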
I tried more than ten approaches: unbind, for-loop, select, index_select, strided, in-place, out-of-place, clone, empty, and gather.
The combination of in-place mul and a for-loop is significantly faster than the others.
The same approach as in #33 can be applied to distinct_sequence() and sequence() after verification.
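For reference, a rough sketch of the in-place-mul plus for-loop idea for ngrams (illustrative only; the vetted implementation is in #33). It assumes bipolar (MAP) hypervectors, so binding is elementwise multiplication and bundling is elementwise sum:

import torch

def ngrams_inplace(input, n=3):
    # input: (..., m, d) bipolar hypervectors.
    m = input.size(-2)
    # Start each window from its last (unpermuted) element, cloned so
    # the in-place mul_ below does not modify the input tensor.
    out = input[..., n - 1 : m, :].clone()
    for i in range(1, n):
        # Bind in the element i steps back in each window,
        # permuted by i positions (a cyclic roll).
        out.mul_(input[..., n - 1 - i : m - i, :].roll(i, dims=-1))
    # Bundle the windows by summing over the window dimension.
    return out.sum(dim=-2)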
Update on the PyTorch C++ extension: it requires the user to install a C++ compiler, which is easy for Linux users (just one command) but hard for Windows users (they need to download a large installer package manually and configure it manually). So we stay away from C++ to keep the library easy to use for Windows users.
Thank you for the update. I agree, I don't think we have the expertise yet to pull off such a project.
I experimented with different ways of generating random hypervectors. Here's what I found:
import torch

def random1(n, d):
    # Index into a {+1, -1} lookup table with random 0/1 indices.
    selection = torch.randint(0, 2, size=(n * d,))
    options = torch.tensor([1, -1])
    hv = torch.index_select(options, 0, selection)
    return hv.view(n, d)

def random2(n, d):
    # Map random {0, 1} integers to {-1, +1}.
    return torch.randint(0, 2, size=(n, d)) * 2 - 1

def random3(n, d):
    # Sample {0., 1.} from Bernoulli(0.5), then rescale to {-1., +1.}.
    p = torch.full((n, d), 0.5)
    x = torch.bernoulli(p)
    return x.mul(2.0).sub(1.0)

def random4(n, d):
    # Take the sign of uniform noise on [-1, 1].
    return torch.empty((n, d)).uniform_(-1, 1).sign_()

def random5(n, d):
    # Take the sign of standard normal noise.
    return torch.randn(n, d).sign_()

def random6(n, d):
    # Take the sign of uniform noise shifted to [-0.5, 0.5].
    return torch.rand((n, d)).sub_(0.5).sign_()

def random7(n, d):
    # Sample booleans in place, then map True/False to +1/-1.
    res = torch.empty((n, d), dtype=torch.bool).bernoulli_(0.5)
    return torch.where(res, +1, -1)
I got the following results on my 2017 MacBook Pro with a 2.5 GHz Dual-Core Intel Core i7:
$ python -m timeit -s "import torch, torchhd" "torchhd.functional.random1(10, 10000)"
500 loops, best of 5: 550 usec per loop
$ python -m timeit -s "import torch, torchhd" "torchhd.functional.random2(10, 10000)"
500 loops, best of 5: 737 usec per loop
$ python -m timeit -s "import torch, torchhd" "torchhd.functional.random3(10, 10000)"
500 loops, best of 5: 725 usec per loop
$ python -m timeit -s "import torch, torchhd" "torchhd.functional.random4(10, 10000)"
500 loops, best of 5: 454 usec per loop
$ python -m timeit -s "import torch, torchhd" "torchhd.functional.random5(10, 10000)"
500 loops, best of 5: 560 usec per loop
$ python -m timeit -s "import torch, torchhd" "torchhd.functional.random6(10, 10000)"
500 loops, best of 5: 501 usec per loop
$ python -m timeit -s "import torch, torchhd" "torchhd.functional.random7(10, 10000)"
2000 loops, best of 5: 115 usec per loop
I will make a pull request for the 7th version because it allows passing in a generator, which might be helpful to people. We can easily add a sparsity parameter by altering the 0.5 value.
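A sketch of what that could look like, with the generator passed through and a hypothetical sparsity parameter (the name and convention are illustrative, not the final API):

import torch

def random_hv(n, d, *, sparsity=0.5, generator=None, device=None):
    # Sample booleans in place; `sparsity` is a hypothetical parameter
    # setting the expected fraction of -1 entries, and `generator`
    # makes the sampling reproducible.
    select = torch.empty((n, d), dtype=torch.bool, device=device)
    select.bernoulli_(1 - sparsity, generator=generator)
    return torch.where(select, +1, -1)

Passing the same seeded generator twice then reproduces the same hypervectors, e.g. torch.Generator().manual_seed(42).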
Created pull request #47.
For example, the functional.ngrams function can be implemented in a number of ways; we should look into which way is the fastest, both on CPU and GPU. The same goes for other functions, classes, and the loading of datasets.