hyperdimensional-computing / torchhd

Torchhd is a Python library for Hyperdimensional Computing and Vector Symbolic Architectures
https://torchhd.readthedocs.io
MIT License

Fractional power encoding #142

Closed. denkle closed this pull request 1 year ago.

denkle commented 1 year ago

@mikeheddes this is very far from being a complete realization, but it is working. It is more of a demonstration of intent to work on it, as well as a call to discuss some of the design choices that will have to be made, since FPEs are somewhat different from the encodings we have so far.

denkle commented 1 year ago

A simple script that I used to visualise the similarity kernel of the resulting FPEs:

import torch
import torchhd
import torchhd.functional as functional
import matplotlib.pyplot as plt

torchhd.fractional_power_encoding(torch.arange(1, 4, 1.), 6, "sinc", 1.0, "FHRR")

dimensions = 2000
values = torch.arange(start=0.1, end=10., step=0.05)

# Compute FPEs for different values of bandwidth
fpes, _ = torchhd.fractional_power_encoding(values, dimensions, kernel_shape="sinc", bandwidth=1.0, vsa="FHRR")
dp_b_1 = fpes.dot_similarity(fpes) / dimensions

fpes, _ = torchhd.fractional_power_encoding(values, dimensions, kernel_shape="sinc", bandwidth=0.5, vsa="FHRR")
dp_b_05 = fpes.dot_similarity(fpes) / dimensions

fpes, _ = torchhd.fractional_power_encoding(values, dimensions, kernel_shape="sinc", bandwidth=2.0, vsa="FHRR")
dp_b_2 = fpes.dot_similarity(fpes) / dimensions

fpes, _ = torchhd.fractional_power_encoding(values, dimensions, kernel_shape="sinc", bandwidth=4.0, vsa="FHRR")
dp_b_4 = fpes.dot_similarity(fpes) / dimensions

# Index of the value of interest
ind = ((values == 5.0).nonzero(as_tuple=True)[0])

# Visualize the above similarity curves for the value at ind
ax = plt.subplot()
ax.plot(values, torch.sinc(values - 5), color='k', linewidth=4, linestyle='solid', label='True sinc')
ax.plot(values, dp_b_05[:, ind], color='b', linewidth=2, linestyle='dotted', label='Bandwidth=0.5')
ax.plot(values, dp_b_1[:, ind], color='r', linewidth=2, linestyle='dashed', label='Bandwidth=1.0')
ax.plot(values, dp_b_2[:, ind], color='g', linewidth=2, linestyle='dashdot', label='Bandwidth=2.0')
ax.plot(values, dp_b_4[:, ind], color='m', linewidth=2, linestyle='solid', label='Bandwidth=4.0')
ax.set_xlabel('Values')
ax.set_ylabel('Similarity')
ax.axes.set_xbound(values.min(), values.max())
ax.axes.set_ybound(-0.3, 1.01)
ax.legend()
plt.show()

mikeheddes commented 1 year ago

Hi Denis, thanks for opening this PR! Great idea to add this functionality.

Here are some things to think about in the implementation:

Here is a usage example that takes some of my points into account:

uniform_dist = torchhd.get_distribution_by_shape("sinc")
kernel = torchhd.sample_kernel(2, 10000, uniform_dist)
x = kernel.fractional_power([1, 4])

This design is not perfect in any way, but it's something to think about.

I am a little busy the next three weeks but I will try to find time throughout to give feedback on the design changes.

Great work!

denkle commented 1 year ago

@mikeheddes, this revised code addresses some of the issues above.

For the pre-defined kernels, torch.distributions is used, but it is not entirely clear at the moment how to use it to define one's own custom distributions.

mikeheddes commented 1 year ago

Great work @denkle, I really like the improved design!

It starts to look like it fits more naturally among the Embeddings modules, since it takes in a coordinate (vector) from the input domain and outputs a hypervector, which is what all the other Embeddings do. I have something like this in mind:

class FractionalPower(nn.Module):
    def __init__(self, in_features, out_features, distribution, vsa, requires_grad):
        super().__init__()
        self.distribution = distribution  # if the distribution is a string use the presets
        # the weights store the sampled phases
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad)
        self.reset_parameters()

    def reset_parameters(self):
        # sample the weights using the provided distribution
        ...

    def basis(self):
        return torch.complex(self.weight.cos(), self.weight.sin()).as_subclass(FHRRTensor)

    def forward(self, input):
        phase = self.weight @ input
        return torch.complex(phase.cos(), phase.sin()).as_subclass(FHRRTensor)

I don't think adding the bandwidth parameter is needed; people can easily change that in their own code, which also allows for having different bandwidth parameters for each input dimension. I think it would just add unnecessary complexity to the module; best to keep it simple.
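For example, a user could fold the bandwidth into the input themselves, along these lines (a minimal sketch that assumes the FractionalPower design above; the scaling direction depends on the chosen convention):

import torch

bandwidth = torch.tensor([0.5, 2.0])  # one bandwidth per input dimension
x = torch.tensor([3.0, 7.0])          # a 2-dimensional input coordinate

# embed = FractionalPower(in_features=2, out_features=10000, distribution=dist)
# hypervector = embed(x / bandwidth)  # bandwidth applied by rescaling the input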

For specifying a custom kernel shape I was thinking about something like this:

import math
import torch

d = 10000

class MyDist(torch.distributions.Categorical):
    def sample(self, sample_shape=torch.Size()):
        return super().sample(sample_shape) / self.probs.size(-1) * math.pi * 2 - math.pi

in_features = 8
p = torch.ones(in_features, 12)
dist = MyDist(p)
dist.sample((d,)).shape # (10000, 8) which are the weights

embed = FractionalPower(in_features, d, dist)

I am not sure if this is too much work for the user but it at least allows users to define any kind of kernel. We can provide some more common ones as part of the library like this discrete uniform example. Let me know if you think using torch.distributions makes sense or if it's too much overhead.
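As another example of a preset we could ship, here is a minimal sketch (using the hypothetical FractionalPower module from above) where Normal-distributed phases should give an approximately Gaussian-shaped similarity kernel:

import torch

in_features = 8
d = 10000

dist = torch.distributions.Normal(torch.zeros(in_features), torch.ones(in_features))
dist.sample((d,)).shape  # (10000, 8), the same weight shape as in the MyDist example

# embed = FractionalPower(in_features, d, dist)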

Great work once again!

denkle commented 1 year ago

@mikeheddes, thanks for the feedback. These are great points and I agree with most of them!

denkle commented 1 year ago

I don't think adding the bandwidth parameter is needed; people can easily change that in their own code, which also allows for having different bandwidth parameters for each input dimension.

I am ambivalent here. On the one hand, I agree that specifying separate bandwidth values for different dimensions is potentially a useful option (though I have not seen it being needed yet). On the other hand, I believe that removing the bandwidth from the parameters might not be the best solution. This is because, if anything, the bandwidth is the most important parameter in FPE (at least far more important than the dimensionality of the hypervectors), as it controls the similarity-preserving characteristics of the encoding (the attached figure shows an example). So it is important to adjust it to the data at hand. But if we sweep it under the carpet, users might think that this parameter is not really useful, which would result in not adjusting the kernel shape to the particular problem.

[Attached figure: sinc_bandwidth]
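For reference, here is a short sketch of why the bandwidth directly reshapes the kernel. It assumes the common FPE convention in which the phases of the encoding of a scalar $x$ are scaled as $\theta_j x / \beta$ for bandwidth $\beta$; the PR's exact convention may differ.

$$
z(x)_j = e^{i \theta_j x / \beta},
\qquad
\frac{1}{D}\, z(x) \cdot \overline{z(y)}
= \frac{1}{D} \sum_{j=1}^{D} e^{i \theta_j (x - y) / \beta}
\approx \mathbb{E}_{\theta}\!\left[ e^{i \theta (x - y) / \beta} \right]
= K\!\left( \frac{x - y}{\beta} \right).
$$

So the kernel shape $K$ is set by the phase distribution (uniform phases on $[-\pi, \pi]$ give the sinc kernel in the figure), while $\beta$ only stretches or compresses it along the value axis.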

denkle commented 1 year ago

For specifying a custom kernel shape I was thinking about something like this:

import math
import torch

d = 10000

class MyDist(torch.distributions.Categorical):
    def sample(self, sample_shape=torch.Size()):
        return super().sample(sample_shape) / self.probs.size(-1) * math.pi * 2 - math.pi

in_features = 8
p = torch.ones(in_features, 12)
dist = MyDist(p)
dist.sample((d,)).shape # (10000, 8) which are the weights

embed = FractionalPower(in_features, d, dist)

I am not sure if this is too much work for the user but it at least allows users to define any kind of kernel. We can provide some more common ones as part of the library like this discrete uniform example. Let me know if you think using torch.distributions makes sense or if it's too much overhead.

I like how it is done for the discrete sinc kernel in your example and I think we could do it, but to make it usable we would have to come up with a good tutorial where a number of examples are demonstrated and visualised.

denkle commented 1 year ago

Another thing I keep using in the code is raising the error "Fractional Power Encoding for this HD/VSA model is not implemented or defined". I think that to make a clear point about why it is done this way, I would at least need to implement FPE for the HRR model. This is something I have on my to-do list, but I will not be able to prioritize this task in the near future.

mikeheddes commented 1 year ago

I am ambivalent here. On the one hand, I agree that specifying separate bandwidth values for different dimensions is potentially a useful option (though I have not seen it being needed yet). On the other hand, I believe that removing the bandwidth from the parameters might not be the best solution. This is because, if anything, the bandwidth is the most important parameter in FPE (at least far more important than the dimensionality of the hypervectors), as it controls the similarity-preserving characteristics of the encoding (the attached figure shows an example). So it is important to adjust it to the data at hand. But if we sweep it under the carpet, users might think that this parameter is not really useful, which would result in not adjusting the kernel shape to the particular problem.

That is a good point, I agree that adding it makes sense. If advanced users need to specify per-dimension bandwidths, they can still do so and leave the parameter at 1, so it does not hurt advanced users in that sense.

I like how it is done for the discrete sinc kernel in your example and I think we could do it, but to make it usable we would have to come up with a good tutorial where a number of examples are demonstrated and visualised.

That would be great indeed! With both the tutorial and providing the most common distributions out-of-the-box the user experience should be pretty good.

Another thing I keep using in the code is raising the error "Fractional Power Encoding for this HD/VSA model is not implemented or defined". I think that to make a clear point about why it is done this way, I would at least need to implement FPE for the HRR model. This is something I have on my to-do list, but I will not be able to prioritize this task in the near future.

This is fine, we can implement functionality incrementally. If someone really needs it they can open an issue so we know what to prioritize.

denkle commented 1 year ago

Another thing I keep using in the code is raising the error "Fractional Power Encoding for this HD/VSA model is not implemented or defined". I think that to make a clear point about why it is done this way, I would at least need to implement FPE for the HRR model. This is something I have on my to-do list, but I will not be able to prioritize this task in the near future.

This is fine, we can implement functionality incrementally. If someone really needs it they can open an issue so we know what to prioritize.

I added support for the HRR model in the new commit.

denkle commented 1 year ago

I like how it is done for the discrete sinc kernel in your example and I think we could do it, but to make it usable we would have to come up with a good tutorial where a number of examples are demonstrated and visualised.

That would be great indeed! With both the tutorial and providing the most common distributions out-of-the-box the user experience should be pretty good.

I created a script showcasing both the HRR and FHRR models as well as various kernels. We could extend it later with more examples.

mikeheddes commented 1 year ago

Hi @denkle, very good progress again! I am working on refactoring some of the code and I was wondering: why does the HRR model need fewer angles? You have the line dimensions_real = int((self.out_features - 1) / 2) and then later you "Make the generated angles negatively symmetric so they look as a spectrum". Why is that?
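My guess, sketched below assuming the usual conjugate-symmetry argument (this is my reading, not necessarily what the PR code does), is that a real-valued HRR hypervector of dimension D has only about (D - 1) / 2 free phases, because its spectrum must be conjugate symmetric for the inverse FFT to come out real:

import torch

out_features = 9
dimensions_real = int((out_features - 1) / 2)  # number of free angles, as in the PR
angles = torch.empty(dimensions_real).uniform_(-torch.pi, torch.pi)

# DC component, the free angles, then their negated mirror image,
# so that the spectrum is conjugate symmetric
spectrum_phase = torch.cat([torch.zeros(1), angles, -angles.flip(0)])
spectrum = torch.complex(spectrum_phase.cos(), spectrum_phase.sin())

hv = torch.fft.ifft(spectrum)
print(hv.imag.abs().max())  # ~0: the resulting hypervector is (numerically) real-valued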

Your example script is very helpful. I think it will be even more helpful as a Jupyter Notebook so that people can look at the figures on GitHub. I am working on this now also.

I think that after the refactoring the code is ready to be merged!

mikeheddes commented 1 year ago

@denkle I just pushed my changes, let me know what you think.

denkle commented 1 year ago

@denkle I just pushed my changes, let me know what you think.

@mikeheddes, looks great. I only changed one line in the Jupyter notebook; otherwise, not all histograms were displayed properly.

One thing that confused me for a while was that I would get an error when trying to get FPEs for values = torch.arange(start=0.1, end=10., step=0.05), which has shape [198]. It turned out that the refactored code implicitly expects the shape to be [198, 1], which was causing the issue. If you are not in favour of having the code handle this situation, should it be mentioned explicitly somewhere in the description of the methods?

mikeheddes commented 1 year ago

I made it explicit that you are passing in n vectors of 1 dimension each, thus requiring a shape of (n, 1). This is equivalent to the behavior of torch.nn.Linear. I updated the documentation to make this clear.
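For example (a minimal sketch of the shape convention described above, with the embedding module name taken from the earlier design sketch):

import torch

values = torch.arange(start=0.1, end=10., step=0.05)  # shape (198,)
inputs = values.unsqueeze(-1)                          # shape (198, 1): 198 vectors of 1 feature each

# embed = FractionalPower(in_features=1, out_features=10000, distribution=dist)
# hypervectors = embed(inputs)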