klokare commented 2 years ago

Is there documentation or a paper available that explains the parameters involved in creating a HyperGridTransform object?

class HyperGridTransform:

    def __init__(self, num_bins=4, num_acts=1, num_grids=100, num_subspace_dims=1, origin=None,
                 max_period=2.0, min_period=0.05, num_input_dims=None,
                 use_orthogonal_bases=False, use_normal_dist_bases=False, use_standard_bases=False,
                 set_bases=None, set_periods=None, use_random_uniform_periods=False,
                 use_evenly_spaced_periods=False,
                 flatten_output=False, random_state=None):

Some make sense from the name or from the code but other are not immediately clear to me. I do not want to make the wrong assumptions. The examples that use the HGT seem to take the defaults for all but the num_bins and num_grids.

In addition to definitions for the parameters I am curious about best practices or usage scenarios (e.g., "why and when would I want num_subspace_dims to be greater than 1?). Is there a rubric for setting the number of bins and/or grids based on the dimensionality of the data?

Thanks!

jacobeverist commented 2 years ago

You are right that we need more documentation and explanation. I've been working on a big doc on general encoding strategies of which the HGT is the final chapter.

Have you looked at this example? It includes more variation and also visualization.

klokare commented 2 years ago

I missed that one. I figured I was not ready to tackle the visualisation part yet. Maybe it's where I should have started. I'll take a look. Thanks.

klokare commented 1 year ago

I ended up playing with HyperGridTransform in a simple Jupyter notebook using the code from examples/python/transformers/hypergrid_transformer.py. I made up data sets, changed parameters, and saw the effects in the printed grids. In all that experimentation, though, I neglected to collect examples that might be good for documentation. Maybe that is a good thing as it was the iterative nature of the exploring that helps develop the feel for the tool.

With the advent of ChatGPT, I thought I might try having it describe the HyperGridTransform for me and ask it some questions about its use. I am positing the results of that chat here for your feedback in case it got anything wrong, you think it could be elaborated on, etc. or in case it might be helpful for people who are new to the code.

HyperGridTransform

This code defines a class called HyperGridTransform that takes high-dimensional continuous data and transforms it into a binary distributed representation using a method called the hypergrid transform. This can be used for feature hashing or dimensionality reduction. The class has several options for generating the hypergrid transform, such as using orthogonal bases, normal distribution bases, or standard bases.

Initialization Parameters

Here's a description of each parameter in the init procedure for the HyperGridTransform class:

num_input_dims (int): The number of input dimensions for the data. This corresponds to the number of continuous variables in your dataset.
origin (list or array-like): The origin of the hypergrid. This parameter defines the starting point of the grid in the input space. It should be an array or list with the same number of elements as num_input_dims.
num_bins (int): The number of bins in each dimension of the grid. Bins are intervals that discretize the input space, allowing the grid to represent data points in a discrete manner. A higher number of bins leads to a finer discretization and potentially better representation of the data, at the cost of increased computational complexity.
num_acts (int): The number of activation functions used to compute the output of each subspace. These activation functions determine the transformations applied to the input data in each subspace.
num_grids (int): The number of grids (or subspaces) used in the transformation. This parameter controls the complexity of the transformed data. A higher number of grids allows the model to represent more complex relationships between input variables, but may also increase the risk of overfitting and require more computational resources.
num_subspace_dims (int): The number of dimensions in each subspace. This parameter defines the structure of the grid and determines how the input space is divided into subspaces. A higher value may result in more complex data representations but could also increase the risk of overfitting and computational complexity.
min_period (float): The minimum period for the subspace period vector. This parameter influences the size of the bins in each subspace and can affect the smoothness of the transformed data.
max_period (float): The maximum period for the subspace period vector. This parameter also influences the size of the bins in each subspace and can affect the smoothness of the transformed data. The range between min_period and max_period should be chosen to balance smoothness and resolution in the transformed data.

Grids

A "grid" in the context of HyperGridTransform refers to a multi-dimensional lattice that is used to discretize the input space of continuous variables. The grid is created by overlaying a regular pattern of points in the input space, and the points are used to represent the different regions of the space. This discretization helps in handling continuous data more efficiently by converting it into a discrete form that can be more easily managed by machine learning models or optimization algorithms.

num_grids is a parameter in the HyperGridTransform class's init method that determines the number of grids to be used in the transformation process. Each grid is shifted or offset relative to the others, allowing for a more fine-grained representation of the input space. Using multiple grids helps to ensure that the discretized representation captures the structure of the continuous input space more accurately and reduces the risk of missing important features or patterns in the data.

By adjusting the num_grids parameter, you can control the trade-off between the granularity of the discretization and the computational complexity of the transformation process. A larger value for num_grids will result in a more detailed representation of the input space but at the cost of increased computational complexity. In contrast, a smaller value for num_grids will be computationally more efficient but may not capture the input space's structure as accurately. You may need to experiment with different values of num_grids to find the optimal balance between accuracy and efficiency for your specific problem.

Bins

A "bin" refers to a discrete interval or category in which continuous data is divided or sorted. In the context of data transformation or analysis, binning is the process of converting continuous data into discrete categories or bins. Binning is often used for simplifying data, reducing noise, and making it easier to analyze or visualize.

num_bins is a parameter that controls the number of bins or discrete intervals into which the continuous data is divided. It plays an important role in determining the granularity or resolution of the transformed data. When you increase the number of bins, you create finer divisions in the data, which can capture more detail. Conversely, when you decrease the number of bins, you create coarser divisions, which may simplify the data but might also lose some detail.

In the context of the HyperGridTransform, num_bins specifies the number of intervals or categories for each axis of the hypergrid. The continuous input data is then discretized into binary representations by mapping each data point to the corresponding bin in the hypergrid. The choice of num_bins will affect the granularity of the transformed data and the complexity of the resulting binary representation. The optimal value of num_bins depends on the specific problem, the nature of the input data, and the desired balance between capturing detail and computational efficiency.

Subspace dimensions

In the context of this code, a subspace refers to a lower-dimensional space within the original high-dimensional input space. The purpose of the hypergrid transform is to project the input data from the high-dimensional space onto a lower-dimensional subspace while preserving important information in a binary representation.

For example, let's say you have data in a 10-dimensional space, and you want to reduce the dimensionality to 2 dimensions. The 2-dimensional space within the original 10-dimensional space would be considered a subspace. The HyperGridTransform class projects the input data onto this lower-dimensional subspace using a set of basis vectors, and then it discretizes the data into a binary representation using hypergrids. The goal is to capture the essential structure of the data in a lower-dimensional space while making it easier to work with and process.

The num_subspace_dims attribute specifies the number of dimensions in the target subspace, which determines the dimensionality of the lower-dimensional space that the input data will be projected onto.

Min and Max Period

max_period and min_period are parameters that define the range of periods for the subspace basis vectors in the HyperGridTransform class. In this context, the period represents the length of the repeating pattern along each subspace basis vector. These periods are used to create hypergrids in the lower-dimensional subspaces, which in turn are used to discretize the continuous input data into a distributed binary representation.

The periods determine the size of the bins in the hypergrid, which affects the resolution of the discretized representation. The larger the period, the coarser the resolution, and the smaller the period, the finer the resolution.

Here's how max_period and min_period are used in the code:

If set_periods is not provided, the user can choose to generate random uniform periods or evenly spaced periods. If use_random_uniform_periods is set to True, the periods are generated uniformly at random from the range [min_period, max_period) for each subspace basis vector. If neither use_random_uniform_periods nor set_periods is provided, the periods are generated as an evenly spaced sequence of values within the range [min_period, max_period), with the number of periods being num_grids * num_subspace_dims.

The choice of max_period and min_period can affect the performance of the hypergrid transform. They should be chosen based on the specific application and the nature of the input data. A good choice of periods can help preserve the important information in the data while providing an efficient binary representation.

Custom periods

The set_periods parameter is a list of periods corresponding to each basis vector, which will be used to scale the input variables during the transformation. By default, the periods are determined by the max_period and min_period parameters. However, if you want to use custom periods for each basis vector, you can provide them using the set_periods parameter. This can be useful if you have domain-specific knowledge about the optimal periods for your dataset or if you want to experiment with different periods to see how they affect your model's performance.

Choice of basis

These parameters, use_orthogonal_bases, use_normal_dist_bases, and use_standard_bases, define different strategies for generating the basis vectors in each subspace of the HyperGridTransform. The basis vectors determine the orientation and shape of the subspaces, which in turn affect the transformation of the input data. Here's a brief explanation of each strategy:

use_orthogonal_bases: This strategy generates orthogonal basis vectors for each subspace. Orthogonal basis vectors are mutually perpendicular, ensuring that the subspaces have no overlapping dimensions. This can lead to better separation of the input variables and potentially improve the quality of the transformed data. However, generating orthogonal bases can be computationally expensive, especially in high-dimensional spaces.
use_normal_dist_bases: This strategy generates basis vectors by sampling from a normal distribution. This can result in more random and diverse subspaces, which may help capture complex relationships between input variables. However, the subspaces generated with this method may not be orthogonal and could have overlapping dimensions, which could lead to redundancies in the transformed data.
use_standard_bases: This strategy uses standard basis vectors, also known as unit vectors, for each subspace. Standard basis vectors have a single component equal to 1 and all other components equal to 0. This results in axis-aligned subspaces, which can simplify the transformation and make it easier to interpret. However, using standard bases may limit the ability to capture complex relationships between input variables, as the subspaces are restricted to the axes of the input space.

Each of these strategies has trade-offs in terms of computational complexity, representation quality, and interpretability. The choice of strategy would depend on the specific problem, the nature of the input data, and the desired properties of the transformed data. Please note that these parameters are hypothetical and may not be part of an actual implementation.

Using orthogonal basis

A scenario where using orthogonal bases might be preferable to normal distribution or standard bases could be when analyzing high-dimensional data with complex, non-linear relationships between variables, and where the goal is to reduce redundancy and improve interpretability.

For example, suppose you are working on a machine learning problem where you have a high-dimensional dataset with several variables that have intricate relationships, such as gene expression data. In this case, you want to transform the data into a lower-dimensional space to make it more manageable for analysis, visualization, and model training, while still preserving the essential structure and relationships within the data.

Using orthogonal bases in the HyperGridTransform can provide the following benefits in this scenario:

Reduced redundancy: Since orthogonal bases are mutually perpendicular, each subspace captures a unique aspect of the data without overlapping dimensions. This can lead to a more efficient representation, reducing redundancies in the transformed data.
Better separation of variables: Orthogonal bases can help separate variables that are correlated or have complex relationships. This can improve the quality of the transformed data and potentially lead to better performance in downstream tasks, such as classification or regression.
Improved interpretability: The orthogonality property of the basis vectors makes it easier to interpret the transformed data, as each subspace captures a distinct direction in the original data space. This can be particularly useful when visualizing the data or trying to understand the relationships between variables.

While orthogonal bases can provide these benefits, they can also be computationally more expensive to generate, especially in high-dimensional spaces. However, in a scenario like the one described above, where capturing complex relationships and reducing redundancy are critical, the benefits of using orthogonal bases may outweigh the computational cost.

Using normal distribution basis

A scenario where using a normal distribution basis might be preferable to orthogonal or standard bases is when you are working with a dataset where the underlying structure of the data follows a Gaussian distribution, and you want to model the data in a way that leverages this property.

For example, let's say you are working with a dataset of sensor measurements, where each measurement is affected by random noise. In many cases, this noise can be assumed to follow a Gaussian distribution. In this situation, transforming the data using a normal distribution basis can provide a better representation and lead to improved results in downstream tasks.

Using a normal distribution basis in the HyperGridTransform can provide the following benefits in this scenario:

Exploit natural data distribution: If the data follows a Gaussian distribution, using a normal distribution basis aligns with the natural structure of the data. This can result in a more meaningful and compact representation, capturing the essential features of the data.
Robustness to noise: A normal distribution basis can help mitigate the impact of noise in the data. By modeling the data using a Gaussian distribution, it becomes easier to separate the true signal from the noise, leading to more accurate and robust results in downstream tasks such as regression or classification.
Improved generalization: When the data follows a Gaussian distribution, using a normal distribution basis can lead to better generalization in machine learning models. This is because the transformation will capture the essential structure of the data, allowing the model to focus on the most relevant features.

In scenarios like the one described above, where the data follows a Gaussian distribution and robustness to noise is important, using a normal distribution basis can provide benefits that outweigh the potential benefits of using orthogonal or standard bases.

Using standard basis

A scenario where using a standard basis might be preferable to orthogonal or normal distribution bases is when you are working with a dataset that has a linear relationship between the input variables, and the primary goal is to have a simple, easy-to-understand representation of the data.

For example, let's say you are working with a dataset of house prices, where the input variables are square footage and number of bedrooms. In this case, the relationship between the input variables and the house prices might be relatively linear, and a standard basis would be sufficient to capture the relevant features of the data.

Using a standard basis in the HyperGridTransform can provide the following benefits in this scenario:

Simplicity: A standard basis provides a simple, straightforward representation of the data that is easy to understand and interpret. This can be beneficial when the primary goal is to provide an interpretable model or when working with stakeholders who prefer more straightforward methods.
Computational efficiency: Transforming data using a standard basis typically involves fewer computations compared to orthogonal or normal distribution bases. This can lead to faster training times and lower computational requirements, which can be an advantage in situations where computational resources are limited.
Sufficient for linear relationships: When the relationship between the input variables and the target variable is relatively linear, a standard basis can be sufficient to capture the relevant features of the data, and there may be no need to use a more complex transformation like orthogonal or normal distribution bases.

In scenarios like the one described above, where the data has a linear relationship between the input variables and the target variable, and simplicity and computational efficiency are important considerations, using a standard basis can provide benefits that outweigh the potential benefits of using orthogonal or normal distribution bases.

Custom basis

The set_bases parameter is a list of basis vectors that will be used to construct the transformation matrix. Each basis vector should have the same dimension as the number of input variables in your dataset. By default, the basis vectors are determined by the use_orthogonal_bases, use_normal_dist_bases, and use_standard_bases parameters. However, if you want to use custom basis vectors, you can provide them using the set_bases parameter. This can be useful if you have domain-specific knowledge about the optimal basis vectors for your dataset or if you want to experiment with different bases to see how they affect your model's performance.

klokare commented 1 year ago

Methods

`fit(self, X)`

The fit method in the HyperGridTransform class is responsible for learning or estimating the appropriate parameters of the transformation based on the provided input data. The goal of the fit method is to find the best representation of the continuous input space using the discretized grids, taking into account the specific characteristics of the data.

Here's a general outline of the fit method:

Receive input data, which is typically a dataset containing continuous variables.
Analyze the data to estimate the necessary parameters for the transformation. This may involve calculating the range, mean, variance, or other statistics of the input data.
Initialize the grid(s) and basis functions based on the estimated parameters and the user-provided settings for the HyperGridTransform class, such as max_period, min_period, num_bins, and others.
If applicable, learn any other relevant parameters or relationships from the data to improve the transformation process.

Once the fit method has been called on a HyperGridTransform instance, the learned parameters and initialized grids are stored within the instance, allowing you to apply the transformation to new data using the transform method.

`transform(self, X)`

The transform method in the HyperGridTransform class is responsible for applying the learned transformation to new input data. After the fit method has been called and the transformation parameters have been estimated, the transform method can be used to convert continuous input data into a discrete representation using the hypergrid structure.

Here's a general outline of the transform method:

Receive input data, which is typically a dataset containing continuous variables.
Based on the stored transformation parameters and grid structures learned during the fit step, apply the transformation to the input data. This usually involves mapping the continuous input values onto the discretized grid using the basis functions.
As a result, the transformed data is produced, which is now represented in a discrete format suitable for further processing, such as input to a machine learning algorithm or other data analysis techniques.

The transform method enables you to process new data consistently with the same transformation that was learned from the initial dataset. This ensures that the discretization and representation of the continuous space are coherent across different samples or datasets, allowing for more reliable comparisons and analysis.

`fit_transform(self, X)`

The fit_transform method in the HyperGridTransform class is a convenient method that combines both the fit and transform operations into a single step. This method is particularly useful when you want to fit the transformation parameters and immediately apply the transformation to the input data.

The fit_transform method streamlines the process of fitting and transforming the data in one step. However, it's important to note that once the transformation parameters have been learned using the fit method, you can use the transform method to apply the same transformation to other datasets consistently without re-fitting the parameters.

jacobeverist commented 1 year ago

@klokare This is very interesting. It gets a lot of the "what" correct, but not the "why". It seems to crib a lot of the why from other ML approaches when talking about complexity and over-fitting, which is not really a concern here.

Can you explain the process for how you generated this?

Here's a visual that shows what all of the parameters map to for a particular configuration of the HGT. I may have packed in a lot of info into a single visual so it may be a bit confusing. Let me know if you can figure it out.

2d_hgt_param_visuals