sbi-dev / sbi

Simulation-based inference toolkit
https://sbi-dev.github.io/sbi/
Apache License 2.0

Preconfigured embedding_nets #279

michaeldeistler closed this issue 8 months ago

michaeldeistler commented 4 years ago

1. We should provide a tutorial on how to use embedding nets.
2. We should have pre-configured RNNs and CNNs.

See also #162

plcrodrigues commented 4 years ago

Hi, I've been using the late-lfi package for a while now and have been converting my scripts to use sbi over the last few days. One of the main features that I use in my research is the idea of letting a neural net learn a suitable set of summary statistics for the inference procedure. With that said, I would like to suggest a simple example of how to do this in sbi.

My goal with this example was to keep things as simple as possible. This means that the simulator model might be a bit dumb and not super interesting, but I hope it makes clear how to use a summary net.

The model has two parameters: r and theta. It generates 100 two-dimensional points centered around (r*cos(theta), r*sin(theta)) and perturbed by Gaussian noise with variance 0.01. The left part of the figure below shows an example of these simulated points with r=0.70 and theta=pi/4. Now, instead of simply outputting the (x, y) coordinates of each data point, the model generates a grayscale image of the scattered points with dimensions 32 by 32. This image is further perturbed by uniform noise with values between 0 and 0.2. The right part of the figure below shows an output of the model when r=0.70 and theta=pi/4.

[Figure: left, the simulated 2D points; right, the noisy 32 x 32 grayscale image produced by the model.]

With this model, the inference procedure (SNPE, SNLE, SNRE, or any other method) has to determine the posterior distribution of r and theta given an observation x which lives in a 1024-dimensional space (32 x 32 = 1024). To avoid working directly on these high-dimensional vectors, we use a convolutional neural network (CNN) that takes the 32x32 images and transforms them into 8-dimensional vectors. This CNN is trained along with the neural density estimator during the inference procedure.

Now, the script:

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sbi import utils as sbi_utils
from sbi import inference as sbi_inference

# define the simulator function
def simulator(parameter, return_points=False):
    """Simulate a noisy 32x32 image of points scattered around (r*cos(theta), r*sin(theta))."""
    r = parameter[0]
    theta = parameter[1]

    # sample 100 points around the mean position, perturbed by Gaussian noise
    sigma_points = 0.10
    npoints = 100
    points = []
    for _ in range(npoints):
        x = r * np.cos(theta) + sigma_points * np.random.randn()
        y = r * np.sin(theta) + sigma_points * np.random.randn()
        points.append([x, y])
    points = np.array(points)

    # rasterize the points into a 32x32 binary image over [-1, +1]^2
    nx = 32
    ny = 32
    sigma_image = 0.20
    I = np.zeros((nx, ny))
    for point in points:
        pi = int((point[0] - (-1)) / ((+1) - (-1)) * nx)
        pj = int((point[1] - (-1)) / ((+1) - (-1)) * ny)
        if (0 <= pi < nx) and (0 <= pj < ny):  # guard against out-of-range (incl. negative) indices
            I[pi, pj] = 1
    # add uniform background noise and flatten into a 1 x 1024 float32 tensor
    I = I + sigma_image * np.random.rand(nx, ny)
    I = I.T
    I = I.reshape(1, -1)
    I = torch.tensor(I, dtype=torch.float32)

    if return_points:
        return I, points
    else:
        return I

# set seed for numpy and torch
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)

# set prior distribution for the parameters 
parameter_dim = 2
prior = sbi_utils.BoxUniform(low=torch.tensor([0.0, 0.0]), 
                             high=torch.tensor([1.0, 2*np.pi]))                           

# wrap the simulator and prior for compatibility with sbi
simulator_wrapper, prior = sbi_inference.prepare_for_sbi(simulator, prior)

# define the summary net
class Net(nn.Module): 

    def __init__(self): 
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2)
        # max-pooling with kernel 8 reduces each 32x32 feature map to 4x4
        self.pool = nn.MaxPool2d(kernel_size=8, stride=8)
        # in_features = 6 channels * 4 * 4 spatial positions after pooling
        self.fc = nn.Linear(in_features=6*4*4, out_features=8)

    def forward(self, x):
        x = x.view(-1, 1, 32, 32)
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 6*4*4)
        x = F.relu(self.fc(x))
        return x

# instantiate the summary net
summary_net = Net()

# instantiate a neural network density estimator for the posterior distribution
neural_posterior = sbi_utils.posterior_nn(model='maf', 
                                          embedding_net=summary_net,
                                          hidden_features=10,
                                          num_transforms=2)   

# setup the inference procedure
inference = sbi_inference.SNPE(simulator_wrapper, prior, 
                               density_estimator=neural_posterior, 
                               use_combined_loss=True,
                               show_progress_bars=True)

# run the inference procedure
posterior = inference(num_simulations=10000)

# simulate samples
true_parameter = torch.tensor([0.70, np.pi/4])
x_observed, x_points = simulator(true_parameter, return_points=True)
samples = posterior.set_default_x(x_observed).sample((50000,))

# plot the observation
fig, ax = plt.subplots(facecolor='white', figsize=(11.15, 5.61), ncols=2, constrained_layout=True)
circle = plt.Circle((0, 0), 1.0, color='k', ls='--', lw=0.8, fill=False)
ax[0].add_artist(circle)
ax[0].scatter(x_points[:,0], x_points[:,1], s=20)
ax[0].set_xlim(-1, +1)
ax[0].set_xticks([-1, 0.0, +1.0])
ax[0].set_ylim(-1, +1)
ax[0].set_yticks([-1, 0.0, +1.0])
ax[0].set_title(r'original simulated points with $r = 0.70$ and $\theta = \pi/4$')
ax[1].imshow(x_observed.view(32, 32), origin='lower', cmap='gray')
ax[1].set_xticks([]); ax[1].set_yticks([])
ax[1].set_title('noisy observed data (gray image with 32 x 32 pixels)')
fig.show()

# plot figure with the inference results
fig, ax = sbi_utils.pairplot(samples, 
                             points=true_parameter,
                             labels=['r', 'theta'], 
                             limits=[[0, 1], [0, 2*np.pi]],
                             points_colors='r',
                             points_offdiag={'markersize': 6},
                             fig_size=[7.5, 6.4])
fig.show()
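As a quick quantitative check, one can also summarize the samples directly; a minimal sketch using only torch (the samples and true_parameter tensors come from the script above):

# compare the posterior mean and spread against the true parameter (r, theta)
posterior_mean = samples.mean(dim=0)
posterior_std = samples.std(dim=0)
print(f"posterior mean: {posterior_mean}")
print(f"posterior std:  {posterior_std}")
print(f"true parameter: {true_parameter}")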

On my 8-core, 32 GB laptop, the script runs in approximately 5 minutes.

The results are shown in the figure below.

[Figure: pairplot of the posterior samples for r and theta, with the true parameter marked in red.]

Well, that's it. I agree that the example might be a bit too simple, but at least it does not take too long to run and gives a real taste of how the summary_net feature might be handy :-)

I will be happy to hear your thoughts and see how we can improve this example.

janfb commented 4 years ago

That's great, thanks a lot @plcrodrigues for sharing this example. I think it is ideal to have such a simple and general example!

We have been thinking about adding more examples to the documentation at https://www.mackelab.org/sbi/. So if you are up for it, we would be very happy about a PR with this example in Jupyter-notebook form, similar to the tutorial notebooks we already have.

If not, that's totally fine too; in that case, would you allow us to recycle your code?

plcrodrigues commented 4 years ago

Thanks for your feedback @janfb! Yes, I would be very happy to make a PR with the example :) I will try to do it in the next few days.

JuliaLinhart commented 2 years ago

Could you give some more details about what you have in mind for pre-configured RNNs and CNNs?

michaeldeistler commented 2 years ago

I'm thinking of something like this:

from sbi.inference import SNPE
from sbi.neural_nets.preconfigured_embeddings import CNNEmbedding  # proposed module, does not exist yet
from sbi.utils import posterior_nn

cnn_embedding = CNNEmbedding(num_layers=5, num_channels=[1, 3, 5, 5, 5])
density_estimator = posterior_nn("maf", embedding_net=cnn_embedding)

inference = SNPE(prior=prior, density_estimator=density_estimator)
JuliaLinhart commented 2 years ago

The CNNEmbedding does not include fully connected layers at the end, right? Would successive layers with conv2d - batchnorm - relu - maxpool be sufficient?

michaeldeistler commented 2 years ago

Maybe add one fully connected layer.
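For concreteness, a minimal sketch of such an architecture (all layer sizes are made-up placeholders for a 1-channel 32x32 input, not a proposed API):

import torch.nn as nn

# two conv2d - batchnorm - relu - maxpool blocks, followed by a single
# fully connected layer that maps the flattened features to the embedding
cnn_embedding_sketch = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=3, padding=1),
    nn.BatchNorm2d(6),
    nn.ReLU(),
    nn.MaxPool2d(2),  # 32x32 -> 16x16
    nn.Conv2d(6, 12, kernel_size=3, padding=1),
    nn.BatchNorm2d(12),
    nn.ReLU(),
    nn.MaxPool2d(2),  # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(12 * 8 * 8, 8),  # the one fully connected layer
)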

jnsbck commented 2 years ago

Made a quick first draft of an RNN embedding net.

import torch
import torch.nn as nn
from torch import Tensor


class DefaultEmbeddingRNN(nn.Module):
    """Preconfigured RNN for embedding sequence data.

    Attributes:
        n_layers: Number of RNN layers.
        rnn: RNN layers (built lazily on the first forward pass).
        fc: Fully connected layer.
        relu: ReLU activation.
    """
    def __init__(self, n_layers=1):
        """
        Args:
            n_layers: Number of RNN layers.
        """
        super().__init__()

        self.n_layers = n_layers

        self.rnn = None
        self.fc = None
        self.relu = nn.ReLU()

    def input_reshape(self, x: Tensor) -> Tensor:
        """Ensures the input tensor is 3D.

        If a 2D input (batch_size, seq_len) is provided, reshape it to
        (batch_size, seq_len, input_size), assuming a 1D sequence (input_size = 1).

        Args:
            x: Input tensor.

        Returns:
            The input tensor, reshaped to 3D if necessary.
        """
        if x.dim() == 3:
            return x
        elif x.dim() < 3:
            return x.unsqueeze(2)

    def forward(self, x: Tensor) -> Tensor:
        """RNN forward pass.

        Args:
            x: Input tensor (batch_size, seq_len, input_size) or (batch_size, seq_len)

        Returns:
            RNN output.
        """
        if self.rnn is None:
            # build the layers lazily, once the input dimensionality is known;
            # note that these parameters only exist after the first forward pass
            x = self.input_reshape(x)
            num_dims = x.shape[-1]
            # TODO: hidden dims == in dims?
            self.rnn = nn.RNN(num_dims, num_dims, self.n_layers, batch_first=True)
            self.fc = nn.Linear(num_dims, num_dims)

        # reshape so (batch_size, seq_len, input_size) assuming 1D sequence
        x = self.input_reshape(x)

        x, hs = self.rnn(x)
        x = self.fc(x) # TODO: Fully Connected or Linear ?
        x = self.relu(x)

        return x
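A quick shape check of this draft (note that the layers are built lazily on the first forward pass, and that the output is still 3D, which a downstream density estimator would eventually need flattened):

embedding = DefaultEmbeddingRNN(n_layers=1)
x = torch.randn(4, 100)  # hypothetical batch of 4 univariate sequences of length 100
out = embedding(x)       # rnn and fc are constructed here, on the first call
print(out.shape)         # -> torch.Size([4, 100, 1])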
JuliaLinhart commented 2 years ago

Cool! I have one for a CNN Embedding:

from typing import List, Tuple

import torch
import torch.nn as nn
from torch import Tensor


def build_2dcnn_layer(
    in_channels: int,
    out_channels: int,
    kernel_size: int,
    padding: int,
) -> nn.Sequential:
    """Returns a 2D convolutional layer with batch normalization and a maxpool layer
    that reduces the (n x n) input to (n/2 x n/2).

    Args:
        in_channels: Number of channels in the input image.
        out_channels: Number of channels produced by the convolution.
        kernel_size: Size of the convolving kernel.
        padding: Padding added to all four sides of the input.
    """
    cnn_layer = nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, padding=padding),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.MaxPool2d(2)
        )
    return cnn_layer

class CNNEmbedding(nn.Module):
    def __init__(
        self,
        input_shape: Tuple[int, int, int],
        num_layers: int = 3,
        hidden_channels: List[int] = [3, 5, 5],
        out_features: int = 10,
        dropout: float = 0.,
        kernel_size: int = 3,
        ):
        """Class for a preconfigured 2D CNN embedding network.

        Args:
            input_shape: Shape of the input data we wish to embed, (in_channels, n, n).
            num_layers: Number of CNN layers.
            hidden_channels: Number of output channels for the hidden CNN layers.
            out_features: Number of output features after the FC layer.
            dropout: Dropout probability.
            kernel_size: Size of the convolving kernel (odd, so that padding preserves the width).
        """
        super().__init__()

        # We need enough hidden channel values to define the specified number of layers
        assert num_layers == len(hidden_channels)

        self.num_layers = num_layers
        self.hidden_channels = hidden_channels

        # Define list of cnn layers.
        padding = int((kernel_size-1)/2)  # ensures W_in == W_out for each convolution
        cnn_layers = [build_2dcnn_layer(input_shape[0], hidden_channels[0], kernel_size, padding)]
        for i in range(num_layers-1):
            cnn_layers.append(build_2dcnn_layer(hidden_channels[i], hidden_channels[i+1], kernel_size, padding))
        self.cnn_layers = nn.Sequential(*cnn_layers)

        # Dropout layer with dropout-probability = 'dropout'
        self.dropout = nn.Dropout(dropout)

        # Fully connected layer taking as input the flattened output arrays from the
        # last cnn layer
        out_width = int(input_shape[1] / 2**num_layers)  # Width of the output
        self.fc = nn.Linear(
            in_features=hidden_channels[-1] * out_width * out_width, out_features=out_features
            )

    def forward(self, x: Tensor) -> Tensor:
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)
        out = self.fc(x)
        return out
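A quick shape check (hypothetical batch of grayscale 32x32 images, matching the defaults above):

embedding = CNNEmbedding(input_shape=(1, 32, 32))
x = torch.randn(4, 1, 32, 32)
print(embedding(x).shape)  # -> torch.Size([4, 10]), i.e. out_features per input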
JuliaLinhart commented 2 years ago

For the RNN you can maybe make the input_size an argument, since the embedding net will be defined before the inference object:

rnn_embedding = RNNEmbedding(input_size=dict_size, num_layers=1, hidden_size=16, out_features=dict_size)
density_estimator = posterior_nn("maf", embedding_net=rnn_embedding)

inference = SNPE(prior=prior, density_estimator=density_estimator)

So I started with this draft:

import torch
import torch.nn as nn
from torch import Tensor


class RNNEmbedding(nn.Module):
    def __init__(
        self,
        input_size: int,
        hidden_size: int = 16,
        num_layers: int = 3,
        out_features: int = 10,
        ):
        """Class for a preconfigured RNN embedding network.

        Args:
            input_size: The number of expected features in the input x (e.g. dictionary size).
            hidden_size: The number of features in the hidden state h.
            num_layers: Number of RNN layers.
            out_features: Number of output features after the FC layer.
        """
        super().__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # RNN Layer
        self.rnn = nn.RNN(input_size, hidden_size, self.num_layers, batch_first=True)

        # Fully connected layer taking as input the flattened output from the rnn layers
        self.fc = nn.Linear(hidden_size, out_features)

    def forward(self, x: Tensor) -> Tensor:
        hidden = self.init_hidden(x.size(0))  # initialize the hidden state for the first input
        out, _ = self.rnn(x, hidden)
        # keep only the output of the last time step so that the embedding is 2D,
        # (batch_size, out_features), as expected by the density estimator
        out = self.fc(out[:, -1, :])
        return out

    def init_hidden(self, batch_size: int) -> Tensor:
        # generates the initial hidden state of zeros used in the forward pass
        hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        return hidden
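And a quick shape check (hypothetical batch of univariate sequences, so input_size=1):

embedding = RNNEmbedding(input_size=1)
x = torch.randn(4, 100, 1)  # batch of 4 sequences of length 100 with 1 feature each
print(embedding(x).shape)   # -> torch.Size([4, 10]), the last time step after the FC layer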
michaeldeistler commented 8 months ago

We have CNNs, MLPs, and IID-nets, so I am closing this for now.
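For reference, the embedding nets that eventually shipped can be used roughly like this; a hedged sketch, assuming the module path and the input_shape argument of recent sbi versions (check the documentation, as names and signatures may differ):

from sbi.neural_nets.embedding_nets import CNNEmbedding  # module path may differ across versions
from sbi.utils import posterior_nn

# embed the 32x32 images before they reach the flow
embedding_net = CNNEmbedding(input_shape=(32, 32))
density_estimator = posterior_nn("maf", embedding_net=embedding_net)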