stefanradev93 / BayesFlow

A Python library for amortized Bayesian workflows using generative neural networks.
https://bayesflow.org/
MIT License

ConfigurationError using TimeSeriesTransformer #110

Closed blairshevlin closed 9 months ago

blairshevlin commented 9 months ago

Hi -

I'm looking to use BayesFlow for a model with a nested RL structure. In a previous post (https://github.com/stefanradev93/BayesFlow/issues/70#issuecomment-1525788321), I saw that you recommended using TimeSeriesTransformer instead of InvariantNetwork to ensure the model has memory. I also want to pass along relevant variables related to context and other model calculations, so each simulated trial outputs not only the predicted choice but also other variables like utilities (see the function FD2_trial). However, I'm having issues structuring my model so that it has the correct dimensions.

When I try to configure the trainer, I get the following error:

Could not carry out computations of generative_model ->configurator -> amortizer -> loss! Error trace: Exception encountered when calling layer 'multi_head_attention_45' (type MultiHeadAttention).

In this tf.Variable creation, the initial value's shape ((3, 4, 32)) is not compatible with the explicitly supplied shape argument ((7, 4, 32)).

I tried to change the input_dim of the TimeSeriesTransformer, but I keep getting the same error. I've also tried various fixes within the configurator without success. Could you help me determine why this error is occurring and how I can fix it?

Thanks, Blair

Here is the full set-up:

def FD2_trial(offer, norm0, alpha0, beta0, epsilon0, delta0, eta = 0.8):

    # Minimum offer amount
    mn = 1

    alpha = alpha0
    beta = beta0
    epsilon = epsilon0
    delta = delta0

    @staticmethod
    def _FS_sim(offers=None, alpha=None, norm=None):

        norm_violation = norm-offers
        norm_violation = np.max([norm_violation, 0])

        return offers - alpha*norm_violation

    # Norm update (RW)
    norm = norm0 + epsilon * (offer-norm0)

    CV = _FS_sim(offers=offer, alpha=alpha, norm=norm)

    ao = np.max([offer-delta, mn])

    if _FS_sim(offers=ao, alpha=alpha, norm=norm) > 0:
        aFV = eta * _FS_sim(offers=ao, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ao-delta, mn]), alpha=alpha, norm=norm), 0)
    else:
        aFV = eta**2 * np.max(_FS_sim(offers=np.max([ao+delta, mn]), alpha=alpha, norm=norm), 0)

    ro = offer+delta

    if _FS_sim(offers=ro, alpha=alpha, norm=norm) > 0:
        rFV = eta * _FS_sim(offers=ro, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ro-delta,mn]), alpha=alpha, norm=norm), 0)
    else:
        rFV = eta**2 * np.max(_FS_sim(offers=np.max([ro+delta, mn]), alpha=alpha, norm=norm), 0)

    V = CV + (aFV - rFV)

    # Calculate probability of accepting offer
    prob = 1 / (1 + np.exp(-beta*V))
    prob = 0.0001 + 0.9998 * prob
    prob_a = [1-prob, prob]

    # Simulate choices
    action = np.random.choice([0, 1], p=prob_a)

    # Output is action, offer, utilities, internal norms, and probabilities
    return action, offer, V, norm, prob

def FD2_prior():

    alpha = RNG.beta(2.0, 2.0)
    beta = RNG.gamma(7.5, 1.0)
    epsilon = RNG.beta(2.0, 2.0)
    delta = RNG.uniform(-2, 2)

    return np.hstack((alpha, beta, epsilon, delta))

PARAM_NAMES = [
    r"$alpha$",
    r"$beta$",
    r"$epsilon$",
    r"$delta$",
]

prior = bf.simulation.Prior(prior_fun=FD2_prior, param_names=PARAM_NAMES)

MIN_OBS = 30
MAX_OBS = 30
NUM_CONDITIONS = 1

def random_num_obs(min_obs=MIN_OBS, max_obs=MAX_OBS):

    return RNG.integers(low=min_obs, high=max_obs + 1)

def generate_offer_matrix(num_obs):

    offer_mat = np.zeros([num_obs, 2])

    for n in range(num_obs):
        offer_mat[n, 1] = np.random.randint(-2, 1)
        offer_mat[n, 0] = np.random.randint(0, 3)

    return offer_mat

context_gen = bf.simulation.ContextGenerator(
    non_batchable_context_fun=random_num_obs,
    batchable_context_fun=generate_offer_matrix,
    use_non_batchable_for_batchable=True,
)

def UG_experiment(theta, offer_mat, num_obs):

    out = np.zeros((num_obs, 5))
    offers = np.zeros(num_obs)
    norms = np.zeros(num_obs)
    offers[0] = 5  # First offer is always 5
    norms[0] = 10  # Initializing norms at 10
    for n in range(num_obs):
        out[n, :] = FD2_trial(offers[n], norm0 = norms[n], alpha0 = theta[0], beta0 = theta[1], epsilon0 = theta[2], delta0 = theta[3])

        if n < (num_obs - 1):
            # Only calculate next offer/norm if there are more offers coming
            norms[n+1] = out[n, 3]
            if out[n, 0] == 1:
                # If accept, next offer goes down by [0, 1, 2] (to min of 1)
                offers[n+1] = max(offers[n]+offer_mat[n, 1], 1)
            else:
                # If reject, next offer goes up by [0, 1, 2] (to max of 9)
                offers[n+1] = min(offers[n]+offer_mat[n, 0], 9)

    return out

simulator = bf.simulation.Simulator(simulator_fun=UG_experiment, context_generator=context_gen)

model = bf.simulation.GenerativeModel(prior=prior, simulator=simulator, name="FD2")

summary_net = bf.networks.TimeSeriesTransformer(input_dim=7, summary_dim=32, name="FD2_summary")

inference_net = bf.networks.InvertibleNetwork(
    num_params=len(prior.param_names),
    coupling_settings={"dense_args": dict(kernel_regularizer=None), "dropout": False},
    name="FD2_inference",
)

amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net, name="FD2_amortizer")

prior_means, prior_stds = prior.estimate_means_and_stds(n_draws=100000)
prior_means = np.round(prior_means, decimals=1)
prior_stds = np.round(prior_stds, decimals=1)

def configurator(forward_dict):

    # Prepare placeholder dict
    out_dict = {}

    # Extract simulated choices
    data = forward_dict["sim_data"].astype(np.float32)

    # Context
    context = forward_dict['sim_batchable_context']

    # Concatenate
    out_dict["summary_conditions"] = np.c_[data, context].astype(np.float32)

    vec_num_obs = forward_dict["sim_non_batchable_context"] * np.ones((data.shape[0], 1))
    out_dict["direct_conditions"] = np.sqrt(vec_num_obs).astype(np.float32)

    # Get data generating parameters
    params = forward_dict["prior_draws"].astype(np.float32)

    # Standardize parameters
    out_dict["parameters"] = (params - prior_means) / prior_stds

    return out_dict

trainer = bf.trainers.Trainer(
    generative_model=model, amortizer=amortizer, configurator=configurator, checkpoint_path="FD2_model"
)
marvinschmitt commented 9 months ago

Hi Blair,

What are the exact shapes of your summary_conditions, direct_conditions, and parameters for a fixed batch size of, say, batch_size=16?

You can find that out by calling configurator(generative_model(16)) and analyzing the output shapes of the returned dict.
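For instance, a minimal sketch of that check (using the model and configurator objects from your setup; the batch size of 16 is arbitrary):

import numpy as np

batch = configurator(model(16))
for key, value in batch.items():
    # Print the shape of each configured output
    print(key, np.asarray(value).shape)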

Cheers, Marvin

stefanradev93 commented 9 months ago

Hi Blair, the problem lies in the way the data and context are concatenated in the configurator. Here is the correct way to do it:

def configurator(forward_dict):

    # Prepare placeholder dict
    out_dict = {}

    # Extract simulated choices
    data = forward_dict["sim_data"]

    # Context
    context = np.array(forward_dict['sim_batchable_context'])

    # Concatenate 
    out_dict["summary_conditions"] = np.concatenate((data, context), axis=-1).astype(np.float32) 

    vec_num_obs = forward_dict["sim_non_batchable_context"] * np.ones((data.shape[0], 1))
    out_dict["direct_conditions"] = np.sqrt(vec_num_obs).astype(np.float32)

    # Get data generating parameters
    params = forward_dict["prior_draws"].astype(np.float32)

    # Standardize parameters
    out_dict["parameters"] = (params - prior_means) / prior_stds

    return out_dict

However, note that the simulator currently returns NaN values.

stefanradev93 commented 9 months ago

Note also that the concatenated outputs of the configurator have shape (batch_size, time_steps, 7), i.e., the 5 simulator outputs plus the 2 offer-matrix context columns, so a correct specification of the TimeSeriesTransformer may look like this:

summary_net = bf.networks.TimeSeriesTransformer(input_dim=7, summary_dim=32, name="FD2_summary")
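As a quick sanity check (a sketch, assuming the corrected configurator above and the model object from the original post), the summary network should map those inputs to (batch_size, summary_dim) embeddings:

test_batch = configurator(model(4))
summary_out = summary_net(test_batch["summary_conditions"])
print(summary_out.shape)  # expected: (4, 32)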

blairshevlin commented 9 months ago

> Hi Blair, the problem lies in the way the data and context are concatenated in the configurator. Here is the correct way to do it: [...] However, note that the simulator currently returns NaN values.

Hi Stefan,

Thanks for the quick response, but I'm still getting the same error even with the corrected way of concatenating the data and context.

LuSchumacher commented 9 months ago

Hi Blair,

I tried to run your model. On my machine (using the development branch of BayesFlow), the training runs without any problems with the following code (as far as I remember, I did not change anything crucial):

import numpy as np
import bayesflow as bf
RNG = np.random.default_rng()

PARAM_NAMES = [
    r"$alpha$",
    r"$beta$",
    r"$epsilon$",
    r"$delta$",
]
MIN_OBS = 30
MAX_OBS = 30
NUM_CONDITIONS = 1
BATCH_SIZE = 32
NUM_EPOCHS = 50
NUM_ITER_PER_EPOCHS = 1000

def FD2_trial(offer, norm0, alpha0, beta0, epsilon0, delta0, eta = 0.8):
    #Minimum offer amount
    mn = 1

    alpha = alpha0
    beta = beta0
    epsilon = epsilon0
    delta = delta0

    @staticmethod
    def _FS_sim(offers=None, alpha=None, norm=None):
        norm_violation = norm-offers
        norm_violation = np.max([norm_violation, 0])
        return offers - alpha*norm_violation

    # Norm update (RW)
    norm = norm0 + epsilon * (offer-norm0)

    CV = _FS_sim(offers=offer, alpha=alpha, norm=norm)

    ao = np.max([offer-delta, mn])

    if _FS_sim(offers=ao, alpha=alpha, norm=norm) > 0:
        aFV = eta * _FS_sim(offers=ao, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ao-delta, mn]), alpha=alpha, norm=norm), 0)
    else:
        aFV = eta**2 * np.max(_FS_sim(offers=np.max([ao+delta, mn]), alpha=alpha, norm=norm), 0)

    ro = offer+delta

    if _FS_sim(offers=ro, alpha=alpha, norm=norm) > 0:
        rFV = eta * _FS_sim(offers=ro, alpha=alpha, norm=norm) + eta**2 * np.max(_FS_sim(offers=np.max([ro-delta,mn]), alpha=alpha, norm=norm), 0)
    else:
        rFV = eta**2 * np.max(_FS_sim(offers=np.max([ro+delta, mn]), alpha=alpha, norm=norm), 0)

    V = CV + (aFV - rFV)

    # calculate probability of accepting offer:                                                                                             
    prob = 1 / ( 1 + np.exp(-beta*V))
    prob = 0.0001 + 0.9998 * prob
    prob_a = [1-prob, prob]

    # Simulate choices
    action = np.random.choice([0, 1], p=prob_a)

    # Output is action, offer, utilities, internal norms, and probabilities
    return action, offer, V, norm, prob

def FD2_prior():
    alpha = RNG.beta(2.0, 2.0)
    beta = RNG.gamma(7.5,1.0)
    epsilon = RNG.beta(2.0, 2.0)
    delta = RNG.uniform(-2, 2)

    return np.hstack((alpha, beta, epsilon, delta))

prior = bf.simulation.Prior(prior_fun=FD2_prior, param_names=PARAM_NAMES)

def random_num_obs(min_obs=MIN_OBS, max_obs=MAX_OBS):
    return RNG.integers(low=min_obs, high=max_obs + 1)

def generate_offer_matrix(num_obs):
    offer_mat = np.zeros([num_obs,2])
    for n in range(num_obs):
        offer_mat[n,1] = np.random.randint(-2,1)
        offer_mat[n,0] = np.random.randint(0,3)
    return offer_mat

context_gen = bf.simulation.ContextGenerator(
    non_batchable_context_fun=random_num_obs,
    batchable_context_fun=generate_offer_matrix,
    use_non_batchable_for_batchable=True,
)

def UG_experiment(theta, offer_mat, num_obs):
    out = np.zeros((num_obs, 5))
    offers = np.zeros(num_obs)
    norms = np.zeros(num_obs)
    offers[0] = 5 # First offer is always 5
    norms[0] = 10 # Initializing norms at 10
    for n in range(num_obs):  
        out[n, :] = FD2_trial(
            offers[n], norm0 = norms[n], alpha0 = theta[0], beta0 = theta[1], epsilon0 = theta[2], delta0 = theta[3]
        )
        if n < (num_obs - 1):
            # Only calculate next offer/norm if there are more offers coming
            norms[n+1] = out[n,3]
            if out[n, 0] == 1:
                # If accept, next offer goes down by [0, 1, 2] (to min of 1)
                offers[n+1] = max(offers[n]+offer_mat[n,1],1)
            else:
                # If reject, next offer goes up by [0, 1, 2] (to max of 9)
                offers[n+1] = min(offers[n]+offer_mat[n,0],9)
    return out

simulator = bf.simulation.Simulator(simulator_fun=UG_experiment, context_generator=context_gen)
model = bf.simulation.GenerativeModel(prior=prior, simulator=simulator, name="FD2")

prior_means, prior_stds = prior.estimate_means_and_stds(n_draws=100000)
prior_means = np.round(prior_means, decimals=1)
prior_stds = np.round(prior_stds, decimals=1)

def configurator(forward_dict):

    # Prepare placeholder dict
    out_dict = {}

    # Extract simulated choices
    data = forward_dict["sim_data"].astype(np.float32)

    # Context
    context = forward_dict['sim_batchable_context']

    # Concatenate 
    out_dict["summary_conditions"] = np.c_[data, context].astype(np.float32) 

    vec_num_obs = forward_dict["sim_non_batchable_context"] * np.ones((data.shape[0], 1))
    out_dict["direct_conditions"] = np.sqrt(vec_num_obs).astype(np.float32)

    # Get data generating parameters
    params = forward_dict["prior_draws"].astype(np.float32)

    # Standardize parameters
    out_dict["parameters"] = (params - prior_means) / prior_stds

    return out_dict

summary_net = bf.networks.TimeSeriesTransformer(input_dim=7, summary_dim=32, name="FD2_summary")

inference_net = bf.networks.InvertibleNetwork(
    num_params=len(prior.param_names),
    coupling_settings={"dense_args": dict(kernel_regularizer=None), "dropout": False},
    name="FD2_inference",
)
amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net, name="FD2_amortizer")

trainer = bf.trainers.Trainer(
    generative_model=model, amortizer=amortizer, configurator=configurator, checkpoint_path="FD2_model"
)

history = trainer.train_online(NUM_EPOCHS, NUM_ITER_PER_EPOCHS, BATCH_SIZE)

Let me know if this works for you. If not, please specify on which function call your error occurs and what it says.

stefanradev93 commented 9 months ago

I was getting NaNs because I got the indentation wrong. The code runs fine on my machine too.

blairshevlin commented 9 months ago

Oddly, I'm still getting the error. I restarted my kernel and copied your code. Here's the full traceback:

ValueError                                Traceback (most recent call last)
File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\trainers.py:1314, in Trainer._check_consistency(self)
   1313 logger.info("Performing a consistency check with provided components...")
-> 1314 _ = self.amortizer.compute_loss(self.configurator(self.generative_model(_n_sim)))
   1315 logger.info("Done.")

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\amortizers.py:209, in AmortizedPosterior.compute_loss(self, input_dict, **kwargs)
    208 # Get amortizer outputs
--> 209 net_out, sum_out = self(input_dict, return_summary=True, **kwargs)
    210 z, log_det_J = net_out

File c:\Users\blair\.conda\envs\bf\lib\site-packages\keras\src\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     68 # To get the full stack trace, call:
     69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
     71 finally:

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\amortizers.py:174, in AmortizedPosterior.call(self, input_dict, return_summary, **kwargs)
    173 # Concatenate conditions, if given
--> 174 summary_out, full_cond = self._compute_summary_condition(
    175     input_dict.get(DEFAULT_KEYS["summary_conditions"]),
    176     input_dict.get(DEFAULT_KEYS["direct_conditions"]),
    177     **kwargs,
    178 )
    180 # Compute output of inference net

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\amortizers.py:404, in AmortizedPosterior._compute_summary_condition(self, summary_conditions, direct_conditions, **kwargs)
    403 if self.summary_net is not None:
--> 404     sum_condition = self.summary_net(summary_conditions, **kwargs)
    405 else:

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\summary_networks.py:167, in TimeSeriesTransformer.call(self, x, **kwargs)
    154 """Performs the forward pass through the transformer.
    (...)
    164     Output of shape (batch_size, summary_dim)
    165 """
--> 167 rep = self.attention_blocks(x, **kwargs)
    168 template = self.template_net(x, **kwargs)

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\attention.py:136, in SelfAttentionBlock.call(self, x, **kwargs)
    123 """Performs the forward pass through the self-attention layer.
    (...)
    133     Output of shape (batch_size, set_size, input_dim)
    134 """
--> 136 return self.mab(x, x, **kwargs)

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\attention.py:80, in MultiHeadAttentionBlock.call(self, x, y, **kwargs)
     65 """Performs the forward pass through the attention layer.
     (...)
     77     Output of shape (batch_size, set_size_x, input_dim)
     78 """
---> 80 h = x + self.att(x, y, y, **kwargs)
     81 if self.ln_pre is not None:

ValueError: Exception encountered when calling layer 'multi_head_attention' (type MultiHeadAttention).

In this tf.Variable creation, the initial value's shape ((3, 4, 32)) is not compatible with the explicitly supplied shape argument ((7, 4, 32)).

Call arguments received by layer 'multi_head_attention' (type MultiHeadAttention):
  • query=tf.Tensor(shape=(2, 30, 7), dtype=float32)
  • value=tf.Tensor(shape=(2, 30, 7), dtype=float32)
  • key=tf.Tensor(shape=(2, 30, 7), dtype=float32)
  • attention_mask=None
  • return_attention_scores=False
  • training=None
  • use_causal_mask=False

During handling of the above exception, another exception occurred:

ConfigurationError                        Traceback (most recent call last)
c:\Users\blair\Documents\Research\Sinai-UG-BayesNN\FD2Step_Test.ipynb Cell 22
    146 inference_net = bf.networks.InvertibleNetwork(
    147     num_params=len(prior.param_names),
    148     coupling_settings={"dense_args": dict(kernel_regularizer=None), "dropout": False},
    149     name="FD2_inference",
    150 )
    151 amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net, name="FD2_amortizer")
--> 153 trainer = bf.trainers.Trainer(
    154     generative_model=model, amortizer=amortizer, configurator=configurator, checkpoint_path="FD2_model"
    155 )
    157 history = trainer.train_online(NUM_EPOCHS, NUM_ITER_PER_EPOCHS, BATCH_SIZE)

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\trainers.py:220, in Trainer.__init__(self, amortizer, generative_model, configurator, checkpoint_path, max_to_keep, default_lr, skip_checks, memory, **kwargs)
    218 # Perform a sanity check with provided components
    219 if not skip_checks:
--> 220     self._check_consistency()

File c:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\trainers.py:1317, in Trainer._check_consistency(self)
   1315     logger.info("Done.")
   1316 except Exception as err:
-> 1317     raise ConfigurationError(
   1318         "Could not carry out computations of generative_model ->"
   1319         + f"configurator -> amortizer -> loss! Error trace:\n {err}"
   1320     )

ConfigurationError: Could not carry out computations of generative_model -> configurator -> amortizer -> loss! Error trace:
Exception encountered when calling layer 'multi_head_attention' (type MultiHeadAttention).

In this tf.Variable creation, the initial value's shape ((3, 4, 32)) is not compatible with the explicitly supplied shape argument ((7, 4, 32)).

Call arguments received by layer 'multi_head_attention' (type MultiHeadAttention):
  • query=tf.Tensor(shape=(2, 30, 7), dtype=float32)
  • value=tf.Tensor(shape=(2, 30, 7), dtype=float32)
  • key=tf.Tensor(shape=(2, 30, 7), dtype=float32)
  • attention_mask=None
  • return_attention_scores=False
  • training=None
  • use_causal_mask=False

LuSchumacher commented 9 months ago

Very weird. I just created a new conda environment and installed BayesFlow (master branch). The code I posted before still runs flawlessly. Are you using conda for virtual environments? If yes, which versions of TensorFlow and BayesFlow are you using? If no, please follow these instructions.
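For reference, one way to print both versions from inside the environment (a sketch; importlib.metadata is part of the Python standard library):

import tensorflow as tf
from importlib.metadata import version

print("TensorFlow:", tf.__version__)
print("BayesFlow:", version("bayesflow"))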

blairshevlin commented 9 months ago

Yes, I'm using conda for virtual environments and followed those instructions to install BayesFlow a few days ago. However, I'll try it again and send an update.

blairshevlin commented 9 months ago

I've discovered an interesting twist: the code only produces this error when I run it from a Jupyter notebook. Everything works when I run it from a Python file. Any idea why that might be the case?

stefanradev93 commented 9 months ago

I am at a bit of a loss here, as I cannot reproduce the issue. It is possible that you are using the wrong kernel in the notebook, or that something is wrong with the JupyterLab / Jupyter Notebook / IPython setup. I would try re-installing these libraries and trying again.

Generally, I recommend avoiding notebooks for anything except quick demos. If the problem persists, you can run the training phase from a script and perform inference interactively from a notebook. Keep us updated. :)
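A minimal sketch of that split workflow (assuming the same networks and configurator are re-created in the notebook; with an existing checkpoint_path the Trainer restores the latest weights, and skip_checks avoids re-running the consistency check):

# In the notebook: rebuild the networks, then restore the trained weights
trainer = bf.trainers.Trainer(
    generative_model=model,
    amortizer=amortizer,
    configurator=configurator,
    checkpoint_path="FD2_model",
    skip_checks=True,
)

# Amortized inference on a configured batch
post_samples = amortizer.sample(configurator(model(1)), n_samples=1000)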

blairshevlin commented 9 months ago

I think I've figured out the issue. It had to do with using checkpoints from an earlier iteration of the model with a different input_dim. Once I deleted those old checkpoints, the model ran fine even in JupyterLab.
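(For anyone hitting the same mismatch, a minimal sketch of that cleanup, assuming the checkpoint_path="FD2_model" used earlier; removing the stale directory forces the networks to be rebuilt with the current input_dim.)

import shutil

# Delete the stale checkpoint directory so the trainer builds fresh networks
shutil.rmtree("FD2_model", ignore_errors=True)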

However, I'm now encountering this issue when setting bidirectional = True. All other code is the same, but my summary_net is defined as follows:

summary_net = bf.networks.TimeSeriesTransformer(
    name="FD2_f0_summary",
    input_dim=4,
    summary_dim=32,
    # New settings (11/18/23)
    attention_settings=dict(key_dim=32, num_heads=4, dropout=0.02),
    template_type="gru",
    bidirectional=True,
)

Can you explain why I'm getting this error?

Traceback (most recent call last):
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\trainers.py", line 1314, in _check_consistency
    _ = self.amortizer.compute_loss(self.configurator(self.generative_model(_n_sim)))
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\amortizers.py", line 209, in compute_loss
    net_out, sum_out = self(input_dict, return_summary=True, **kwargs)
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\keras\src\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\amortizers.py", line 174, in call
    summary_out, full_cond = self._compute_summary_condition(
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\amortizers.py", line 404, in _compute_summary_condition
    sum_condition = self.summary_net(summary_conditions, **kwargs)
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\summary_networks.py", line 169, in call
    rep = self.output_attention(tf.expand_dims(template, axis=1), rep, **kwargs)
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\attention.py", line 83, in call
    out = h + self.fc(h, **kwargs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer 'multi_head_attention_block_2' (type MultiHeadAttentionBlock).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [2,1,128] vs. [2,1,64] [Op:AddV2] name:

Call arguments received by layer 'multi_head_attention_block_2' (type MultiHeadAttentionBlock):
  • x=tf.Tensor(shape=(2, 1, 128), dtype=float32)
  • y=tf.Tensor(shape=(2, 30, 4), dtype=float32)
  • kwargs={'training': 'None'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\blair\Documents\Research\Sinai-UG-BayesNN\FD2_model\FD2step_f0_online.py", line 206, in <module>
    trainer = bf.trainers.Trainer(
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\trainers.py", line 220, in __init__
    self._check_consistency()
  File "C:\Users\blair\.conda\envs\bf\lib\site-packages\bayesflow\trainers.py", line 1317, in _check_consistency
    raise ConfigurationError(
bayesflow.exceptions.ConfigurationError: Could not carry out computations of generative_model -> configurator -> amortizer -> loss! Error trace:
Exception encountered when calling layer 'multi_head_attention_block_2' (type MultiHeadAttentionBlock).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [2,1,128] vs. [2,1,64] [Op:AddV2] name:

Call arguments received by layer 'multi_head_attention_block_2' (type MultiHeadAttentionBlock):
  • x=tf.Tensor(shape=(2, 1, 128), dtype=float32)
  • y=tf.Tensor(shape=(2, 30, 4), dtype=float32)
  • kwargs={'training': 'None'}

stefanradev93 commented 9 months ago

Glad you solved the previous issue! The problem with the bidirectional option is a genuine bug, which I will try to resolve tomorrow. I advise using bidirectional=False until then.
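For example, the summary network from the previous comment can be kept as-is with only that flag flipped (a sketch of the workaround):

summary_net = bf.networks.TimeSeriesTransformer(
    name="FD2_f0_summary",
    input_dim=4,
    summary_dim=32,
    attention_settings=dict(key_dim=32, num_heads=4, dropout=0.02),
    template_type="gru",
    bidirectional=False,  # workaround until the dev fix is released
)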

stefanradev93 commented 9 months ago

I have fixed the bidirectional bug in the dev branch. Reinstalling from dev via:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/stefanradev93/bayesflow@Development

should do the job.

blairshevlin commented 9 months ago

Can confirm you've fixed the bidirectional bug. Thanks for your help!