tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Construct Multinomial conditional distributions, where total_count is a random variable #1624

Open MariaLavrovskaya opened 1 year ago

MariaLavrovskaya commented 1 year ago

I am trying to build a graphical model in TensorFlow Probability, where we first sample a number of positive (1) and negative (0) examples (count_i) from a Categorical distribution and then construct a Multinomial distribution (Y_i) depending on the value of count_i. These events (Y_i) are mutually exclusive:

Y_1 ~ Multinomial([0.9, 0.1, 0.05, 0.05, 0.1], total_count = tf.reduce_sum(tf.cast(count == 1, tf.float32)))

Y_2 ~ Multinomial([0.99, 0.01, 0., 0., 0.], total_count = tf.reduce_sum(tf.cast(count == 0, tf.float32)))

I have read the tutorials; however, I am stuck on two issues:

  1. This code generates two arrays of length 500, whereas I only need one array of 500. What should I change so that we only get one sample from the Categorical distribution, and the Multinomial is then constructed from the overall count of the value we are conditioning on?
  2. The sample from the Categorical distribution contains only values of 0, whereas it should be a mix of 0s and 1s. What am I doing wrong here?

My code is as follows. You can run it to replicate the behaviour:

    import tensorflow as tf
    import tensorflow_probability as tfp

    tfd = tfp.distributions

    def simplified_model():
      return tfd.JointDistributionSequential([
          tfd.Uniform(low=0., high=1., name='e'),  # e
          lambda e: tfd.Sample(tfd.Categorical(probs=tf.stack([e, 1. - e], 0)), sample_shape=500),  # count -- should it be independent?
          lambda count: tfd.Multinomial(
              probs=tf.constant([[0.9, 0.1, 0.05, 0.05, 0.1], [0.99, 0.01, 0., 0., 0.]]),
              total_count=tf.cast(tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32)),
                                            tf.reduce_sum(tf.cast(count == 0, tf.float32))], 0),
                                  dtype=tf.float32))
      ])

    tt = simplified_model()
    tt.resolve_graph()
    tt.sample(1)
csuter commented 1 year ago

I think you just need to change the axis argument to tf.stack from 0 (leftmost) to -1 (rightmost).

When you pass a sample_shape to sample, the output of tfd.Uniform(..).sample(...) will take that shape. In this case, with sample_shape = 1, the uniform has shape [1]. Then when you tf.stack([e, 1-e], 0), you get shape [2, 1]. But what you want here is shape [1, 2]. Passing a shape [2, 1] probs array to Categorical will actually result in a batch of 2 independent, 1-category Categoricals -- we'll get 2 arrays out and all the samples will be 0! If we instead stack on axis -1, we'll get probs of shape [1, 2], which has the right number of categories (innermost shape), and will yield a shape [500, 1] output, which we can then sum over axis 0 in the last distribution to get the right output.
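
For concreteness, here is a minimal shape-only sketch of that difference, assuming a single scalar uniform draw (the variable names are just for illustration):

e = tf.random.uniform([1])                   # shape [1], like Uniform(...).sample(1)
probs_axis0 = tf.stack([e, 1. - e], 0)       # shape [2, 1]: a batch of 2 one-category Categoricals
probs_axis_neg1 = tf.stack([e, 1. - e], -1)  # shape [1, 2]: 1 Categorical over 2 categories
print(probs_axis0.shape, probs_axis_neg1.shape)  # (2, 1) (1, 2)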

Hope this makes some sense!

MariaLavrovskaya commented 1 year ago

Thank you so much. Totally makes sense! Should have spotted it myself

MariaLavrovskaya commented 1 year ago

Apologies for bothering again, but I am having trouble getting the dims right for the Dirichlet-Multinomial model and generalising it so I can run MCMC on it. I've managed to build the function for one sample, but I cannot seem to figure out how to generalise it to the batched version. The idea is that every time we generate a sample, we want to build two Multinomial distributions. I cannot seem to figure out what the dimensions are supposed to look like for the posterior. I have tried adding an axis, but it does not seem to help.

I have also tried printing everything out, but that made me even more confused: every time the helper functions run, they seem to be running on one sample at a time. When we run sample, it is vectorised, right? So both distributions for each of the samples should be built at the same time?

My understanding is that for a sample of 2, we need the priors to be of shape (2, 2, 5) and the counts of shape (2, 2); however, I think I am missing something because it does not work. Any hints please?

#means = [0.0623815, 0.4076164, 0.49531835, 0.01912234, 0.0155614]
#bac = [0.9, 0.1, 0.05, 0.05, 0.1]
def _get_probs(priors):
  answer = tf.stack([tf.reshape(priors, [5]),
                     tf.constant([0.9, 0.1, 0.05, 0.05, 0.1], dtype=tf.float32)], 0)
  print(priors.shape, 'pass', answer.shape)
  return answer

def _get_counts(vec):
  answer = tf.cast(tf.stack([tf.reduce_sum(tf.cast(vec == 1, tf.float32), 0),
                             tf.reduce_sum(tf.cast(vec == 0, tf.float32), 0)], 0), dtype=tf.float32)
  print('pass', answer.shape)
  return answer

def simplified_model(means, bac):
  return tfd.JointDistributionSequential([
      tfd.Uniform(low=0., high=1.),  # b
      lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1)), sample_shape=[500], name='count'),  # count
      tfd.Dirichlet(concentration=means, name='priors'),  # priors
      lambda priors, count: tfd.Multinomial(
          probs=_get_probs(priors),
          total_count=tf.cast(tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32)),
                                        tf.reduce_sum(tf.cast(count == 0, tf.float32))], 0),
                              dtype=tf.float32))  # we could replace this with a Categorical distribution
  ])
csuter commented 1 year ago

Couple of things going on here:

  1. JointDistributions have to do a bit of introspection the first time they are used, so that they know what shapes, dtypes, and so on to expect. So the forward sampling process is run once with sample_shape [1] (or maybe scalar, []?) to do this "type inference" and the result is stored. This is why you see your helpers called with trivial shape.
  2. The reshape, as written, won't work on the non-trivial sample_shape forward pass -- your array has shape [2, 5] and can't be reshaped into shape [5]. You have at least 2 options: a. rewrite the helper function logic to account for additional leading batch dimensions; b. use JointDistributionSequentialAutoBatched.

2b is easiest, but it was pretty slow when I ran it. This is probably due to Multinomial being...pretty terrible! JDSAutoBatched uses tf.vectorized_map to do some auto-batching magic, which involves some compilation/program transformation that may be ill-suited to Multinomial's implementation. That said, it may be hard to do better vectorizing by hand. I'd suggest trying both (feel free to pop back in here w/ questions!), and see which is best for your purposes.
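
For reference, a minimal sketch of option 2b, written for a single draw and letting the AutoBatched wrapper handle batching (this assumes means is a shape-[5] concentration vector; it mirrors, but is not exactly, the model above):

def simplified_model_ab(means):
  return tfd.JointDistributionSequentialAutoBatched([
      tfd.Uniform(low=0., high=1.),                                       # b
      lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1)),
                           sample_shape=[500]),                           # count
      tfd.Dirichlet(concentration=means),                                 # priors
      lambda priors, count: tfd.Multinomial(
          probs=tf.stack([priors, tf.constant([0.9, 0.1, 0.05, 0.05, 0.1])], axis=0),
          total_count=tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32)),
                                tf.reduce_sum(tf.cast(count == 0, tf.float32))], axis=0)),
  ])

# Batched sampling "just works", though it may be slow for Multinomial:
# simplified_model_ab(tf.constant([6., 41., 50., 2., 2.])).sample(3)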

HTH!

MariaLavrovskaya commented 1 year ago

Thank you for being so supportive – it is much appreciated, especially because TensorFlow is a bit hard to wrap my mind around!

I have decided to go with the reshape first, because I thought it would be a good exercise to understand shapes better. As far as I understand, for sample_size > 1 (let's take 2), the final shape of the Multinomial is supposed to be (2, 2, 5) -> 4 distributions in total, two for each sample. Probs should be of size (2, 2, 5) and total counts of size (2, 2). It's definitely not the most elegant way to go about it, but it's definitely a step forward.

Will now look at the 2b as suggested and try running MCMC for both.

def _get_probs_v2(non_claim_means, priors):
  return tf.concat([priors, non_claim_means], 1)  # changed from 0

def _get_counts(count, priors):
  multiple = tf.cast(tf.shape(priors)[0], dtype=tf.int32)
  count = tf.reshape(count, [multiple, 500])  # 500 will be an RV in itself
  return tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32), axis=1),
                   tf.reduce_sum(tf.cast(count == 0, tf.float32), axis=1)], -1)

def simplified_model(means_2, means):
  return tfd.JointDistributionSequential([
      tfd.Uniform(low=0., high=1.),  # b
      lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1)), sample_shape=[500], name='count'),  # count
      tfd.Dirichlet(concentration=means, name='priors'),  # priors
      tfd.Dirichlet(concentration=means_2, name='means_b'),
      lambda means_b, priors, count: tfd.Multinomial(probs=_get_probs_v2(means_b, priors),
                                                     total_count=_get_counts(count, priors))
  ])

brianwa84 commented 1 year ago

FWIW, if numpy is more familiar, you could use the same TFP API but in JAX instead: from tensorflow_probability.substrates import jax as tfp

MariaLavrovskaya commented 1 year ago

Thank you both for your help!

I am trying to run the MCMC sampler based on the run function shown here. I have adapted my code and found appropriate unconstraining bijectors to map the initial state.

However, the algorithm fails because we are getting NaN values now and then. I am very confused, and this time I have no clue whether there is a workaround to avoid the NaN values. Am I doing something wrong, or is there a trick to overcome this?

The logic is as follows. The Beta distribution controls the proportion of 0 values in the generated sample. The Dirichlet for 'first seen' controls the count of each possible value occurring in the 'seen' sample, and the Dirichlet for 'already seen' controls the count of each of the 5 possible values occurring in the 'already seen' sample. Both have hyperparameters coming from the data.

def _get_probs_v2(already_seen, priors):
  print('prior_shape', priors.shape)
  return tf.concat([priors, already_seen], 1) #changed from 0

def _get_counts(count, priors):
  multiple = tf.cast(tf.shape(priors)[0], dtype = tf.int32)
  count = tf.reshape(count, [multiple, 50000]) 
  return tf.stack([tf.reduce_sum(tf.cast(count==1, tf.float32), axis = 1),tf.reduce_sum(tf.cast(count==0, tf.float32), axis = 1)], -1)

def simplified_model(first_seen, already_seen):
  return tfd.JointDistributionSequential([
      tfd.Beta(concentration1=559.49, concentration0=291.28),  # b
      lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1)), sample_shape=[50000], name='count'),  # count
      tfd.Dirichlet(concentration=first_seen, name='priors'),  # priors
      tfd.Dirichlet(concentration=already_seen, name='already_seen'),
      lambda already_seen, priors, count: tfd.Independent(
          tfd.Multinomial(probs=_get_probs_v2(already_seen, priors),
                          total_count=_get_counts(count, priors)),
          reinterpreted_batch_ndims=2)  # working_function
  ])
@tf.function
def simplified_log_prob(init_beta, init_first, init_already):
  return simplified_model(first_seen, already_seen).log_prob([init_beta, init_first, init_already, input_excess])
def sample_simplied(num_results, num_burnin_steps):
  hmc = tfp.mcmc.HamiltonianMonteCarlo(
      target_log_prob_fn=simplified_log_prob,
      num_leapfrog_steps=10,
      step_size=0.005)

  initial_state = [
      tf.ones([1], name='init_beta'),
      tf.constant(first_seen, name = 'init_first_seen'),
      tf.constant(already_seen, name = 'init_already_seen')
      # means* tf.ones([], dtype=tf.float32, name="init_means")
  ]
  unconstraining_bijectors = [
      tfb.Sigmoid(), #to map real values to get something between 0 and 1 
      tfb.SoftmaxCentered(),        # Dirichlet
      tfb.SoftmaxCentered()
  ]
  kernel = tfp.mcmc.TransformedTransitionKernel(
      inner_kernel=hmc, bijector=unconstraining_bijectors)

  samples, kernel_results = tfp.mcmc.sample_chain(
      num_results=num_results,
      num_burnin_steps=num_burnin_steps,
      current_state=initial_state,
      kernel=kernel)

  acceptance_probs = tf.reduce_mean(
      tf.cast(kernel_results.inner_results.is_accepted, tf.float32), axis=0)

  return samples, acceptance_probs
first_seen = tf.constant([[0.06225592, 0.40752962, 0.4955014 , 0.01913294, 0.01558013]], dtype = tf.float32)
already_seen = tf.constant([[0.9 , 0.1 , 0.05, 0.05, 0.1 ]], dtype = tf.float32)

Thank you again for being very supportive.

csuter commented 1 year ago

Can you add allow_nan_stats=False to all distribution constructors, and validate_args=True to all distribution and bijector constructors, and run again? You may get more informative errors and/or find the source of the NaNs sooner (I haven't had a close look at the latest code yet).

csuter commented 1 year ago

(no need to add these to the args of tfd.Sample or tfd.Independent -- just the "concrete" distributions these are wrapping)

MariaLavrovskaya commented 1 year ago

Based on the first error that I am getting, the issue seems to be in the first helper function. Will try getting to the bottom of it.

MariaLavrovskaya commented 1 year ago

Changed the first helper so it would not raise any errors. Ran sample() a number of times and no errors are raised; however, I still see a lot of inf.

def _get_probs_v2(already_seen, priors):
  multiple = tf.cast(tf.shape(priors)[0], dtype = tf.int32)
  priors = tf.reshape(priors, [multiple, 1, 5])
  already_seen = tf.reshape(already_seen, [multiple, 1, 5])
  return tf.concat([priors, already_seen], 1) 

def _get_counts(count, priors):

  multiple = tf.cast(tf.shape(priors)[0], dtype = tf.int32)
  count = tf.reshape(count, [multiple, 50000]) 
  return tf.stack([tf.reduce_sum(tf.cast(count==1, tf.float32), axis = 1),tf.reduce_sum(tf.cast(count==0, tf.float32), axis = 1)], -1)

def simplified_model(first_seen, already_seen):
  return tfd.JointDistributionSequential([
      tfd.Beta(concentration1=559.49, concentration0=291.28, allow_nan_stats=False, validate_args=True),  # b
      lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1), allow_nan_stats=False, validate_args=True), sample_shape=[50000], name='count'),  # count
      tfd.Dirichlet(concentration=first_seen, name='priors', allow_nan_stats=False, validate_args=True),  # priors
      tfd.Dirichlet(concentration=already_seen, name='already_seen', allow_nan_stats=False, validate_args=True),
      lambda already_seen, priors, count: tfd.Independent(
          tfd.Multinomial(probs=_get_probs_v2(already_seen, priors),
                          total_count=_get_counts(count, priors)),
          reinterpreted_batch_ndims=2)  # working_function
  ])
@tf.function
def simplified_log_prob(init_beta, init_first, init_already):
  return simplified_model(first_seen, already_seen).log_prob([init_beta, init_first, init_already, input_exc])
first_seen = tf.constant([[0.06225592, 0.40752962, 0.4955014 , 0.01913294, 0.01558013]], dtype = tf.float32)
already_seen  = tf.constant([[0.78, 0.1, 0.5, 0.5, 0.02]], dtype = tf.float32)
csuter commented 1 year ago

Hmm. I think you can rewrite the 2-category Categorical + sum to be a tfd.Binomial(total_count=50000).
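
For example, a quick way to see the equivalence (a toy sketch; b is just an illustrative value, and the two counts agree in distribution, not in any particular realization):

b = 0.66
count_via_categorical = tf.reduce_sum(
    tf.cast(tfd.Categorical(probs=[b, 1. - b]).sample(50000) == 0, tf.float32))
count_via_binomial = tfd.Binomial(total_count=50000., probs=b).sample()
# Both count the outcome that occurs with probability b out of 50000 trials.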

I'm confused by the log_prob call in simplified_log_prob because it is only passing 4 values but the JointDistribution has 5 random variables (beta, sample(categorical), dir1, dir2, multinomial).

If you can provide a runnable example in a Colab, I can try and debug more proactively.

MariaLavrovskaya commented 1 year ago

Yes, sure. https://colab.research.google.com/drive/1vSTW49QrkdIC-ciELe8nZ98N8tZefJSQ?usp=sharing

It seems like the errors are coming from the Dirichlet distributions if I use log_prob() without rewriting it.

MariaLavrovskaya commented 1 year ago

I first rewrote the simplified_log_prob() as mentioned above, because I thought that for MCMC we need to define a closure over our joint to output the probability of our data. However, I took a step back in this example and ran the predefined log_prob() to trace the error step by step.

MariaLavrovskaya commented 1 year ago

Update: the inf issue is now sorted. I was incorrectly parametrising the Dirichlet distributions with proportions, whereas they take mean counts. Now trying to figure out how to rewrite the log_prob manually, as the one out of the box fails: it expects shape (n, n), whereas we need (n, 2), n being the number of generated samples. Is there a way to do that by passing the params into the JointDistributionSequential? Will report my progress along the way.
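
To illustrate the parametrisation point with toy numbers (not my actual values): concentrations below 1 push mass to the corners of the simplex, whereas pseudo-count-scale concentrations with the same mean concentrate around it.

# Proportions used as concentrations (all < 1): density piles up at the simplex corners.
d_props = tfd.Dirichlet(concentration=[0.06, 0.41, 0.50, 0.02, 0.01])
# Pseudo-counts with the same mean: much better behaved.
d_counts = tfd.Dirichlet(concentration=[6., 41., 50., 2., 1.])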

MariaLavrovskaya commented 1 year ago

Update: I have skimmed through the JointDistributionSequential() again and I have realised that in order for me to use log_prob(), I need to refactor dimensions of the Dirichlet distributions. Will try this one.

MariaLavrovskaya commented 1 year ago

Almost giving up :( I slightly simplified the function to use a Binomial instead of a Categorical (it's version 2), and right now log_prob() seems to work for sample sizes up to 2, but it starts failing for samples larger than 2. I am not sure if I can change the params somewhere so I would not need to write a log_prob() from scratch, as I was trying not to use a handwritten approach from the very start. Would you mind having a look if you have time please? I'd greatly appreciate any suggestions! https://colab.research.google.com/drive/1vSTW49QrkdIC-ciELe8nZ98N8tZefJSQ?usp=sharing

csuter commented 1 year ago

I believe this will do what you want:

def simplified_model_v2(first_seen, already_seen):
  return tfd.JointDistributionSequential([
    tfd.Beta(concentration1 = 559.49,
             concentration0= 291.28), #b
    lambda b: tfd.Binomial(probs=b, total_count=50000, name='count'),
    tfd.Independent(
        tfd.Dirichlet(concentration=tf.stack([first_seen, already_seen], axis=0)),
        reinterpreted_batch_ndims=1),
    lambda t, count: tfd.Independent(
        tfd.Multinomial(
            probs=t,
            total_count=tf.stack([count, 50000 - count], axis=-1)),
        reinterpreted_batch_ndims=1)
  ])

There are a couple of things I needed to fix:

  1. We can't just sub the Binomial with [p, 1-p] for what you were doing with the Categorical + sum. Since we want the total positive and negative trials to sum to 50000, we need to draw c from a Binomial(p, 50000) and use (50000 - c) for the negative count.
  2. I had to fiddle with shapes of things a lot...there isn't a great general solution to this TBH. The strategies include some combination of (a) "think really hard about shape semantics in the presence of non-trivial sample_shape", (b) "factor out the lambdas and add a bunch of print statements showing the shapes of intermediate values", and (c) "use JDSAutoBatched, which is sometimes painfully slow but was designed to save us from (a) and (b)".
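
For example, a quick way to sanity-check the shapes (a sketch; it assumes first_seen and already_seen are shape-[5] vectors):

jd = simplified_model_v2(first_seen, already_seen)
s = jd.sample(2)              # a batch of 2 joint samples, returned as a list
for part in s:
  print(part.shape)           # per-variable shapes
print(jd.log_prob(s).shape)   # [2]: one log-prob per joint sample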

Sorry about the challenges! I hope the next step is easier. I admire your perseverance trying to make it work w/out the AutoBatched magic :) It's also my personal preference, even though it's sometimes painful.

MariaLavrovskaya commented 1 year ago

Morning! Apologies I was taking a break over the weekend. This works smoothly – thank you so much.

I noticed the error in (1) later on Friday, and it was silly of me to assume that it would work any differently.

I cannot agree with you more. It's definitely sometimes painful and frustrating, but this is the best way to learn (as long as I can pop in here now and then to ask questions). 😸

MariaLavrovskaya commented 1 year ago

I think I managed to make MCMC work. I took values of simplified_model_v2(first_seen, already_seen).sample() to be the initial_state and then applied bijectors to map them to unconstrained space (hopefully I did it correctly). I also wrote a partial application for the target log-prob function:

@tf.function
def simplified_log_prob(init_beta, init_binom, init_dir):
  return simplified_model_v2(first_seen, already_seen).log_prob([init_beta, init_binom, init_dir, inp_exc])

def sample_simplified(num_results, num_burnin_steps):

  hmc = tfp.mcmc.HamiltonianMonteCarlo(
      target_log_prob_fn=simplified_log_prob,
      num_leapfrog_steps=10,
      step_size=0.005)

  initial_state = [
      tf.constant(init_b, dtype=tf.float32, name='init_beta'),
      tf.constant(init_binom, name = 'init_binomial'),
      tf.constant(init_dirichlets, name = 'init_dir')
      # means* tf.ones([], dtype=tf.float32, name="init_means")
  ]

  unconstraining_bijectors = [
      tfb.Sigmoid(), #bijector for Beta 
      tfb.Identity(), #bijector for Binomial
      tfb.SoftmaxCentered() # Bijector for Dirichlets
  ]
  kernel = tfp.mcmc.TransformedTransitionKernel(
      inner_kernel=hmc, bijector=unconstraining_bijectors)

  samples, kernel_results = tfp.mcmc.sample_chain(
      num_results=num_results,
      num_burnin_steps=num_burnin_steps,
      current_state=initial_state,
      kernel=kernel)
  acceptance_probs = tf.reduce_mean(
      tf.cast(kernel_results.inner_results.is_accepted, tf.float32), axis=0)

  return samples, acceptance_probs
MariaLavrovskaya commented 1 year ago

Thank you so much for being so supportive and guiding me!

I now need to introduce more complexity into the model by imposing another distribution on the total_count for the Binomial, but hopefully I will manage to make it work smoothly myself 😅

csuter commented 1 year ago

Glad to help, and glad you are continuing to make progress. I think your next challenge may be that Binomial is discrete so HMC will not work for this latent variable. You may be able to analytically marginalize out some of the discrete variables, or else do some kind of hybrid metropolis-within-gibbs thing.

MariaLavrovskaya commented 1 year ago

I decided to take a step forward and try to generalise it to fit a number of values at a time. Right now, we are fitting only one value at a time; however, what I want is to fit 7 values at a time. So, for example, for one Beta, one Binomial, and two Dirichlet draws, we pass 7 samples into the Multinomial, assuming that each of them came from this generating process.

So based on the explanations here and your valuable comments, I am trying to think about shapes really hard. My understanding is as follows:

I started off by thinking that there are three possible scenarios. Number one:

  1. Beta ~ two values (2,)
  2. Binomial (2,)
  3. Dirichlets (2,2,5)
  4. The Multinomial's final shape is (2,7,2,5). For a sample of two, it should output two values for log_prob. This looks more intuitive; however, I am not sure how I am supposed to transform the probs so that it works as desired. My thinking was to add a dimension with t[:, tf.newaxis, ...], but this fails for the forward pass of shape [2,5].

Number two, which I now understand is wrong, was the following:

  1. Beta ~ two values (2,)
  2. Binomial (2,)
  3. Dirichlets (2,2,5)
  4. The Multinomial's final shape is (7,2,2,5). Again, for a sample of two, it should output two values for log_prob, but here I only managed to get one value for log_prob, so this is probably not the right way to go forward.

Number three:

  1. Beta ~ two values (2,)
  2. Binomial (2,)
  3. Dirichlets (2,2,5)
  4. The Multinomial's final shape is (2,2,7,5). This seems like the right way based on what I read in the tutorials. I have implemented a version of this using a lookup table within the probs function; however, I wonder if there is a neater way to do it. I have not tried the MCMC algorithm on this yet, but sample() and log_prob() both work.

Do you think my understanding is correct and that (3) gives the right shapes I am supposed to be getting? Can you please tell me whether my logic is right, and how you would reformulate probs in a 'cleaner' way?

I put everything in the notebook to make it a bit more readable: https://colab.research.google.com/drive/1vSTW49QrkdIC-ciELe8nZ98N8tZefJSQ?usp=sharing

brianwa84 commented 1 year ago

Instead of summing a bunch of two-way Categoricals, consider using Binomial.

csuter commented 1 year ago

Sorry, I have been working on a deadline and haven't had time to respond...I will hopefully have some more time in a few days.

@brianwa84 I think we arrived at your suggestion a few comments back in the thread -- maybe you missed that?

MariaLavrovskaya commented 1 year ago

No worries. I hope you will meet your deadline! I believe it works as intended 😸 I have another question that is slightly different to the thread's topic. Should I create a separate ticket for that?

I have seen in the tutorials that there is a two-Poisson model with two different rate parameters; is there a way to do this with two distributions from the exponential family that are parametrised differently? I am not sure if I am even allowed to do that. I believe I have a model that has a LogNormal body and, after a certain 'switchpoint', an exponential tail. So far I have thought of using Mixture; however, I am worried that it's not going to give me this 'switchpoint' behaviour. Any suggestions are very much welcome!

brianwa84 commented 1 year ago

No worries, I sent that response on Sep 26 by email, for some reason it only just showed up.

MariaLavrovskaya commented 1 year ago

Hi. Would you mind advising me on this please? I want to create a mixture model with two normals, where all the parameters have Uniform priors. I have implemented the model using JointDistribution, and log_prob and sample do work; however, when I run the MCMC sampler, it just stays at the initialised values and does not move. I also used a bijector to map the constrained intervals to unconstrained space, but I am not sure what is wrong. Would you mind giving me a hand please?

def affine(rv):
  return tf.transpose(rv * tf.ones([3])[..., tf.newaxis])

def model_normal_normal():
  return tfd.JointDistributionSequential([
      tfd.Uniform(low = 0.08, high = 0.11), # weights
      tfd.Uniform(low = 4.8, high =  5.1), #loc_body
      tfd.Uniform(low = 6.4, high = 7.3), #loc_tail
      tfd.Uniform(low = 0.08, high = 0.1), #scale_body
      tfd.Uniform(low = 0.1, high = 0.2), #scale_tail
      lambda scale_tail, scale_body, loc_tail, loc_body, weights: tfd.Independent(tfd.Sample(tfd.Mixture(
          cat = tfd.Categorical(
              probs = tf.stack([affine(1-weights), affine(weights)], axis=-1), validate_args=True,allow_nan_stats=False),
              components = [
                  tfd.Normal(loc = affine(loc_body), scale = affine(scale_body), validate_args=True,allow_nan_stats=False), 
                  tfd.Normal(loc = affine(loc_tail), scale = affine(scale_tail), validate_args=True,allow_nan_stats=False)]), sample_shape = 23507),reinterpreted_batch_ndims=2)

  ])

#define partial application (closure) over our data
@tf.function
def log_prob_normal(init_weights, init_loc_body, init_loc_tail, init_scale_body, init_scale_tail):
  return model_normal_normal().log_prob([init_weights, init_loc_body, init_loc_tail, init_scale_body, init_scale_tail, data])

# bijector to map contrained parameters to real

# Interval transformation
def tfp_interval(a,b):
  return tfb.Inline(
    inverse_fn=(
        lambda x: tf.math.log(x - a) - tf.math.log(b - x)),
    forward_fn=(
        lambda y: (b - a) * tf.sigmoid(y) + a),
    forward_log_det_jacobian_fn=(
        lambda x: tf.math.log(b - a) - 2 * tf.nn.softplus(-x) - x),
    forward_min_event_ndims=0,
    name="interval")

def sample_normal_normal(num_results, num_burnin_steps):
  hmc = tfp.mcmc.HamiltonianMonteCarlo(
      target_log_prob_fn=log_prob_normal,
      num_leapfrog_steps=10,
      step_size=0.005)

  initial_state = [
      tf.constant(init_w, dtype=tf.float32, name='init_weights'),
      tf.constant(init_l_b, dtype=tf.float32, name = 'loc_body'),
      tf.constant(init_l_t, name = 'loc_tail'),
      tf.constant(init_s_b, name = 'scale_body'),
      tf.constant(init_s_t, name = 'scale_tail')
      # means* tf.ones([], dtype=tf.float32, name="init_means")
  ]
#we specify bijectors to map the initial values to unconstrained space
  unconstraining_bijectors = [
      tfp_interval(tf.constant(0.08, tf.float32), tf.constant(0.11, tf.float32)), #bijector for all uniforms
      tfp_interval(tf.constant(4.8, tf.float32), tf.constant(5.1, tf.float32)), 
      tfp_interval(tf.constant(6.4, tf.float32), tf.constant(7.3, tf.float32)),
      tfp_interval(tf.constant(0.08, tf.float32), tf.constant(0.1, tf.float32)),
      tfp_interval(tf.constant(0.1, tf.float32), tf.constant(0.2, tf.float32)) 
  ]
  kernel = tfp.mcmc.TransformedTransitionKernel(
      inner_kernel=hmc, bijector=unconstraining_bijectors)
  samples, kernel_results = tfp.mcmc.sample_chain(
      num_results=num_results,
      num_burnin_steps=num_burnin_steps,
      current_state=initial_state,
      kernel=kernel)
  acceptance_probs = tf.reduce_mean(
      tf.cast(kernel_results.inner_results.is_accepted, tf.float32), axis=0)

  return samples, acceptance_probs
csuter commented 1 year ago

The Sigmoid bijector has low and high parameters. Can you try using that instead of the custom inline bijector? Those can be finicky at times (I see nothing wrong with your impl but haven't checked the math -- probably it's right ☺️).

Usually if HMC is rejecting too much, the first thing to try is a lower step size. Try going down two orders of magnitude and see if it's accepting some. If you find a range where it's a bit better, wrap your kernel in a tfp.mcmc.SimpleStepSizeAdaptation kernel initialised from that value and see how it goes.

I haven't thought much at all about the model setup. Could be something about that that is especially tricky to sample. Try the above and check back in?
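
For example, a minimal sketch of both suggestions, reusing the log_prob_normal closure and uniform bounds from above (the step size and adaptation-step count are just illustrative):

unconstraining_bijectors = [
    tfb.Sigmoid(low=0.08, high=0.11),  # weights
    tfb.Sigmoid(low=4.8, high=5.1),    # loc_body
    tfb.Sigmoid(low=6.4, high=7.3),    # loc_tail
    tfb.Sigmoid(low=0.08, high=0.1),   # scale_body
    tfb.Sigmoid(low=0.1, high=0.2),    # scale_tail
]

hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=log_prob_normal,
    num_leapfrog_steps=10,
    step_size=5e-5)  # a couple of orders of magnitude smaller than 0.005

kernel = tfp.mcmc.TransformedTransitionKernel(
    inner_kernel=hmc, bijector=unconstraining_bijectors)

# Adapt the step size during burn-in, starting from the value above
# (num_adaptation_steps is typically ~80% of num_burnin_steps).
kernel = tfp.mcmc.SimpleStepSizeAdaptation(kernel, num_adaptation_steps=800)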

MariaLavrovskaya commented 1 year ago

Thank you! It works with the smaller step size and Sigmoid bijector.

I have another question: I was playing around with reinterpreted_batch_ndims and I do not seem to understand the difference when I set it to different values. For example, if I sample once, each Uniform produces 1 value. Then the mixture constructs three Gaussian mixtures with roughly 20000 values each. The idea is that I want to find parameters for the mixture of Gaussians that are 'optimal' across all 3 datasets, each of which has results from a different time period. When I set reinterpreted_batch_ndims=1, I have batch_shape = [1] and event_shape = [3, 20000], whereas when I set reinterpreted_batch_ndims=2 I have batch_shape = [] and event_shape = [1, 3, 20000]. Both work when sample and log_prob are called. The values of log_prob differ for different reinterpreted_batch_ndims. However, the sampler only seems to work when reinterpreted_batch_ndims=2 and throws an error when reinterpreted_batch_ndims=1.

Would you mind explaining what the right approach is here, and why this behaviour is intended, please?
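
To illustrate the shapes I mean with a toy example (much smaller than my actual model):

base = tfd.Sample(tfd.Normal(loc=tf.zeros([1, 3]), scale=1.), sample_shape=20)
# base: batch_shape [1, 3], event_shape [20]
d1 = tfd.Independent(base, reinterpreted_batch_ndims=1)  # batch [1], event [3, 20]
d2 = tfd.Independent(base, reinterpreted_batch_ndims=2)  # batch [],  event [1, 3, 20]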

MariaLavrovskaya commented 1 year ago

Found the answer to my question in the walk-through on shapes – thanks for help!

csuter commented 1 year ago

I don't think concentration is supposed to be negative?

On Mon, Oct 24, 2022, Maria wrote:

Hi again! Apologies for asking too many questions. I want to use a generalised Pareto to fit the tail, however for some reason log_prob is -inf for this model. I am not sure why though, because it does work outside the Joint distribution. Would you mind giving me a hint?

return tfd.JointDistributionSequential([
    tfd.Uniform(low = 0.01, high = 0.02),   # loc
    tfd.Uniform(low = 0.7, high = 0.75),    # scale
    tfd.Uniform(low = -0.2, high = -0.35),  # concentration
    lambda conc, sc, loc: tfd.Independent(
        tfd.Sample(
            tfd.GeneralizedPareto(loc = affine(loc), scale = affine(sc), concentration = affine(conc),
                                  validate_args=True, allow_nan_stats=False),
            sample_shape = 1277),
        reinterpreted_batch_ndims=2)

])

MariaLavrovskaya commented 1 year ago

I managed to find my mistake. The concentration can be negative; however, all values of x are then bounded from above by loc - scale/concentration, hence the error.
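
A toy check of that support bound (illustrative numbers only, not my fitted values):

gpd = tfd.GeneralizedPareto(loc=0.015, scale=0.72, concentration=-0.2)
upper = 0.015 - 0.72 / (-0.2)   # = 3.615, the upper end of the support
print(gpd.log_prob(3.0))        # finite
print(gpd.log_prob(4.0))        # not finite (nan/-inf): outside the support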

MariaLavrovskaya commented 1 year ago

Hello! Is there an equivalent in TFP of drawing posterior predictive samples from a model given a trace, as in https://docs.pymc.io/en/v3/api/inference.html ?

csuter commented 1 year ago

The sample function on JointDistributions accepts arguments that let you "pin" certain values during sampling. You can use this to get posterior predictive samples, given posterior samples. The way we generate docs makes it a bit hard to suss out, but it's expressed most clearly (perhaps still not all that clearly...) in the sample_distributions docstring here. The same applies to the sample method, even though the docstrings don't say so (they're inherited with modifications from the base Distribution object...😬)

In short you can do things like this:

jds = tfd.JointDistributionSequential([
  tfd.Normal(0., 1., name='x'),
  lambda x: tfd.Normal(x, 1., name='y')
])

# (1) pin value of x, but not y:
jds.sample(10, value=[4, None])

# (2) pin value of y, but not x:
jds.sample(10, value=[None, -3])

# (3) same as (1), but use kwargs to pin, based on the `name` parameters of the constituent distributions:
jds.sample(10, x=4)

# (4) same as (2), but use kwargs to pin
jds.sample(10, y=-3)

Constructing a valid value argument can be a bit of a hassle (you have to plug in Nones in just the right places), hence the latter forms.
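
So, given MCMC draws of x (x_chain below is a hypothetical stand-in with shape [1000]), posterior predictive samples of y can be drawn with something like:

x_chain = tf.random.normal([1000])             # stand-in for posterior draws of x
_, y_pred = jds.sample(value=[x_chain, None])  # y sampled conditionally on each x draw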

MariaLavrovskaya commented 1 year ago

I see, many thanks!