More efficient sampling setup

paulthebaker commented 5 years ago

In @Hazboun6's solar wind PR (#18) he says:

I have included a new jump proposal that will propose jumps from the prior of any "non-standard" signals. I had wanted to included a flag where one could supply the name of a signal, trying to avoid the proliferation of jump proposals, but PTMCMCSampler only takes a standard set of object inputs. If someone can think of a cleaner, or more specific way to do this please make suggestions. @paulthebaker or @svigeland Any thoughts on this?

Totally independently, I've been thinking about exactly this. My solution was a callable class, with class factories to generate specific instances. There are still a few snags. For instance I was working with the PTMCMCSampler param indexing (enterprise.PTA.params treats vector parameters as one), and sampling vector parameters is tricky. The you would instantiate the class by giving it a list of parameters to work on or a signal name.

The more I think about this, the more I think it's unnecessary. I don't think we need dedicated prior draws for every signal.

Some thoughts:

Specific signal prior draws are redundant, if there is a global prior draw that covers all parameters. The current prior draw selects an enterprise.PTA.params member to move, which doesn't visit all sampling parameters equally (since some may be parts of vectors). We should fix this.
Our proposal scheme weighting isn't great. Currently we set the global prior draw to have a low weight (5), then have many signal prior draws at mid weight each (10). Signals with only a few parameters end up getting jumped more often per-parameter. We'd be better off with a global prior on all parameters that we call more often.
Prior draws on LinearExp are very inefficient, and we may not want to do them at all. This is why we put the dedicated log-uniform draw for GWB amplitude. We should consider doing this for RN amplitude when using LinearExp for them too.
If a parameter needs to be moved more often, then throwing prior draws at it won't help, when the parameter's posterior is significantly different than the prior. Say you want to move GWB amplitude a lot, append 20 copies of a group of one index to groups. The AM and SCAM moves will hit that parameter more often.
We should start using empirical distributions from previous runs as standard practice. We should be making them for any noise parameters that we sample: RN, DMGP, DM annual, etc.
We always move parameters one at a time in our extra draws in enterprise_extensions. When we have high acceptance moves (like empirical draws) we should sometimes move multiple parameters simultaneously.

A few stray thoughts:

enterprise Parameters all have their own sampler method which draws from the prior. We shouldn't hard code ranges in prior draws, we should use the sampler. (see draw_from_dm1yr_prior doing it the wrong way)
The new Parameter doesn't carry around it's prior properties (e.g. pmin, pmax). Those are hiding in the methods to calculate logpdf and to sample, but they are not accessible to us. This complicates autogenerating log-uniform draws for LinearExp parameters.

paulthebaker commented 5 years ago

Here's a GWB amplitude chain from an IPTA cumulative UL run with 14 pulsars and BayesEphem:

The first half used enterprise_extensions.model_utils.setup_sampler(). The second half is setup manually.

groups: (all params), (pulsar RN, DMGP, DM annual) * each pulsar, (GWB Amp), (BayesEphem), (jup_orb & GWB Amp)
moves: prior draw (10), RN log-uni (5), GWB log-uni (5), BE prior (5), SCAM (30), AM (20), DE (50)

initial covariance stdev = (starting order of magnitude) / 10

stdev = np.array([10**np.floor(np.log10(abs(x)))/10 for x in x0])
cov = np.diag(stdev**2)

Hazboun6 commented 5 years ago

Nice work @paulthebaker ! I'm a bit confused about why the global draw is better overall. I can imagine that very early on it's useful, but (and this is an anecdotal memory) it seems like often many of our parameters burn in to their "usual" area of parameter space, but others are more autocorrelated or stuck. I would think that jump proposals in those parameter priors would be good. Are you saying that we just have the wrong weighting between them? Obviously the proof is in the pudding, in your test run above. I'm just curious about how one might know (say from looking at a chain) when to down weight various JPs in favor of others.

I really like your example because it is just tweaks to our already existent infrastructure. Is there as way to make enterprise_extensions.model_utils.setup_sampler() more tunable? Or should we have different standard set ups?

Also, to your second bullet point, the pta.params has a dictionary of enterprise.Parameter objects. Can one use those?

paulthebaker commented 5 years ago

I think we can both change the default behavior of setup_sampler() and make it more tunable.

If you look at the gwb_analysis.py and sample_utils.py scripts in the IPTA_DR2_analysis repo you'll see how I set up my prior draws and groups and stuff

Also, to your second bullet point, the pta.params has a dictionary of enterprise.Parameter objects. Can one use those?

The problem is that the new Parameter classes don't retain their initialization info (e.g. pmin). They only have methods to calculate PDF and sample.

nanograv / enterprise_extensions

More efficient sampling setup #19