Closed paigemkelly closed 4 years ago
I believe it's only in reglib.py
, but it includes all the rpy2
imports, lines 9-16, as well as what's done in the run_lrgs()
function. So we actually don't need anything in lines 1-75 except for the numpy
and linmix
imports.
okay great, we just put that straight into the clustr.py code. We are having trouble figuring out the run_mcmc part of that function and the .chain part. I haven't seen that before and I'm having trouble relating what I'm finding on the internet to what's actually going on in the code. From what I'm understanding run_mcmc runs linmix through Nmin = 5000 to Nmax = 10000. What do Nmin and Nmax represent and how does that apply to linmix? or should we not worry about it?
I explained this a bit in #33. In principle we would try out every possible combination of values for (slope, intercept, intrinsic_scatter) and use the combination that gives the 'best fit' to the data. But this is quickly a computational nightmare as the time it takes to check all combination grows as the number of grid points ^ # of parameters. So instead there are smarter ways of sampling the parameter space of (slope, intercept, intrinsic_scatter), one of which is a Monte-Carlo Markov Chain (MCMC). It does a random walk in parameter space in a weighted way that will lead to it sampling better fit values than bad fit values. We can talk about this more in a zoom meeting if you'd like.
But it doesn't give a very reliable estimate if you only sample a few points. So you want to run until it 'converges', which is actually a bit of a complicated subject. Setting a minimum number of samples makes sure it has found the "correct" part of parameter space to sample from and the max number of samples will force it to quit if it's struggling to do the fit after a long time.
I think that makes sense, but just to clarify it will run until at least 5000 samples (Nmin) to make sure its reliable, and it will quit after 10000 (Nmax) if it's struggling? A zoom meeting would be helpful. I'll send an email, thanks!
Yes, that's correct. Those values aren't necessarily reliable however - it's always best to inspect the resulting chains by eye. We will do that eventually.
where is R being used in the code? is it only in reglib.py lines 9-16?