Plotlib.py Rework - Githubissues

jjobel commented 4 years ago

Compared to the master branch, we need a plotlib.py that works for clustr.py.
Currently, plotlib.py

Includes global variables
No Classes
clustr.py in rewrite does not
- import plotlib
- Make a plot using plotlib

sweverett commented 4 years ago

I'm actually not so concerned with the second point. Turning all of the plotting functionality into classes is really overkill for what we're doing here (and most of matplotlib is oo anyway). Instead, I'd recommend re-designing the existing plotting functions in plotlib.py to correctly handle inputs that are classes; e.g. passing a data object instead of a list of things like x, y, x_err, y_err, piv, etc.

However the first point is important - we definitely want to get rid of any global variables and pass these things through a config object instead.

jjobel commented 4 years ago

Just finished implementing the above suggestions. plotlib.py now successfully works with clustr.py by plotting a scatter plot and saves the pdf to folder. Preview below of output file.

Note) Hmmm not sure why image is not displaying. Perhaps file is too large?

Scatter-joser2500_band_lumin-lambda.pdf

The axis seem fine along with intercept, slope, and sigma values comparable to the ones outputted by clustr.py in the master branch.

I've only tested the scatter plot for now but will test the other plots tomorrow or even tonight if I don't fall asleep early.

jjobel commented 4 years ago

@sweverett Do you happen to recall how to include the confidence interval about the regression line, similar to the plot below? Also, how to extend the line?

jose_lr2500_thesis

The image below is the scatter plot clustr.py outputs.

Scatter-test11r2500_band_lumin-lambda-1

sweverett commented 4 years ago

So for an unknown reason the plotting settings in this code seem to depend on the user's hardware. We can get around this by setting the plotting settings explicitly in plotlib.py. There are some good examples of how to do this here if we want to use seaborn, for example.

I don't think the confidence interval on that plot is completely right, as confidence should be highest near the pivot and lower on the ends. But that was made using fill_between() with the bounds being +/- 1 sigma (where sigma here is the fitted intrinsic scatter from linmix)

jjobel commented 3 years ago

@sweverett Could you check plotlib.py to see if the changes I made are what you had in mind. It was regarding the number of objects we were passing to each plotting function and how each had to rerun the object's methods before unpacking.

What I did was place all the scaling stuff into the Fitter class in clustr.py and pass an instance of Fitter to everyone in plotlib.py. Now every plotting function just has to access the Fitter attributes to get what they need.

i.e,

def scatter_plot(fitter):
    xlog, ylog = fitter.log_x, fitter.log_y

    return

@paigemkelly

sweverett commented 3 years ago

Ideally I think we would have a separate Fit class instead of saving everything in the Fitter (as these are really distinct things) and then pass just a Fit object the plotting functions. However, I imagine you did it this way as you still need access to the scale_fit_to_data() function in plotlib.py which is an instance method of Fitter since it needs it to setup the fit. It's not completely obvious what the right decision is here, so I think it's fine to leave what you've done for now and I can come back to it later when I have more time to think about it.

sweverett commented 3 years ago

The most important things were to only have a single definition of scale_fit_to_data() and to not call the linmix fit multiple times, which you have done here.

sweverett / CluStR

Plotlib.py Rework #40