Closed NathanielF closed 3 months ago
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
For discussion @drbenvincent , @juanitorduz how do you think we should handle the Jax/Numpyro install. I've modified the IV class here to sample the PPC using the experimental JAX flag @jessegrabowski recomended, but i'm unsure if you want to make this a default.
Just a quick comment based on an initial look on my phone... This seems to remove sampling from the prior altogether?
It did, but need not stay that way... Jessie's trick only seemed to work for posterior sampling, so prior sampling of mvNormal remains slow on prior checks.
However, I could keep the prior sampling but only sample the beta parameters which is fast.
Could also just default to only sample for the IV class and show how to do prior and posterior checks in the notebook after the original fit...
Options sort of depend on what you want to do with Jax and how integral it should be to CausalPy installation?
You can do it for prior as well, but you have to freeze the model before you sample, as in:
with pm.Model() as m:
...
from pymc.model.transform.optimization import freeze_dims_and_data
with freeze_dims_and_data(m):
prior = pm.sample_prior_predictive(compile_kwargs={'mode':'JAX'})
I am going to open an issue for JAX forward sampling support, because it's really nice and all these hoops are silly.
Failing remote tests will be fixed when #346 is merged
Options sort of depend on what you want to do with Jax and how integral it should be to CausalPy installation?
What would be the main con's of adding that as a dependency?
I guess it's just heavier and @jessegrabowski 's trick to speed up the mvNormal ppc, just seems hacky and maybe things change down the road...
On the plus side you enable numpyro sampling which is great.
I think we should enable it but not bake the dependency into the model fit step. Instead, code defensively. Keep the IV fit method light but demonstrate fast ppc usage in the notebook docs....
The preference is to stay light, but if we get nifty new functionality then I don't see a fundamental problem with adding dependencies.
So in your proposal you'd use a cell magic which conda installs numpyro so that it's just there locally. That should work.
It's also worth remembering that in the tests/doctests the IV ones are the slowest. If they can be sped up then that would be great. But that probably would require adding dependencies?
PS if you update from main
all the checks should pass now
Trying to play with the freeze dim approach for prior predictive sampling and getting a JAX not implmented error:
Attention: Patch coverage is 84.61538%
with 2 lines
in your changes missing coverage. Please review.
Project coverage is 79.96%. Comparing base (
23cbfd1
) to head (9eb41f1
). Report is 2 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
causalpy/pymc_models.py | 83.33% | 2 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Think I'm in a bit of a catch 22 re: failing tests. Added new test fails in github action because we require Jax installed in the deployment environment. Remove thar function call and test coverage fails...
Added further discussion on why IV regression is particularly well-suited to bayesian inference methods
@NathanielF Just checking about the failing remote tests. Seems pretty clear that it's failing because of the new jax dependency. Not sure I explicitly thought about it before, but I assumed the remote tests would be based on the code in the PR, so should therefore pass? But it looks like it's not - so any PR that adds a new dependency will fail remote checks until it's merged? That seems a bit odd?
So just checking on this. Do you know either way, and do the tests pass for you locally?
@NathanielF Just checking about the failing remote tests. Seems pretty clear that it's failing because of the new jax dependency. Not sure I explicitly thought about it before, but I assumed the remote tests would be based on the code in the PR, so should therefore pass? But it looks like it's not - so any PR that adds a new dependency will fail remote checks until it's merged? That seems a bit odd?
So just checking on this. Do you know either way, and do the tests pass for you locally?
Yeah, I thought this strange too. Wasn't sure how to handle it.
Before I do a proper review, can we try to rasterize many of the plots in the notebook. 6.1 MB is a bit heavy.
See info here https://matplotlib.org/stable/gallery/misc/rasterization_demo.html
It should just come down to setting a kwarg
rasterized=True
Didn't know you could do that. Will adapt.
Locally I re-build environment, ran tests, and get a failing test
FAILED causalpy/tests/test_integration_pymc_examples.py::test_iv_reg - TypeError: No model on context stack.
Odd. I think it worked for me. Will rebuild environment and try myself.
Tests passing now.
Tests pass locally for me too. Still a bit confused by failing remote test.
Tests pass locally for me too. Still a bit confused by failing remote test.
I'd guess it's because whatever runner is being spun up to enact the github tests is being spun up based on the main branch...?
Summary of discussion outside of GitHub... Remote test environment is currently pip installed with run: pip install -e .[test]
in ci.yml
, so it's looking at the pyproject.toml
not the environment.yml
Yeah I kind of agree with @maresb that the jax install is heavy and a lot to add as a default. I tried to extract the jax calls here to a separate optional function, but when I removed that function call from the test, the code coverage fails... so i was caught in a catch 22.
Ideal world we'd keep the optional function call available as I have it. But up the code coverage elsewhere... so Jax doesn't need to be invoked when running the tests.
WOOOP! Works now. Removed Jax dependency and have a function call to the ppc sampler that is optional for the IV class @drbenvincent
I'm more convinced now that the pre-commit checks are not applying ruff to notebooks (see https://github.com/pymc-labs/CausalPy/issues/340). I'll see if I (or someone) can get that issue sorted
Seemed to work on pre-commit for me. At least it stopped me a few times when e.g. I had an import statement not in the first cell
On Thu 13 Jun 2024, 21:02 Benjamin T. Vincent, @.***> wrote:
I'm more convinced now that the pre-commit checks are not applying ruff to notebooks (see #340 https://github.com/pymc-labs/CausalPy/issues/340). I'll see if I (or someone) can get that issue sorted
— Reply to this email directly, view it on GitHub https://github.com/pymc-labs/CausalPy/pull/345#issuecomment-2166667985, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQZAB43KNXFVPPF7VNV5N3ZHH3FXAVCNFSM6AAAAABIXJB7F2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGY3DOOJYGU . You are receiving this because you were mentioned.Message ID: @.***>
If you could review/approve https://github.com/pymc-labs/CausalPy/pull/352, once it's merged then you can update from main and it should fix up the minor formatting issues in the notebook.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:23:55Z ----------------------------------------------------------------
Can you add an admonition box with a bit of explanation that we need to install these extra dependencies
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:23:56Z ----------------------------------------------------------------
Repetition of "determine" in first sentence
propose something like "returns to schooling." -> "economic (or other?) returns FROM schooling" Or do you mean a return to schooling for older students?
Credibility revolution? As in the replication crisis? Could be good to disambiguate / elaborate
NathanielF commented on 2024-06-17T16:25:41Z ----------------------------------------------------------------
Thanks adjusted. Added link to wikipedia article on the credibility revolution. It's useful language to have.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:23:56Z ----------------------------------------------------------------
Propose the title has LATE acronym at the end, rather than embedded. Just looks less cluttered.
Can you add a glossary term for LATE? And a glossary link to ATE (glossary entry already exists)
NathanielF commented on 2024-06-17T16:26:00Z ----------------------------------------------------------------
Removed LATE from title and added glossary link
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:23:57Z ----------------------------------------------------------------
propose adding hide-input
cell tag
I think there's definitely scope to have more explanation / hand holding of what is going on in the plot - what are the key things the reader should be looking at?
NathanielF commented on 2024-06-17T16:26:23Z ----------------------------------------------------------------
Hid code and expanded explanation
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:23:58Z ----------------------------------------------------------------
Propose adding hide-input
cell tag because the pandas wrangling is not conceptually key here
NathanielF commented on 2024-06-17T16:26:33Z ----------------------------------------------------------------
Hid code
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:23:59Z ----------------------------------------------------------------
Propose adding hide-input
cell tag for these large blocks of plot generation code
NathanielF commented on 2024-06-17T16:26:52Z ----------------------------------------------------------------
Hid code and expanded explanation
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:23:59Z ----------------------------------------------------------------
Similar - could do with breaking down this plot a bit more. What do the ellipses highlight? What is it which indicates a large effect of near/non-near college?
NathanielF commented on 2024-06-17T16:27:01Z ----------------------------------------------------------------
Done.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:00Z ----------------------------------------------------------------
"We show how TO perform this exercise below"
Under "Justificatory Models" it would be good to flag up the structure of things to come. Eg. Primer the reader to understand what "The first stage" "The Reduced Form" "The Wald Estimate" and "Wald Estimate Versus Simple Controlling Regression" are about. What are all these things and what are their inter-relation? Would you only do one of these things, or all?
Double check correctness of heading levels
NathanielF commented on 2024-06-17T16:27:37Z ----------------------------------------------------------------
Added more signposting
NathanielF commented on 2024-06-17T16:27:46Z ----------------------------------------------------------------
Checked headers
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:01Z ----------------------------------------------------------------
Explicitly mention beta_z[nearcollege_indicator]
??
NathanielF commented on 2024-06-17T16:27:57Z ----------------------------------------------------------------
Yep. Added
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:02Z ----------------------------------------------------------------
It's clear, but just in case a reader has not come across these plots before, you could briefly explicitly state that the posterior is "meaningfully" updated by the data because it's very different from the prior. Or something along those lines
NathanielF commented on 2024-06-17T16:28:14Z ----------------------------------------------------------------
Added a one-liner
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:03Z ----------------------------------------------------------------
Maybe just display this as a raw value (e.g. .data
) because there's a lot of not very useful looking array output
NathanielF commented on 2024-06-17T16:28:26Z ----------------------------------------------------------------
yep, removed.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:03Z ----------------------------------------------------------------
Similarly, be explicit about which coefficient we are looking at.
NathanielF commented on 2024-06-17T16:28:35Z ----------------------------------------------------------------
added!
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:04Z ----------------------------------------------------------------
Propose adding hide-input
cell tag
NathanielF commented on 2024-06-17T16:28:48Z ----------------------------------------------------------------
Hid code
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:05Z ----------------------------------------------------------------
typo: "strenghtening"
Could be worth generating a DAG of this? Might help readers understand the relation between the formula's and the DAG
NathanielF commented on 2024-06-17T16:30:24Z ----------------------------------------------------------------
I wasn't sure about changing the DAG here. I'm not sure it adds more as it's still the NEAR
variable just adding more species of college... I think it's fine without a specific DAG.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:06Z ----------------------------------------------------------------
Again, just be explicit about which coefficient the reader should look at
NathanielF commented on 2024-06-17T16:30:41Z ----------------------------------------------------------------
Added in the detail
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:07Z ----------------------------------------------------------------
Add some take home message for these coefficient results
NathanielF commented on 2024-06-17T16:30:58Z ----------------------------------------------------------------
Adjusted
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:07Z ----------------------------------------------------------------
Propose adding hide-input
cell tag
x-label text particularly small here
NathanielF commented on 2024-06-17T16:31:08Z ----------------------------------------------------------------
Updated
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2024-06-17T11:24:08Z ----------------------------------------------------------------
Consider adding semi-opaque background to the legend box because in some of these the overlap with the data makes the legend hard to read
In the upper panels, clarify what the grey vs the light blue distributions are. Maybe add to legend?
Thanks adjusted. Added link to wikipedia article on the credibility revolution. It's useful language to have.
View entire conversation on ReviewNB
Just a draft PR for the moment.
Pretty happy with the example. Need to add some more write up and discuss if we want to add JAX/Numpyro as a dependency to the package.