pymc-labs / CausalPy

A Python package for causal inference in quasi-experimental settings
https://causalpy.readthedocs.io
Apache License 2.0
880 stars 63 forks source link

Add docs on justifying instruments in the IV approach #345

Closed NathanielF closed 3 months ago

NathanielF commented 3 months ago

Just a draft PR for the moment.

Pretty happy with the example. Need to add some more write-up and discuss whether we want to add JAX/NumPyro as a dependency to the package.

NathanielF commented 3 months ago

For discussion, @drbenvincent, @juanitorduz: how do you think we should handle the JAX/NumPyro install? I've modified the IV class here to sample the PPC using the experimental JAX flag @jessegrabowski recommended, but I'm unsure if you want to make this a default.

drbenvincent commented 3 months ago

Just a quick comment based on an initial look on my phone... This seems to remove sampling from the prior altogether?

NathanielF commented 3 months ago

It did, but need not stay that way... Jesse's trick only seemed to work for posterior sampling, so prior sampling of the mvNormal remains slow on prior checks.

However, I could keep the prior sampling but only sample the beta parameters, which is fast.

Could also just default to only sample for the IV class and show how to do prior and posterior checks in the notebook after the original fit...

Options sort of depend on what you want to do with Jax and how integral it should be to CausalPy installation?

jessegrabowski commented 3 months ago

You can do it for prior as well, but you have to freeze the model before you sample, as in:

import pymc as pm
from pymc.model.transform.optimization import freeze_dims_and_data

with pm.Model() as m:
    ...

# Freeze dims and data first so the model can be compiled in JAX mode
with freeze_dims_and_data(m):
    prior = pm.sample_prior_predictive(compile_kwargs={'mode': 'JAX'})

I am going to open an issue for JAX forward sampling support, because it's really nice and all these hoops are silly.

drbenvincent commented 3 months ago

Failing remote tests will be fixed when #346 is merged

drbenvincent commented 3 months ago

> Options sort of depend on what you want to do with Jax and how integral it should be to CausalPy installation?

What would be the main cons of adding that as a dependency?

NathanielF commented 3 months ago

I guess it's just heavier, and @jessegrabowski's trick to speed up the mvNormal PPC just seems hacky, and maybe things change down the road...

On the plus side, you enable NumPyro sampling, which is great.

I think we should enable it but not bake the dependency into the model fit step. Instead, code defensively. Keep the IV fit method light but demonstrate fast ppc usage in the notebook docs....
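One way to read "code defensively" here is to detect JAX at runtime and fall back quietly when it is missing. The sketch below is an assumption about how that could look, not CausalPy's actual implementation: `module_available` and `ppc_compile_kwargs` are hypothetical helper names, and only the `{'mode': 'JAX'}` setting comes from this thread.

```python
import importlib.util
from typing import Dict


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def ppc_compile_kwargs() -> Dict[str, str]:
    """Pick compile settings for predictive sampling (hypothetical helper).

    Returns the JAX mode only when jax is installed; otherwise returns an
    empty dict so the caller keeps the default backend.
    """
    if module_available("jax"):
        return {"mode": "JAX"}
    return {}


print(ppc_compile_kwargs())
```

The returned dict could then be passed as the `compile_kwargs` argument used elsewhere in this thread, so the same notebook cell would run with or without JAX installed.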

drbenvincent commented 3 months ago

The preference is to stay light, but if we get nifty new functionality then I don't see a fundamental problem with adding dependencies.

So in your proposal you'd use a cell magic which conda installs numpyro so that it's just there locally. That should work.

It's also worth remembering that in the tests/doctests the IV ones are the slowest. If they can be sped up then that would be great. But that probably would require adding dependencies?

drbenvincent commented 3 months ago

PS if you update from main all the checks should pass now

NathanielF commented 3 months ago

Trying to play with the freeze_dims_and_data approach for prior predictive sampling and getting a JAX not-implemented error:

[image]

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.96%. Comparing base (23cbfd1) to head (9eb41f1). Report is 2 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| causalpy/pymc_models.py | 83.33% | 2 Missing |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #345      +/-   ##
==========================================
- Coverage   79.98%   79.96%   -0.03%
==========================================
  Files          21       21
  Lines        1634     1642       +8
==========================================
+ Hits         1307     1313       +6
- Misses        327      329       +2
```

NathanielF commented 3 months ago

Think I'm in a bit of a catch-22 re: failing tests. The newly added test fails in the GitHub Action because it requires JAX to be installed in the deployment environment. Remove that function call and test coverage fails...

NathanielF commented 3 months ago

Added further discussion on why IV regression is particularly well-suited to Bayesian inference methods.

drbenvincent commented 3 months ago

@NathanielF Just checking about the failing remote tests. Seems pretty clear that it's failing because of the new jax dependency. Not sure I explicitly thought about it before, but I assumed the remote tests would be based on the code in the PR, so should therefore pass? But it looks like it's not - so any PR that adds a new dependency will fail remote checks until it's merged? That seems a bit odd?

So just checking on this. Do you know either way, and do the tests pass for you locally?

NathanielF commented 3 months ago

Yeah, I thought this strange too. Wasn't sure how to handle it.

NathanielF commented 3 months ago

> Before I do a proper review, can we try to rasterize many of the plots in the notebook. 6.1 MB is a bit heavy.
>
> See info here https://matplotlib.org/stable/gallery/misc/rasterization_demo.html
>
> It should just come down to setting a kwarg rasterized=True

Didn't know you could do that. Will adapt.
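As a rough illustration of the `rasterized=True` suggestion, the sketch below saves the same scatter plot twice as SVG and compares the sizes. The data, point count, and the `svg_bytes` helper are made up for the example and are not taken from the notebook; rasterization only matters for vector output formats such as SVG or PDF, so how much it helps a given notebook depends on how its figures are stored.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 20_000))  # illustrative data, not from the notebook


def svg_bytes(rasterized: bool) -> int:
    """Save the same scatter plot as SVG and return its size in bytes."""
    fig, ax = plt.subplots()
    # With rasterized=True the points are embedded as one bitmap image
    # instead of one vector element per marker.
    ax.scatter(x, y, s=2, alpha=0.3, rasterized=rasterized)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="svg", dpi=100)
    plt.close(fig)
    return buffer.getbuffer().nbytes


vector_size = svg_bytes(rasterized=False)
raster_size = svg_bytes(rasterized=True)
print(f"vector: {vector_size} bytes, rasterized: {raster_size} bytes")
```

Axes, tick labels, and titles stay as vector elements in both cases; only the artist marked as rasterized is converted to a bitmap.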

NathanielF commented 3 months ago

> Locally I rebuilt the environment, ran the tests, and got a failing test:
>
> FAILED causalpy/tests/test_integration_pymc_examples.py::test_iv_reg - TypeError: No model on context stack.

Odd. I think it worked for me. Will rebuild environment and try myself.

NathanielF commented 3 months ago

Tests passing now.

drbenvincent commented 3 months ago

Tests pass locally for me too. Still a bit confused by failing remote test.

NathanielF commented 3 months ago

I'd guess it's because whatever runner is spun up to run the GitHub tests is built from the main branch...?

drbenvincent commented 3 months ago

Summary of discussion outside of GitHub... The remote test environment is currently pip installed with run: pip install -e .[test] in ci.yml, so it installs dependencies from pyproject.toml, not from environment.yml.

NathanielF commented 3 months ago

Yeah, I kind of agree with @maresb that the JAX install is heavy and a lot to add as a default. I tried to extract the JAX calls here into a separate optional function, but when I removed that function call from the test, the code coverage check failed... so I was caught in a catch-22.

NathanielF commented 3 months ago

In an ideal world we'd keep the optional function call available as I have it, but raise the code coverage elsewhere so JAX doesn't need to be invoked when running the tests.
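A common pattern for this situation is to skip the JAX-dependent test when JAX is absent, so the suite passes in a light environment. The sketch below uses only the standard library's `unittest`; `fast_ppc_mode` is a hypothetical stand-in for the optional sampling call, not a CausalPy function.

```python
import importlib.util
import unittest

# True only when jax can be found in the current environment.
HAS_JAX = importlib.util.find_spec("jax") is not None


def fast_ppc_mode() -> str:
    """Hypothetical stand-in for an optional JAX-backed sampling call."""
    import jax  # noqa: F401  (imported lazily so this module loads without JAX)

    return "JAX"


class TestOptionalJax(unittest.TestCase):
    def test_core_path(self):
        # Always runs: exercises code that has no JAX requirement.
        self.assertEqual(sum([1, 2, 3]), 6)

    @unittest.skipUnless(HAS_JAX, "jax is not installed")
    def test_fast_ppc(self):
        # Runs only where jax is available; reported as skipped otherwise.
        self.assertEqual(fast_ppc_mode(), "JAX")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOptionalJax)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With pytest, `pytest.importorskip("jax")` plays a similar role. A skipped test contributes no coverage, so the optional function would still need coverage from elsewhere or an exclusion such as coverage.py's `# pragma: no cover` comment.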

NathanielF commented 3 months ago

WOOOP! Works now. Removed the JAX dependency and added a function call to the PPC sampler that is optional for the IV class, @drbenvincent.

drbenvincent commented 3 months ago

I'm more convinced now that the pre-commit checks are not applying ruff to notebooks (see https://github.com/pymc-labs/CausalPy/issues/340). I'll see if I (or someone) can get that issue sorted

NathanielF commented 3 months ago

Seemed to work on pre-commit for me. At least it stopped me a few times when e.g. I had an import statement not in the first cell

drbenvincent commented 3 months ago

If you could review/approve https://github.com/pymc-labs/CausalPy/pull/352, once it's merged then you can update from main and it should fix up the minor formatting issues in the notebook.

review-notebook-app[bot] commented 3 months ago

drbenvincent commented on 2024-06-17T11:23:55Z ----------------------------------------------------------------

Can you add an admonition box with a bit of explanation that we need to install these extra dependencies?


review-notebook-app[bot] commented 3 months ago

drbenvincent commented on 2024-06-17T11:23:56Z ----------------------------------------------------------------

Repetition of "determine" in first sentence

Propose something like "returns to schooling." -> "economic (or other?) returns FROM schooling". Or do you mean a return to schooling for older students?

Credibility revolution? As in the replication crisis? Could be good to disambiguate / elaborate


NathanielF commented on 2024-06-17T16:25:41Z ----------------------------------------------------------------

Thanks, adjusted. Added a link to the Wikipedia article on the credibility revolution. It's useful language to have.