Add docs on justifying instruments in the IV approach

NathanielF commented 3 months ago

Just a draft PR for the moment.

Pretty happy with the example. Need to add some more write up and discuss if we want to add JAX/Numpyro as a dependency to the package.

review-notebook-app[bot] commented 3 months ago

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

NathanielF commented 3 months ago

For discussion @drbenvincent , @juanitorduz how do you think we should handle the Jax/Numpyro install. I've modified the IV class here to sample the PPC using the experimental JAX flag @jessegrabowski recomended, but i'm unsure if you want to make this a default.

drbenvincent commented 3 months ago

Just a quick comment based on an initial look on my phone... This seems to remove sampling from the prior altogether?

NathanielF commented 3 months ago

It did, but need not stay that way... Jessie's trick only seemed to work for posterior sampling, so prior sampling of mvNormal remains slow on prior checks.

However, I could keep the prior sampling but only sample the beta parameters which is fast.

Could also just default to only sample for the IV class and show how to do prior and posterior checks in the notebook after the original fit...

Options sort of depend on what you want to do with Jax and how integral it should be to CausalPy installation?

jessegrabowski commented 3 months ago

You can do it for prior as well, but you have to freeze the model before you sample, as in:

with pm.Model() as m:
    ...

from pymc.model.transform.optimization import freeze_dims_and_data

with freeze_dims_and_data(m):
    prior = pm.sample_prior_predictive(compile_kwargs={'mode':'JAX'})

I am going to open an issue for JAX forward sampling support, because it's really nice and all these hoops are silly.

drbenvincent commented 3 months ago

Failing remote tests will be fixed when #346 is merged

drbenvincent commented 3 months ago

Options sort of depend on what you want to do with Jax and how integral it should be to CausalPy installation?

What would be the main con's of adding that as a dependency?

NathanielF commented 3 months ago

I guess it's just heavier and @jessegrabowski 's trick to speed up the mvNormal ppc, just seems hacky and maybe things change down the road...

On the plus side you enable numpyro sampling which is great.

I think we should enable it but not bake the dependency into the model fit step. Instead, code defensively. Keep the IV fit method light but demonstrate fast ppc usage in the notebook docs....

drbenvincent commented 3 months ago

The preference is to stay light, but if we get nifty new functionality then I don't see a fundamental problem with adding dependencies.

So in your proposal you'd use a cell magic which conda installs numpyro so that it's just there locally. That should work.

It's also worth remembering that in the tests/doctests the IV ones are the slowest. If they can be sped up then that would be great. But that probably would require adding dependencies?

drbenvincent commented 3 months ago

PS if you update from main all the checks should pass now

NathanielF commented 3 months ago

Trying to play with the freeze dim approach for prior predictive sampling and getting a JAX not implmented error:

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.96%. Comparing base (23cbfd1) to head (9eb41f1). Report is 2 commits behind head on main.

Files	Patch %	Lines
causalpy/pymc_models.py	83.33%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #345 +/- ## ========================================== - Coverage 79.98% 79.96% -0.03% ========================================== Files 21 21 Lines 1634 1642 +8 ========================================== + Hits 1307 1313 +6 - Misses 327 329 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

NathanielF commented 3 months ago

Think I'm in a bit of a catch 22 re: failing tests. Added new test fails in github action because we require Jax installed in the deployment environment. Remove thar function call and test coverage fails...

NathanielF commented 3 months ago

Added further discussion on why IV regression is particularly well-suited to bayesian inference methods

drbenvincent commented 3 months ago

@NathanielF Just checking about the failing remote tests. Seems pretty clear that it's failing because of the new jax dependency. Not sure I explicitly thought about it before, but I assumed the remote tests would be based on the code in the PR, so should therefore pass? But it looks like it's not - so any PR that adds a new dependency will fail remote checks until it's merged? That seems a bit odd?

So just checking on this. Do you know either way, and do the tests pass for you locally?

NathanielF commented 3 months ago

@NathanielF Just checking about the failing remote tests. Seems pretty clear that it's failing because of the new jax dependency. Not sure I explicitly thought about it before, but I assumed the remote tests would be based on the code in the PR, so should therefore pass? But it looks like it's not - so any PR that adds a new dependency will fail remote checks until it's merged? That seems a bit odd?

So just checking on this. Do you know either way, and do the tests pass for you locally?

Yeah, I thought this strange too. Wasn't sure how to handle it.

NathanielF commented 3 months ago

Before I do a proper review, can we try to rasterize many of the plots in the notebook. 6.1 MB is a bit heavy.

See info here https://matplotlib.org/stable/gallery/misc/rasterization_demo.html

It should just come down to setting a kwarg rasterized=True

Didn't know you could do that. Will adapt.

NathanielF commented 3 months ago

Locally I re-build environment, ran tests, and get a failing test
FAILED causalpy/tests/test_integration_pymc_examples.py::test_iv_reg - TypeError: No model on context stack.

Odd. I think it worked for me. Will rebuild environment and try myself.

NathanielF commented 3 months ago

Tests passing now.

drbenvincent commented 3 months ago

Tests pass locally for me too. Still a bit confused by failing remote test.

NathanielF commented 3 months ago

Tests pass locally for me too. Still a bit confused by failing remote test.

I'd guess it's because whatever runner is being spun up to enact the github tests is being spun up based on the main branch...?

drbenvincent commented 3 months ago

Summary of discussion outside of GitHub... Remote test environment is currently pip installed with run: pip install -e .[test] in ci.yml, so it's looking at the pyproject.toml not the environment.yml

NathanielF commented 3 months ago

Yeah I kind of agree with @maresb that the jax install is heavy and a lot to add as a default. I tried to extract the jax calls here to a separate optional function, but when I removed that function call from the test, the code coverage fails... so i was caught in a catch 22.

NathanielF commented 3 months ago

Ideal world we'd keep the optional function call available as I have it. But up the code coverage elsewhere... so Jax doesn't need to be invoked when running the tests.

NathanielF commented 3 months ago

WOOOP! Works now. Removed Jax dependency and have a function call to the ppc sampler that is optional for the IV class @drbenvincent

drbenvincent commented 3 months ago

I'm more convinced now that the pre-commit checks are not applying ruff to notebooks (see https://github.com/pymc-labs/CausalPy/issues/340). I'll see if I (or someone) can get that issue sorted

NathanielF commented 3 months ago

Seemed to work on pre-commit for me. At least it stopped me a few times when e.g. I had an import statement not in the first cell

On Thu 13 Jun 2024, 21:02 Benjamin T. Vincent, @.***> wrote:

I'm more convinced now that the pre-commit checks are not applying ruff to notebooks (see #340 https://github.com/pymc-labs/CausalPy/issues/340). I'll see if I (or someone) can get that issue sorted

— Reply to this email directly, view it on GitHub https://github.com/pymc-labs/CausalPy/pull/345#issuecomment-2166667985, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQZAB43KNXFVPPF7VNV5N3ZHH3FXAVCNFSM6AAAAABIXJB7F2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGY3DOOJYGU . You are receiving this because you were mentioned.Message ID: @.***>

drbenvincent commented 3 months ago

If you could review/approve https://github.com/pymc-labs/CausalPy/pull/352, once it's merged then you can update from main and it should fix up the minor formatting issues in the notebook.

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:23:55Z ----------------------------------------------------------------

Can you add an admonition box with a bit of explanation that we need to install these extra dependencies

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:23:56Z ----------------------------------------------------------------

Repetition of "determine" in first sentence

propose something like "returns to schooling." -> "economic (or other?) returns FROM schooling" Or do you mean a return to schooling for older students?

Credibility revolution? As in the replication crisis? Could be good to disambiguate / elaborate

NathanielF commented on 2024-06-17T16:25:41Z ----------------------------------------------------------------

Thanks adjusted. Added link to wikipedia article on the credibility revolution. It's useful language to have.

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:23:56Z ----------------------------------------------------------------

Propose the title has LATE acronym at the end, rather than embedded. Just looks less cluttered.

Can you add a glossary term for LATE? And a glossary link to ATE (glossary entry already exists)

NathanielF commented on 2024-06-17T16:26:00Z ----------------------------------------------------------------

Removed LATE from title and added glossary link

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:23:57Z ----------------------------------------------------------------

propose adding hide-input cell tag

I think there's definitely scope to have more explanation / hand holding of what is going on in the plot - what are the key things the reader should be looking at?

NathanielF commented on 2024-06-17T16:26:23Z ----------------------------------------------------------------

Hid code and expanded explanation

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:23:58Z ----------------------------------------------------------------

Propose adding hide-input cell tag because the pandas wrangling is not conceptually key here

NathanielF commented on 2024-06-17T16:26:33Z ----------------------------------------------------------------

Hid code

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:23:59Z ----------------------------------------------------------------

Propose adding hide-input cell tag for these large blocks of plot generation code

NathanielF commented on 2024-06-17T16:26:52Z ----------------------------------------------------------------

Hid code and expanded explanation

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:23:59Z ----------------------------------------------------------------

Similar - could do with breaking down this plot a bit more. What do the ellipses highlight? What is it which indicates a large effect of near/non-near college?

NathanielF commented on 2024-06-17T16:27:01Z ----------------------------------------------------------------

Done.

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:00Z ----------------------------------------------------------------

"We show how TO perform this exercise below"

Under "Justificatory Models" it would be good to flag up the structure of things to come. Eg. Primer the reader to understand what "The first stage" "The Reduced Form" "The Wald Estimate" and "Wald Estimate Versus Simple Controlling Regression" are about. What are all these things and what are their inter-relation? Would you only do one of these things, or all?

Double check correctness of heading levels

NathanielF commented on 2024-06-17T16:27:37Z ----------------------------------------------------------------

Added more signposting

NathanielF commented on 2024-06-17T16:27:46Z ----------------------------------------------------------------

Checked headers

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:01Z ----------------------------------------------------------------

Explicitly mention beta_z[nearcollege_indicator] ??

NathanielF commented on 2024-06-17T16:27:57Z ----------------------------------------------------------------

Yep. Added

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:02Z ----------------------------------------------------------------

It's clear, but just in case a reader has not come across these plots before, you could briefly explicitly state that the posterior is "meaningfully" updated by the data because it's very different from the prior. Or something along those lines

NathanielF commented on 2024-06-17T16:28:14Z ----------------------------------------------------------------

Added a one-liner

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:03Z ----------------------------------------------------------------

Maybe just display this as a raw value (e.g. .data) because there's a lot of not very useful looking array output

NathanielF commented on 2024-06-17T16:28:26Z ----------------------------------------------------------------

yep, removed.

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:03Z ----------------------------------------------------------------

Similarly, be explicit about which coefficient we are looking at.

NathanielF commented on 2024-06-17T16:28:35Z ----------------------------------------------------------------

added!

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:04Z ----------------------------------------------------------------

Propose adding hide-input cell tag

NathanielF commented on 2024-06-17T16:28:48Z ----------------------------------------------------------------

Hid code

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:05Z ----------------------------------------------------------------

typo: "strenghtening"

Could be worth generating a DAG of this? Might help readers understand the relation between the formula's and the DAG

NathanielF commented on 2024-06-17T16:30:24Z ----------------------------------------------------------------

I wasn't sure about changing the DAG here. I'm not sure it adds more as it's still the NEAR variable just adding more species of college... I think it's fine without a specific DAG.

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:06Z ----------------------------------------------------------------

Again, just be explicit about which coefficient the reader should look at

NathanielF commented on 2024-06-17T16:30:41Z ----------------------------------------------------------------

Added in the detail

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:07Z ----------------------------------------------------------------

Add some take home message for these coefficient results

NathanielF commented on 2024-06-17T16:30:58Z ----------------------------------------------------------------

Adjusted

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:07Z ----------------------------------------------------------------

Propose adding hide-input cell tag

x-label text particularly small here

NathanielF commented on 2024-06-17T16:31:08Z ----------------------------------------------------------------

Updated

review-notebook-app[bot] commented 3 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2024-06-17T11:24:08Z ----------------------------------------------------------------

Consider adding semi-opaque background to the legend box because in some of these the overlap with the data makes the legend hard to read

In the upper panels, clarify what the grey vs the light blue distributions are. Maybe add to legend?

NathanielF commented 3 months ago

Thanks adjusted. Added link to wikipedia article on the credibility revolution. It's useful language to have.

View entire conversation on ReviewNB

NathanielF commented 3 months ago

Removed LATE from title and added glossary link

View entire conversation on ReviewNB

NathanielF commented 3 months ago

Hid code and expanded explanation

View entire conversation on ReviewNB

NathanielF commented 3 months ago

Hid code

View entire conversation on ReviewNB

pymc-labs / CausalPy

Add docs on justifying instruments in the IV approach #345

Codecov Report