Closed juanitorduz closed 5 months ago
The simple model is ready for a first round of review.
Nice @juanitorduz !! So this is an extension of your original blog post? I was gonna read it when I got the time, but I might read and review this NB instead if that's the same content and helpful to you
Well, we decided with Bill that we will present a simpler model to illustrate HSGP, and then Bill will add at the end the complete model from Aki. I was not able to add the horseshoe prior (very bad R-hats and divergences) in my post, so Bill will give it a go :)
All feedback is welcome!
Ok, so it sounds like it's your post updated + simpler model, so I'll definitely review this PR instead! Please feel free to ping me when it's ready for review
As discussed with Bill, we want to keep the model specification closer to the Stan one for this example. In https://github.com/pymc-devs/pymc-examples/pull/627/commits/08f4bc8ac19b700206c89737dc8fa27db81f0cb4 , I changed the day_of_week parametrization from zero-sum-normal to one-hot encoding, setting the Monday coefficient to zero. The relative contribution between days (say, the difference between Tuesday and Sunday) remained the same; it is more about the interpretation.
Remark: Note this corresponds to Model 3: Slow trend + yearly seasonal trend + day of week. In particular, if you go to the Stan code, we see there is actually no intercept, as we are modeling the standardized births (`vector[N] intercept = 0.0 + f_day_of_week[day_of_week]`, see https://github.com/avehtari/casestudies/blob/master/Birthdays/gpbf3.stan#L47C3-L47C58)
@bwengals I added a small comment on the first basis vectors in https://github.com/pymc-devs/pymc-examples/pull/627/commits/67ef3fc0b7847371f23cf9d05a1f9de25dccda2a . Let me know what you think and how much you think we should expand (related to https://github.com/pymc-devs/pymc/pull/7115)
In view of https://github.com/pymc-devs/pymc-examples/issues/626#issuecomment-1911777083 I think the very first iteration is ready for review in order to collect feedback.
The pre-commit-ci job fails with a strange error. Still, the pre-commit check on GitHub Actions passes :)
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:20Z ----------------------------------------------------------------
Given the update here I think the intro is out of date now. I would also reconsider some of the description so that it aligns with the intended audience. Would add more description of what the HSGP approximation is and when to use it. I'm not sure that "The main idea of this method relies on the Laplacian's spectral decomposition to approximate kernels' spectral measures as a function of basis functions." is going to mean much to people who aren't already very familiar with how HSGPs work.
Could also clean it up a bit, feels rough-drafty to me. Like, Vehtari's case study is described and linked to twice, was that on purpose? Should also describe that this example started in GPTools, and is now fairly easy to do in a PPL like Stan or PyMC now because of the HSGP approximation, which is pretty cool.
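For reference, the HSGP approximation being discussed is, following Solin and Särkkä's reduced-rank construction, a replacement of the GP prior with a finite linear combination of fixed basis functions:

```latex
f(x) \approx \sum_{j=1}^{m} \sqrt{S_\theta\!\left(\sqrt{\lambda_j}\right)}\; \phi_j(x)\, \beta_j,
\qquad \beta_j \sim \mathcal{N}(0, 1),
```

where $\phi_j$ and $\lambda_j$ are the eigenfunctions and eigenvalues of the Laplacian on a bounded domain containing the data, and $S_\theta$ is the spectral density of the stationary covariance kernel. The basis functions do not depend on the kernel hyperparameters, which is what makes the approximation cheap to sample.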
AlexAndorra commented on 2024-02-05T19:19:54Z ----------------------------------------------------------------
Agree with Bill
juanitorduz commented on 2024-02-13T13:11:34Z ----------------------------------------------------------------
Absolutely, I will tackle this at the end, after we have a better picture of the scope of this first iteration.
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:21Z ----------------------------------------------------------------
> the number of births relative to 100
Why scale the data this way?
juanitorduz commented on 2024-02-13T13:10:55Z ----------------------------------------------------------------
I am not 100% sure, maybe to have smaller numbers (maybe easier to think about priors?)
AlexAndorra commented on 2024-02-21T11:52:53Z ----------------------------------------------------------------
We could try and see how modeling on the raw outcome scale works? For now I'm not convinced this scaling makes things easier, neither for writing down the model, nor for sampling
juanitorduz commented on 2024-02-26T21:34:43Z ----------------------------------------------------------------
To keep the scope of the notebook: Reproduce Model 3 from Aki's blog (see discussion below regarding modeling choices), I suggest we keep the scaling ;)
AlexAndorra commented on 2024-03-11T02:01:07Z ----------------------------------------------------------------
Sounds good. Then maybe mention that it's done to be as close as possible to Aki's case study, so that people understand they don't have to scale the data this way for HSGP to work
juanitorduz commented on 2024-03-11T14:01:09Z ----------------------------------------------------------------
ok! 👍
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:22Z ----------------------------------------------------------------
bit of a nit-pick on the wording, but not quite a "deep dive"? Maybe say:
"We see a clear long term trend component and a clear yearly seasonality. Let's plot the day of year on the x-axis vs. the number of births on the y-axis to see this pattern more clearly."
juanitorduz commented on 2024-02-13T13:12:31Z ----------------------------------------------------------------
yes! added!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:23Z ----------------------------------------------------------------
Maybe add what you see in the plot above and then why split by month and year?
juanitorduz commented on 2024-02-13T13:17:19Z ----------------------------------------------------------------
Yes! Added!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:24Z ----------------------------------------------------------------
I guess it's clear to me (at this point), that you're systematically checking seasonality at year, month, day of week levels. Maybe good to say so earlier?
juanitorduz commented on 2024-02-13T13:23:02Z ----------------------------------------------------------------
Good point!
bwengals commented on 2024-03-27T21:41:31Z ----------------------------------------------------------------
maybe missed it earlier, but I do think it'd be nice to say something like "Next we're going to systematically look at seasonality, first yearly, then monthly" ... etc. Just wanted to bring it up again in case it got lost in the shuffle -- also totally ok if you think it messes your flow
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:25Z ----------------------------------------------------------------
Would add more detail, maybe uncertainty estimates, +- stdev?
I guess I'm not following why we'd check if this pattern changes over the years? Not sure what the clue would be.
Might be nice to give a guess as to why this is, maybe not here at this specific spot in the nb, but I find it quite weird that the number of births isn't just uniform over every day of the year. I think there's info about why somewhere?
AlexAndorra commented on 2024-02-21T11:56:01Z ----------------------------------------------------------------
I think for the weekend it's easy to imagine: less staff, some births are postponed to the beginning of the following week. This dataset is quite old, but I'm guessing this pattern is even more apparent in recent years, with the increasing technological ability to induce labor
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:25Z ----------------------------------------------------------------
Maybe? Might add some explanation for what the difference you're seeing is exactly? Is 6 and 7 Sat and Sun? Could label with actual days instead of numbers. I like how the weekends are lighter colors and weekdays are darker.
It looks to me like the weekdays are increasing but the weekends are staying flat?
AlexAndorra commented on 2024-02-05T19:32:15Z ----------------------------------------------------------------
In addition to what Bill said, it's interesting (and weird) that the trends look similar from 1969 to 1977, then the weekdays start trending up while the weekends grow way slower
juanitorduz commented on 2024-02-13T13:27:46Z ----------------------------------------------------------------
Indeed! This is a key observation from Aki's blog:
Looking at the time series of whole data we see the dots representing the daily values forming three branches that are getting further away from each other. In previous analysis (BDA3) we also had a model component allowing gradually changing effect for day of week and did observe that the effect of Saturday and Sunday did get stronger in time
which motivates the amplitude GP. As we won't add this component for now... shall I remove this plot?
AlexAndorra commented on 2024-02-21T11:57:54Z ----------------------------------------------------------------
I think so, otherwise the EDA is overwhelming. We can add this plot back when / if we add that component -- that'd be useful context
juanitorduz commented on 2024-02-26T20:30:19Z ----------------------------------------------------------------
ok! I will remove it!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:26Z ----------------------------------------------------------------
Might be nice to summarize what specifically we learned about the data and patterns
AlexAndorra commented on 2024-02-05T19:27:32Z ----------------------------------------------------------------
"patters" → "patterns"
AlexAndorra commented on 2024-03-11T02:05:44Z ----------------------------------------------------------------
Typo is still there
juanitorduz commented on 2024-03-11T13:56:35Z ----------------------------------------------------------------
👍 ok, now it is fixed!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:27Z ----------------------------------------------------------------
why?
juanitorduz commented on 2024-02-13T13:44:20Z ----------------------------------------------------------------
Added a comment:
"We want to work on the normalized log scale of the relative births. The reason for this is to work on a scale where it is easier to set up priors (scaled space) and where the heteroscedasticity is reduced (log transform)."
juanitorduz commented on 2024-02-13T13:44:37Z ----------------------------------------------------------------
This is also done by Aki.
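The transform described in this thread (log, then standardize) can be sketched with NumPy; the data here is synthetic and the variable names are illustrative, not the notebook's:

```python
import numpy as np

# Synthetic stand-in for the notebook's data: daily births relative to 100
rng = np.random.default_rng(0)
births_relative = rng.uniform(80, 120, size=365)

# Log-transform to reduce heteroscedasticity, then standardize so that
# priors can be set on a unit scale.
log_births = np.log(births_relative)
births_std = (log_births - log_births.mean()) / log_births.std()

# The result is centered at zero with unit standard deviation.
assert abs(births_std.mean()) < 1e-12
assert abs(births_std.std() - 1.0) < 1e-12
```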
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:28Z ----------------------------------------------------------------
Line #8. )
add a semicolon at the end of the last line so the cruft below doesn't get printed:
Text(0.5, 1.0, 'Relative Births in the USA in 1969 - 1988\nTransformed Data')
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:29Z ----------------------------------------------------------------
> All of these building blocks should not come as a surprise after looking into the EDA section.
Would it be better to interleave the EDA and model building then? I think it's better, especially for beginners, to not say things are obvious or not a surprise, because it doesn't help understanding and only potentially makes them feel dumb. To be honest after looking at the plots above, I dunno if I'd make the exact same model (unless I was copying the Vehtari case study).
Also, then the effect from the plot "Relative Births in the USA in 1969 - 1988\nMean over Day of Week and Year"
isn't included in the model?
> We use a normal distribution on the day of the week one-hot-encoded values. As the data is standardized, in particular centered around zero, we do not need to add an intercept term. In addition, we set the coefficient of Monday to zero to avoid identifiability issues.
Dummy encoding. Also the fact that the data is standardized isn't why we're not adding an intercept, it's because the data is standardized AND the GP terms we are going to add can float and have means that aren't exactly zero.
Because of how day of week is encoded, the "intercept" is the same as the Monday effect (with other effects factored out). This is one reason you always should have an intercept in your model, and standardizing doesn't automatically take it out, because intercept != mean of data.
Yeah, I also prefer having an intercept, and then the ZSN parametrization. Otherwise you have to do pivoting anyways, which often is trickier to implement and interpret.
Also, why not use another Gaussian process with a periodic kernel here?
juanitorduz commented on 2024-02-13T13:51:30Z ----------------------------------------------------------------
Thanks for the feedback! Here are some remarks:
make the EDA clear enough so that starting with this baseline model (Model 3 from Aki's blog) is not unreasonable, even for a beginner.
SGTM
We can add the ZSN approach as a comment.
Yes, I think we can show the equivalent parametrization with ZSN in a markdown cell. It's a good way for people to start getting familiar with ZSN.
Also, note we do not even have a beginner's example for ZSN. We should add this to the to-do list ;)
I know, I know, I've been meaning to do that for months now, but haven't been able to set the time aside yet -- shame on me ;)
On the workflow more generally, what do you think of adding the corresponding modeling component after each corresponding EDA plot? That way, it's really not a surprise to readers at that point, and this part here is just a summary of what we already saw
bwengals commented on 2024-02-21T22:44:17Z ----------------------------------------------------------------
> I suggest we keep the one-hot encoding to make it similar to Aki's approach and to keep the new topics (for new users) bounded.
It's funny (or possibly frustrating for Juan): I had talked to @juanitorduz about this offline and I guess I convinced him to use the same dummy encoding Vehtari did in the blog post instead of ZSN, and now @AlexAndorra you're trying to persuade Juan back to his original position. Sorry about that!
I think overall, you could make a case for all the options: dummy encoding (fixed effect), hierarchical normal (random effect), or hierarchical normal with zero sum constraint -- depending on how much partial pooling you think is relevant. I suspect Vehtari used the dummy encoding over hierarchical normal because of issues with GPs and identifiability with intercepts (haven't tried it and checked though, could be wrong).
I think the bigger question is how close do you want to stick with Vehtari's version? I think if we deviate it should be purposeful or an improvement
Then, the second question is that I don't think ZSN is just a drop-in replacement for a dummy-encoded fixed effect, because of the partial pooling that's happening with ZSN and not for fixed effects (forgive me if I'm assuming a bit too much here and no one's implying that!). I do think day of the week is a nice use case for the constraint though, because you're never going to predict for some new 8th day of the week. I dunno if this point is too tricky/subtle to slip in here, or if it'd be better in a ZSN-focused example.
AlexAndorra commented on 2024-02-22T22:45:10Z ----------------------------------------------------------------
Thanks Bill! Lots of interesting threads here. To make sure we're on the same page, we're talking about these 2 different parametrizations, right?
```python
# day of week
b_day_of_week_no_monday = pm.Normal(
    name="b_day_of_week_no_monday", sigma=1, dims="day_of_week_no_monday"
)
b_day_of_week = pt.concatenate(([0], b_day_of_week_no_monday))
# then used in the linear predictor as:
b_day_of_week[day_of_week_idx_data] * (day_of_week_idx_data > 0)
```
```python
log_f_day_of_week = gp_day_of_week.prior(
    name="log_f_day_of_week", X=normalized_obs_data[:, None], dims="obs"
)  # why is it log BTW?
f_day_of_week = pm.Deterministic(
    name="f_day_of_week", var=pt.exp(log_f_day_of_week), dims="obs"
)
b_day_of_week = pm.ZeroSumNormal(name="b_day_of_week", sigma=1, dims="day_of_week")
# then used in the linear predictor as:
f_day_of_week * b_day_of_week[day_of_week_idx_data]
```
First, there are 2 things I don't get here: 1) Why does this one have a GP and the previous one doesn't? 2) Why is the GP exponentiated? If somebody knows, any help is appreciated :)
That put aside, these parametrizations are indeed equivalent: the first one is using pivoting (fixing one category to 0), while the second one is using ZSN (fixing the sum of the categories to 0).
It changes the interpretation and the code, but both params are used to prevent over-parametrization of the model, because we have an intercept (well, just a GP for the global trend here, but that's the same idea).
So, to answer your second question Bill: there is no partial pooling with ZSN here; it's just removing one degree of freedom in a different way than pivoting (aka dummy encoding).
I do agree with you that day of the week is a nice use case for ZSN, because you're never going to predict for some new 8th day of the week, but I do think this point is more appropriate for a ZSN-focused example.
All this makes me wonder though: why don't we one-hot encode day of week and pass that to the GP? That would mean `input_dim = X.shape[1] = 1 + 7 = 8`. Would that work? Would it be equivalent to the previous parametrization?
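The claimed equivalence between pivoting (Monday fixed to zero) and the zero-sum constraint can be checked numerically. This is a sketch with made-up effect values, assuming index 0 is Monday and index 6 is Sunday:

```python
import numpy as np

rng = np.random.default_rng(42)

# Pivoted (dummy) parametrization: Monday fixed to zero, six free effects.
b_pivot = np.concatenate(([0.0], rng.normal(size=6)))

# Equivalent zero-sum parametrization: recenter so the seven effects sum
# to zero; the removed mean is absorbed by an intercept term.
intercept = b_pivot.mean()
b_zsn = b_pivot - intercept

# Pairwise differences between days (e.g. Tuesday minus Sunday) are
# identical under both parametrizations...
assert np.isclose(b_pivot[1] - b_pivot[6], b_zsn[1] - b_zsn[6])

# ...and intercept + zero-sum effects recover the pivoted effects exactly.
assert np.allclose(intercept + b_zsn, b_pivot)
assert np.isclose(b_zsn.sum(), 0.0)
```

Note this only shows that the two parametrizations span the same effects; with a prior on sigma, a ZeroSumNormal additionally partially pools the effects.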
> First, there are 2 things I don't get here: 1) Why does this one have a GP and the previous one doesn't? 2) Why is the GP exponentiated? If somebody knows, any help is appreciated :)
I think it's because in the original case study here https://avehtari.github.io/casestudies/Birthdays/birthdays.html, Model 3 doesn't include a time component on day of week (day of week effects are treated as constant over the years); Model 4 introduces a time component based on the hypothesis that the day of week effect grows more pronounced as time goes on.
> these parametrizations are indeed equivalent
OK... I had to think about this a while, I think I've been super confused about ZSN on this point! If you pass a prior for sigma for ZSN, it represents a random effect with a zero sum constraint and with partial pooling. If you don't, and pass a specific value for sigma, it represents a fixed effect with a zero sum constraint and no partial pooling, like you said. Still have more ZSN Q's... but it's off topic.
> All this makes me wonder though: why don't we one-hot encode day of week and pass that to the GP? That would mean `input_dim = X.shape[1] = 1 + 7 = 8`. Would that work? Would it be equivalent to the previous parametrization?
I think you could. Is that extra dimension, the 1 in 1 + 7 = 8, is that time? That would give you the interaction of day of week and time. The size of the data might constrain you here, and also the hypothesis described in Vehtari's model 4 is a bit more structured. But, yah I think it'd be a reasonable thing to do.
AlexAndorra commented on 2024-02-24T00:52:29Z ----------------------------------------------------------------
> Model 3 doesn't include a time component on day of week (day of week effects are treated as constant over the years); Model 4 introduces a time component based on the hypothesis that the day of week effect grows more pronounced as time goes on.
Ah ok, makes sense, thanks Bill!
> If you pass a prior for sigma for ZSN, it represents a random effect with a zero sum constraint and with partial pooling. If you don't, and pass a specific value for sigma, it represents a fixed effect with a zero sum constraint and no partial pooling, like you said. Still have more ZSN Q's...
Yes, exactly! Feel free to send them my way in DM ;)
> Is that extra dimension, the 1 in 1 + 7 = 8, is that time?
Yes
> That would give you the interaction of day of week and time. The size of the data might constrain you here [...] But, yah I think it'd be a reasonable thing to do.
Yeah and HSGP would probably break with 8 input dimensions. But good to know it'd be reasonable!
In general, do you recommend more this "big GP" parametrization, or the ZSN + GP parametrization from Juan's blogpost?
juanitorduz commented on 2024-02-26T20:41:27Z ----------------------------------------------------------------
This is such an enlightening thread! Thanks! In particular, Alex:
> If you pass a prior for sigma for ZSN, it represents a random effect with a zero sum constraint and with partial pooling. If you don't, and pass a specific value for sigma, it represents a fixed effect with a zero sum constraint and no partial pooling, like you said.
I did not know this! This deserves a proper explanation in your upcoming ZSN notebook ;)
Regarding the scope of this example for this iteration: my suggestion is to keep it simple, because we do know we want to iterate over this example as described in the corresponding issue. I won't close this thread so that we have it for reference. I suggest keeping the one-hot encoding to reproduce Aki's results for model 3 (therefore I will keep the ./100 scaling). I will add a comment on the "equivalent" ZSN parametrization as an FYI and link to my post, if that's ok with you guys :)
bwengals commented on 2024-03-27T21:45:34Z ----------------------------------------------------------------
Wanna re-mention this one:
> All of these building blocks should not come as a surprise after looking into the EDA section.
I think it's better, especially for beginners, to not say things are obvious or not a surprise, because it doesn't help understanding and only potentially makes them feel dumb. To be honest after doing the EDA above, I dunno if I'd make the exact same model (unless I was copying the Vehtari case study).
juanitorduz commented on 2024-03-29T08:33:58Z ----------------------------------------------------------------
You are 100% right! Let me rephrase it 👍
juanitorduz commented on 2024-03-29T08:51:05Z ----------------------------------------------------------------
Let me know what you think about the change :)
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:30Z ----------------------------------------------------------------
Maybe for the plot, transform it to show the prior on the scale of days / years? (I at least can't visualize log/exp things in my head!)
AlexAndorra commented on 2024-02-05T19:51:42Z ----------------------------------------------------------------
Yeah I think the plot on the data scale would be more telling, and then you can show this one, to demonstrate how the transformation changes things around.
Also, aren't you taking the log because you're getting back to the normalized log scale? Because the LogNormal is doing the log transformation of the entered values under the hood anyways
AlexAndorra commented on 2024-03-11T02:15:57Z ----------------------------------------------------------------
juanitorduz commented on 2024-03-11T13:54:45Z ----------------------------------------------------------------
👍 👍
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:31Z ----------------------------------------------------------------
Line #78. pm.model_to_graphviz(model=model)
This isn't appearing. Either include / describe it or remove it?
I think it's just a ReviewNB limitation
juanitorduz commented on 2024-02-13T13:53:18Z ----------------------------------------------------------------
Yes, it does appear on the docs: https://pymcio--627.org.readthedocs.build/projects/examples/en/627/gaussian_processes/GP-Births.html
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:32Z ----------------------------------------------------------------
> an extra intercept in the model and this can hurt sampling.
model doesn't have an intercept though? Should tie into explanation about that. I think for beginners this wouldn't be clear.
I didn't know about that! Really good to know.
You're not using drop_first above though
Agree! I hope it's better now!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:33Z ----------------------------------------------------------------
suppress warnings?
juanitorduz commented on 2024-02-13T13:57:24Z ----------------------------------------------------------------
Done!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:33Z ----------------------------------------------------------------
I agree! But maybe a sentence why?
juanitorduz commented on 2024-02-13T13:58:24Z ----------------------------------------------------------------
added a clarification
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:34Z ----------------------------------------------------------------
lots of warnings
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:35Z ----------------------------------------------------------------
Might be nice to mention we get the same results as Stan from Vehtari's case study? I think this is his model 3
juanitorduz commented on 2024-02-13T14:01:56Z ----------------------------------------------------------------
Added!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:36Z ----------------------------------------------------------------
> we want to do a deep dive into
juanitorduz commented on 2024-02-13T14:03:22Z ----------------------------------------------------------------
Thanks! added!
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:37Z ----------------------------------------------------------------
maybe more comments on what this code is doing? Would it be simpler to not use sklearn pipelines to just mean subtract, divide by stdev, then log? (I'm looking at the expand_dims, code_dims, squeeze, shape stuff happening)
juanitorduz commented on 2024-02-13T14:05:05Z ----------------------------------------------------------------
You are right about this looking a bit weird. Once I got it, it became my default, as I love sklearn transformers. I will try to simplify ...
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:38Z ----------------------------------------------------------------
Could argue it looks quite bad! Pretty big discrepancy between the black line and the shaded blue in the bulk of the posterior; the tails look good. This suggests we might be missing some covariates
juanitorduz commented on 2024-02-13T14:07:48Z ----------------------------------------------------------------
You are right! I changed it (somehow I wanted to remain optimistic). I added:
This does not seem very good, as there is a pretty big discrepancy between the black line and the shaded blue in the bulk of the posterior, though the tails look good. This suggests we might be missing some covariates. We explore this in a later, more complex model.
Which should be a nice entry point for the next iteration where you work out the complete model.
View / edit / reply to this conversation on ReviewNB
bwengals commented on 2024-01-29T19:29:39Z ----------------------------------------------------------------
At this point it's all you! I can add a bullet for me when I add something, appreciate it though!
juanitorduz commented on 2024-02-13T14:10:33Z ----------------------------------------------------------------
mmm ok! I still believe this example has already benefited quite a lot from our discussions and input.
Left a bunch of comments, please keep in mind that I didn't think about what should be changed now vs. another iteration, so they're all pretty optional.
One thought overall is that it doesn't really follow the Bayesian workflow. It starts 1. EDA, 2. Model, 3. Results. But to me the step from the EDA straight to (a pretty complicated) model wouldn't have followed. Would it be possible to interleave EDA, building the model in steps, and looking at results? Right now it feels like we're just copying the Vehtari case study model 3 but at the same time pretending we arrived at it naturally. Another option would be to just explicitly say we are copying model 3 from Vehtari, and then not do all the EDA (or at least make it clear that the plots are complementary to the model code)
I would vote for copying straight from Vehtari à la numpyro example, skipping the EDA. My attention span isn't long enough for a really long example – I'm just here to copy and paste the HSGP code. Cite the original blog post if they want the workflow
Oooh fantastic, thanks Juan, I'll review that ASAP -- probably this weekend, if you can wait.
Strongly disagree with just copy pasting the code, especially for such involved models, where EDA and contextualization help make sense of the tradeoffs and modeling choices. Not to mention that having to switch from one post to another to get essential information disturbs flow state and decreases attention span even more
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-02-05T20:45:48Z ----------------------------------------------------------------
> We use a the following priors
juanitorduz commented on 2024-02-13T13:52:02Z ----------------------------------------------------------------
Thanks!
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-02-05T20:45:49Z ----------------------------------------------------------------
Line #11. normalized_obs_data = pm.Data(
I would rename that normalized_time, as "obs" can refer to outcome or covariates, so more ambiguous
See our private conversation on that point
juanitorduz commented on 2024-02-26T20:45:31Z ----------------------------------------------------------------
Will do!
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-02-05T20:45:50Z ----------------------------------------------------------------
Line #47. b_day_of_week_no_monday = pm.Normal(
Really not a fan of this parametrization in comparison to ZSN, especially since you have to deal with it again in the linear predictor
I agree, but see comment above about keeping the concepts accessible, as we do not want to confuse users. I am open for voting haha
juanitorduz commented on 2024-02-13T14:02:41Z ----------------------------------------------------------------
An argument for the one-hot encoding is that we can replicate Aki's results (see comment just after the trace).
AlexAndorra commented on 2024-02-21T16:31:57Z ----------------------------------------------------------------
Yep, replied above, I think we have a good solution now
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-02-05T20:45:51Z ----------------------------------------------------------------
we ~~can not~~ cannot simply sum the ~~to~~ two components
juanitorduz commented on 2024-02-13T14:08:46Z ----------------------------------------------------------------
solved! thanks!
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-02-05T20:45:52Z ----------------------------------------------------------------
You might wanna turn this into a function, as you're using this plot a lot, and I'm guessing you will again in the more complicated model (this is an awesome plot BTW)
juanitorduz commented on 2024-02-13T14:09:29Z ----------------------------------------------------------------
yes! based on the comments above a function will help so that people don't get lost in plotting code.
Thank you all for the detailed review! I will address them at the end of the week 💪. Much appreciated :)
Closes https://github.com/pymc-devs/pymc-examples/issues/626
Scope:
TODO:
π Documentation preview π: https://pymc-examples--627.org.readthedocs.build/en/627/