stan-dev / stan-dev.github.io

Stan website based on the So Simple Jekyll theme.
https://mc-stan.org/

Update case studies to use new language syntax #189

Open avehtari opened 1 year ago

avehtari commented 1 year ago

With Stan 2.33+ several old language syntax features produce errors. All the case studies would be good to update to use the latest syntax. Many case studies are in external repos and the authors have submitted only the rendered html and short md-part for the case study contents page. Only the html needs to be updated in users/documentation/case-studies/.

It would be good to contact the original authors and ask them if they are willing to update their repos and submit a new html. If the authors disagree or don't respond, we may consider updating just the syntax in the html.

To start the process, I'm listing here all the case studies, and we can start tracking which have been fixed. Also tagging some authors that were easily found by GitHub ID autocomplete: @mitzimorris, @WardBrian, @bob-carpenter, @charlesm93, @bbbales2, @imadmali


avehtari commented 1 year ago

Tagging more authors @betanalpha, @danielcfurr, @hyunjimoon, @education-stan, @Cristinabarber, @joonho112, @LuZhangstat, @kaybenleroll, @milkha, @mbjoseph, @fonnesbeck

mitzimorris commented 1 year ago

in the interim, we could insert a paragraph at the top of the old case studies saying that the code is using the old syntax and instructing the reader to run the stanc canonicalizer on the code themselves.

exercises to the reader are less work than exercises to the author.

hyunjimoon commented 1 year ago

Just an idea, but it would be handy if ChatGPT could auto-translate old case studies from the old syntax to the new syntax, much like translating Python 2.7 code to Python 3.10. Python's 2to3 tool (https://docs.python.org/3/library/2to3.html) seems to have hand-coded this translation.

mitzimorris commented 1 year ago

we don't need chatGPT.

please get the latest release of Stan, and then do (something like this)

> /path/to/cmdstan/bin/stanc --print-canonical my_file.stan > new.tmp
> diff -y -W 180 my_file.stan new.tmp
> mv new.tmp my_file.stan

that diff command will show files side-by-side - it's an easy way to check that stanc did the right thing and only the right thing.

update: for some reason the above procedure is adding an extra newline to files. @WardBrian does the canonicalizer always add a newline proactively to its output in case the input was missing one?
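As a minimal sketch of the same review step in Python: the helper below compares a program with its canonicalized version and separately flags the case where the only change is a trailing newline. The function name `review_canonicalized` and the two sample strings are hypothetical; the strings are hand-written stand-ins, not real stanc output.

```python
import difflib


def review_canonicalized(original: str, canonical: str) -> dict:
    """Summarize how a canonicalized program differs from the original."""
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        canonical.splitlines(keepends=True),
        fromfile="my_file.stan",
        tofile="new.tmp",
    )
    return {
        "identical": original == canonical,
        # True when the texts differ only in newlines at the very end.
        "only_trailing_newline": original != canonical
        and original.rstrip("\n") == canonical.rstrip("\n"),
        "diff": "".join(diff_lines),
    }


# Hand-written stand-ins: the second string only adds a final newline.
old = "data {\n  int N;\n}"
new = "data {\n  int N;\n}\n"
report = review_canonicalized(old, new)
print(report["only_trailing_newline"])  # prints True
```

A file flagged this way needs no syntax review; anything else is worth reading in the side-by-side diff.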

jgabry commented 1 year ago

> in the interim, we could insert a paragraph at the top of the old case studies saying that the code is using the old syntax and instructing the reader to run the stanc canonicalizer on the code themselves.

Yeah this sounds like a good idea until these are updated.

> exercises to the reader are less work than exercises to the author.

Exercises to the author require doing once and all readers benefit. Exercises to the reader require doing N_readers times. So the latter requires a lot more work overall, just less work for the author. Or am I misunderstanding what you meant?

> that diff command will show files side-by-side - it's an easy way to check that stanc did the right thing and only the right thing.

Nice!

WardBrian commented 1 year ago

I manually went through the ones which were unclear and figured out whether they needed updating. That brings the total up to 11/42 being good to go - either because they used the new syntax, didn't use any of the old syntax, or (in a few cases) contained no actual Stan code in the text of the case study.
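A rough automated first pass ahead of that kind of manual review might look like the sketch below. It is a heuristic, not a Stan parser: the names `OLD_SYNTAX_PATTERNS` and `find_old_syntax` are hypothetical, the three patterns are assumed examples of syntax that newer releases reject, and the list is not exhaustive (old-style function argument types, for instance, are not covered).

```python
import re

# Heuristic patterns only. Assumed examples of removed syntax, not a full list.
OLD_SYNTAX_PATTERNS = {
    # e.g. `real x[N];` rather than `array[N] real x;`
    "old array declaration": re.compile(
        r"\b(?:int|real|vector|row_vector|matrix)\b"
        r"(?:<[^>]*>)?(?:\[[^\]]*\])?\s+[A-Za-z_]\w*\s*\["
    ),
    # e.g. `mu <- 0;` rather than `mu = 0;`
    "<- assignment": re.compile(r"<-"),
    # e.g. `increment_log_prob(...)` rather than `target += ...`
    "increment_log_prob": re.compile(r"\bincrement_log_prob\s*\("),
}


def find_old_syntax(stan_code: str) -> list[str]:
    """Return the names of the heuristic patterns found in the given Stan code."""
    return [
        name
        for name, pattern in OLD_SYNTAX_PATTERNS.items()
        if pattern.search(stan_code)
    ]


print(find_old_syntax("real x[N];"))
print(find_old_syntax("array[N] real x;\nvector[N] y;"))
```

Only pass Stan code to this function: R code also uses `<-`, so scanning a whole R Markdown file would produce false positives. Anything the scan flags, or that looks ambiguous, still needs a manual look.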

It's also worth noting that any case study which stored its code in the example-models repo had its code automatically updated a while back. If any of those case studies are using something like writeLines(readLines("model.stan")), then the only work that actually needs to be done is just re-knitting. More than a few seem to store the code in a string or text block in the markdown, however.
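To sort case studies into these two groups, one could scan each R Markdown source with something like the sketch below. The name `classify_source`, the return labels, and both regular expressions are hypothetical heuristics; they assume the model is either read from a `.stan` file with `readLines` or pasted inline with recognizable `parameters {` or `model {` blocks.

```python
import re

# Matches calls such as readLines("model.stan") or readLines('fit/model.stan').
READS_STAN_FILE = re.compile(r"""readLines\(\s*["'][^"']*\.stan["']\s*\)""")
# Matches the opening of a Stan program block pasted into the document.
INLINE_STAN_BLOCK = re.compile(r"\b(?:parameters|model)\s*\{")


def classify_source(rmd_text: str) -> str:
    """Guess how much work a case study needs, based on where its Stan code lives."""
    if INLINE_STAN_BLOCK.search(rmd_text):
        # Inline code has to be edited by hand (or canonicalized and pasted back).
        return "edit-inline"
    if READS_STAN_FILE.search(rmd_text):
        # Code is read from a .stan file, so re-knitting may be enough.
        return "re-knit"
    return "unclear"


print(classify_source('writeLines(readLines("model.stan"))'))  # prints re-knit
```

Inline code takes priority in this sketch because a document that both reads a file and embeds a model still needs the embedded part edited.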

bob-carpenter commented 1 year ago

@hyunjimoon: It's going from the old Stan syntax to the new Stan syntax. ChatGPT(4) is pretty good at Python, but it's very bad at Stan.

bob-carpenter commented 1 year ago

If we keep our User's Guide, Reference Manual, and Functions Reference up to date, I don't think breaking the old case studies should block any of our updates. Specifically, I'm OK putting a warning up and then fixing them as we can. Another alternative is moving the ones that aren't updated to a "deprecated case study" location and flagging them up front.

I can update the five of my case studies that weren't built with the new Stan syntax:

jgabry commented 1 year ago

> If we keep our User's Guide, Reference Manual, and Functions Reference up to date, I don't think breaking the old case studies should block any of our updates. Specifically, I'm OK putting a warning up and then fixing them as we can. Another alternative is moving the ones that aren't updated to a "deprecated case study" location and flagging them up front.

I agree that we shouldn't hold up Stan releases just because they break case studies. A warning about it would be good. Right now the website says:

The case studies on this page are intended to reflect best practices in Bayesian methodology and Stan programming

which is a bit unfortunate since best practices would include code that doesn't error.

What if we change the note at the top to say this?

The case studies on this page are intended to reflect best practices in Bayesian methodology and Stan programming. We aim to keep them current with the latest version of the Stan language, but there may be times when case studies need updating to reflect the latest Stan features and syntax.

That could probably be worded better, but something along those lines?

bob-carpenter commented 1 year ago

That wording sounds good. Did we want to point people to the Stan code updater in stanc3?

jgabry commented 1 year ago

> Did we want to point people to the Stan code updater in stanc3?

The only reason I'd hesitate to do that is that on Slack @WardBrian mentioned that in future versions (2.34 and beyond) we won't be able to parse and fix the old code anymore. But maybe that's not a reason to avoid mentioning it. By the time future breaking changes arrive, those changes, not the array syntax, will be what needs fixing, so I guess the auto-formatter/canonicalizer will work just fine for whatever syntax needs changing at that point.

jgabry commented 1 year ago

I opened PR https://github.com/stan-dev/stan-dev.github.io/pull/191 to add the disclaimer at the top of the case studies page. I didn't mention the auto-formatter/canonicalizer but I can update it to mention it if we want that. (It is accessed differently in the different interfaces, so we'd have to decide whether to just mention it exists or actually demo how to use it in the different interfaces.)

jgabry commented 1 year ago

Is the process for updating the ones in example-models repo the following?

(I just did this for the HMM interface example case study, but I can update my PRs if this process isn't right)

WardBrian commented 1 year ago

Yep, sounds right to me. I have just updated the new ODE and golf case studies like this