seismo-live / seismo_live

Live Jupyter Notebooks for Seismology
http://seismo-live.org

Make contributing easier? #13

Open megies opened 6 years ago

megies commented 6 years ago

Currently, I think it's quite hard for people to contribute. The main reason, in my opinion, is that the Jupyter notebooks stored in the repository include cell outputs, images, etc. As a result, users who make local changes and commit them are presented with huge diffs, and when pulling in upstream changes they are bound to hit huge, hard-to-resolve git merge conflicts. That poses a big barrier for people who are not git experts.

Right now, I see two possible solutions:


  1. Store the notebooks in a different format, and only convert them to notebooks and run them in CI (to produce the actual website content with cell outputs / images)

One option that I see is notedown, which has a markdown format that converts pretty nicely to/from ipynb format.

The big drawback is that contributors will likely be working in Jupyter, so they would have to convert to markdown themselves, drop the changed markdown files into the repo, and commit them. So I guess this just introduces different hurdles into the contribution process.


  2. Store the notebooks stripped of all output in the repo, execute them only during CI, and commit the executed notebooks (including output) to a different location (either a separate folder in the repo or, ideally, only the gh-pages branch)

This has the advantage that users can still work directly in the repo using Jupyter, but are confronted only with minimal diffs of their actual changes.

This could be done using nbstripout, which can be installed via anaconda and has a one-liner setup call.
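To make the effect of stripping concrete, here is a minimal sketch in plain Python of what "stripped of all output" means for a notebook file. It is not nbstripout's implementation, only an illustration: the function name `strip_outputs` and the tiny example notebook are hypothetical, and the sketch assumes the nbformat 4 JSON layout (a top-level `cells` list where code cells carry `outputs` and `execution_count`).

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with all code-cell outputs removed.

    Assumes the nbformat 4 layout; hypothetical helper, not part of nbstripout.
    """
    # Deep copy via a JSON round trip so the caller's notebook is left untouched.
    stripped = json.loads(json.dumps(notebook))
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed text, images, etc.
            cell["execution_count"] = None  # drop the In[n] counter
    return stripped


# Hypothetical minimal notebook with one markdown cell and one executed code cell.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
    ],
}

clean = strip_outputs(example)
print(json.dumps(clean["cells"][1], indent=1))
```

With outputs and execution counters removed, a diff of the notebook file only shows changes to cell sources and metadata, which is what keeps diffs and merge conflicts small.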


Comments?

megies commented 6 years ago

This might also be worth a look: https://github.com/rossant/ipymd

krischer commented 6 years ago

I agree that it is currently much too hard to contribute, and I'm not sure how to really improve this. Personally, I'd really like to keep the rendered output in the repository, for two reasons:

  1. They can be viewed (with results) on GitHub and in the notebook viewer web app.
  2. Testing is possible, and in the short term we plan to add testing with nbval (https://github.com/computationalmodelling/nbval). This is really needed to be able to maintain seismo-live; it is already too much work to manually check every notebook.
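The testing argument rests on the committed outputs serving as reference values. As a rough sketch of that idea (not nbval's actual implementation), the following plain-Python snippet compares the stored text output of each code cell against the output of a fresh run. The helper names `stored_text_outputs`, `mismatched_cells`, and `make_notebook` are hypothetical, and only `stream` outputs in the nbformat 4 layout are considered.

```python
def stored_text_outputs(notebook: dict) -> list:
    """Collect the concatenated stream text of each code cell, in order."""
    results = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # "text" may be a string or a list of strings; join handles both.
        text = "".join(
            "".join(out.get("text", []))
            for out in cell.get("outputs", [])
            if out.get("output_type") == "stream"
        )
        results.append(text)
    return results


def mismatched_cells(stored: dict, fresh: dict) -> list:
    """Return indices of code cells whose stored and fresh text outputs differ."""
    pairs = zip(stored_text_outputs(stored), stored_text_outputs(fresh))
    return [index for index, (old, new) in enumerate(pairs) if old != new]


def make_notebook(*texts: str) -> dict:
    """Build a hypothetical notebook with one code cell per given stdout text."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "source": [],
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [text]}
                ],
            }
            for text in texts
        ]
    }


committed = make_notebook("1.0\n", "42\n")  # outputs stored in the repository
rerun = make_notebook("1.0\n", "43\n")      # outputs from a fresh execution
print(mismatched_cells(committed, rerun))
```

If the stored outputs were stripped from the repository, there would be nothing to compare a fresh run against, which is the trade-off against the output-stripping proposal.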

The only "solution" I see is two-fold:

Long term, I hope that GitHub provides some kind of nicer interface to edit notebooks.

I'd also be very happy to hear thoughts and opinions of other people!