the-turing-way / the-turing-way

Host repository for The Turing Way: a how to guide for reproducible data science
http://the-turing-way.org/

Jupyter Community Workshops: Proposal to host event Jan-Aug 2020 | Deadline 15 Dec #770

Closed KirstieJane closed 4 years ago

KirstieJane commented 4 years ago

Summary

It would be great to bring folks together at the Turing to discuss how we can build capacity for contributing to Jupyter in the UK.

This isn't strictly a Turing Way initiative, but it's close enough that I'm happy making the issue here. It's more a Turing Institute application though, potentially through the Tools, Practices and Systems research theme.

I propose that we hold the event at the Turing Institute in London. It's pretty easy to get to and we wouldn't have to pay for the space. Fine to adjust if others would prefer to host.

Useful links:

What needs to be done?

Who can help?


Updates

No updates so far.

IanHawke commented 4 years ago

Ian Hawke, University of Southampton Applied Maths. Teaching (live coding and prepping teaching materials) with notebooks since 2014. Would like to contribute more on producing teaching materials from notebooks, particularly from accessibility angles.

psychemedia commented 4 years ago

Hi - would love to be involved with this:

One of the questions I'd like to discuss is how we can promote adoption of notebooks in UK HEI and how we can make cases for support / mount advocacy campaigns to run JupyterHub and BinderHub services in our institutions.

KirstieJane commented 4 years ago

Love this idea @psychemedia! Could be good to have a few librarians in the meeting too so we can think about where these responsibilities might lie in the ecosystem.

@pherterich @rosiehigman @jezcope - you or any suggestions?

psychemedia commented 4 years ago

@KirstieJane re: librarians.. yes.. I'll try and get at least one of ours along... Have you tapped the BL too?

pherterich commented 4 years ago

@KirstieJane I could feed back/invite people to the RDA computational notebook BoF - but then @hughshanahan is closer. Not sure how much else I can provide though as I'm not a librarian in a HEI any longer. I pinged the Edinburgh notebook team to ask if they'd be interested (they commented on Twitter a while back).

colombod commented 4 years ago

Diego Colombo, Microsoft. Have not contributed yet to the project. I am interested in improving the Jupyter experience: better code editor support for notebook users and additional languages for notebooks.

tonyyzy commented 4 years ago

Tony Yang, Imperial College, uses Jupyter daily for data analysis. Interested in reproducible Jupyter notebook, want to have a cell history tracking plugin that records commands in the order of execution.

perllaghu commented 4 years ago

Ian Stuart, EDINA, we use Jupyter Notebooks / JupyterHub in a Kubernetes Cluster... and provide that as a service to [UK] Education. We have a whole slew of interests: Accessibility; Scalability; nbgrader; creating our own notebooks; ... We hosted the nbgrader Hackathon/Code Sprint back in May 2019.

We are contributing code to nbgrader, would like to be involved in the accessibility work for notebooks, and can probably give a talk or two at the event.

BertR commented 4 years ago

Bert Robberechts, EDINA (part of The University of Edinburgh), I'm on the same team as @perllaghu :arrow_up: working on the JupyterHub environment for higher education, which we run out of Edinburgh. I have contributed code to nbgrader and try to help out people with Kubernetes related questions in the JupyterHub community.

psychemedia commented 4 years ago

@tonyyzy re:

want to make a cell history tracking plugin that record commands in the order of execution.

By chance, and not sure if this is useful, for cell history, I did a recent round up here. Re: cell execution order, and lined up for next TJ, there's the manual execution_dependencies nbextension, nodebook magic (which the polynote flow execution model resembles, I think...?) and this notebook on tracking inconsistencies in notebooks that I haven't got my head round yet...

tonyyzy commented 4 years ago

Thank you so much @psychemedia! I remember I couldn't find much a few months ago, those are very helpful suggestions! I will definitely look into all the options 😃

ihrynaszkiewicz commented 4 years ago

Hi folks, @iainh_z, Publisher, Open Research @PLOS, based in Cambridge, UK. In the context of our code and data sharing policies, and goals to support and advance open research, we're interested to be part of conversations with the Jupyter community. Particularly interested in exploring how scholarly communication can better support sharing, review, discovery and reuse of research outputs created with these and similar tools.

colombod commented 4 years ago

@psychemedia very interesting area.

sgibson91 commented 4 years ago

Just wanted to point out the Wrattler project (https://github.com/wrattler/wrattler-binder) which is a polyglot notebook with a dependency graph that is being developed as an extension to JupyterLab. It would be pretty excellent if we could spend some time getting this more integrated into the Jupyter eco-system.

sgibson91 commented 4 years ago

I'm @sgibson91, a Research Software Engineer in Research Engineering at the Alan Turing Institute. I'm also a member of The Turing Way community helping advocate for reproducible research. I am currently already contributing to the Jupyter eco-system as a member of Project Binder that operates and maintains mybinder.org, as well as contributing documentation to https://binderhub.readthedocs.io and https://zero-to-jupyterhub.readthedocs.io.

I've found being welcomed into Project Binder has provided me a great platform to reach new audiences and educate them/open discussions on the role of reproducibility and open source in research. It's exposed me to new ways of working, tools and people that will only help me grow as an RSE and has shown me that my work is valued.

However, not everything's perfect. So in the interest of a balanced opinion - I'm in quite a large sulk this week over contributions that are made without documentation, which maintains a high barrier to entry and wastes people's time as they have to put in the same amount of work to learn the pitfalls and gotchas instead of this knowledge being shared.

I guess my goals for such a workshop would be:

If I think of anything else, I'll come back to this.

KirstieJane commented 4 years ago

@sgibson91 - thank you so much for these bullet points!! They're really well aligned with the brainstorming conversation that @trallard and I just had!! :100:

KirstieJane commented 4 years ago

Comments from @martintoreilly:

Have you talked to the Wrattler team about sharing their experience integrating an alternative notebook into JupyterLab?

I was also thinking about some discussions we've had when talking to people about our safe haven. A few people would really like to be able to run a JupyterLab / Hub / Notebook in their local web browser to work with data and use compute within a safe haven. However, I'm not yet sure if this is technically achievable with strong egress controls - it might end up being a pure policy thing.

Last idea is data provenance / traceability of computations for reproducibility?

trallard commented 4 years ago

I am so here for the data provenance and computational reproducibility.

I will start adding notes/ideas to the hackmd tomorrow and try and land our brainstorming grounded @KirstieJane

hughshanahan commented 4 years ago

I'm based at the CS department at Royal Holloway. Bioinformatician by trade but moving into Open Science more or less full time now. I've organised a Birds of a Feather meeting at the RDA in October. I can liaise with that group, which is interested in citing notebooks / long term preservation (with @pherterich) / notebooks as FAIR objects / high performance computing with notebooks.

martintoreilly commented 4 years ago

I am so here for the data provenance and computational reproducibility.

I will start adding notes/ideas to the hackmd tomorrow and try and land our brainstorming grounded @KirstieJane

@rolyp has been doing some work on linking graphs and plots to their underlying data and is working on our Wrattler notebook. Roly, are you interested in contributing to this workshop bid?

rolyp commented 4 years ago

@martintoreilly Yes, I'm certainly interested! The linking work is essentially data provenance for visualisations, and data provenance in notebooks is very much on-topic for me generally. I'd be interested in getting involved with building more support for this into Jupyter/JupyterLab.

rolyp commented 4 years ago

@KirstieJane I'm a research engineer at the Turing, currently working on Wrattler and data provenance for visualisations. I haven't contributed directly to Jupyter but am writing a grant proposal which includes some JupyterLab integration work.

In response to the 4 questions:

How does the workshop involve strategic work related to the central open-standards, protocols, abstractions and architecture, and open-source subprojects of Project Jupyter?

The workshop could include a session on the proposed JupyterLab Data Bus/Data Registry (https://github.com/jupyterlab/jupyterlab/issues/5548, https://github.com/jupyterlab/jupyterlab/issues/5733) and how it might be extended to support provenance-related metadata.

How will the workshop lead to the growth and sustainability of the Jupyter community? How will it grow the size and health of the core Jupyter project contributor community? Of the broader Jupyter ecosystem of which Jupyter is a part?

What types of people would attend the event and why does their participation align with the goals of Jupyter Workshops in general and your particular goals for this workshop?

Some industry participation would be healthy for the community and workshop goals, e.g. companies like dotscience, who are interested in data provenance and reproducibility in notebooks. I have a contact there.

"How does the event reach users/contributors that are underserved or underrepresented in our community?"

I'm personally interested in the use of notebooks for science communication more broadly (not just dissemination of research results), and issues surrounding data literacy. This could be a useful workshop theme that would overlap with Turing goals as well.

trallard commented 4 years ago

Ok folks!!! I have put together a proposal based on the chats here and a scoping call I had with @KirstieJane

Check it here 👉🏼: https://hackmd.io/VegvPEN7RT6DYu4T8G2VtQ?both

There are some items I need help with 🙏🏼:

⚠️ Since the deadline is tomorrow I plan to submit the proposal tomorrow @ 5pm London Time ⚠️

psychemedia commented 4 years ago

[Trying to identify what issues are for me... following is stream of consciousness and may be nonsense!]

In a healthy Jupyter ecosystem, there are contributors (working on the code), end users of services (folk in notebooks), and service providers (folk operating Jupyter services).

Issue 1: End users may have requirements that they would like their service providers to provide; end users may be able to develop their own extensions, in which case how do they provide them back to their service provider and the wider community in an appropriate way (onboarding extension contributors?). How do the concerns of users who are far from the developer repos make their way to satisfied requirements/needs in terms of service development and offering? How do service providers share their customisations and learning back to the community? (This can be at policy or architectural level as well as code contributions; eg how are user auth and sign on policies managed, or user resource limits implemented; how are internal code improvements shared back, if at all; eg what internal blockers are there to sharing customisations or architecture/deployment diagrams back?)

Issue 2: folk in institutions/orgs that don't offer a service - how do these folk lobby / advocate effectively to their orgs for the adoption of a service? The JupyterHub institutional FAQ relates to this; so do things like the JupyterHub service cost estimator.

I think the pitch so far looks at onboarding repo contributors, which can scale from folk willing to fix typos (the GitHub process does not help non-devs here...) to code contributions. But I wonder if there is scope for a softer skills discussion for "policy" folk / considerations to be aired: this might go from "how do we make a case for a Jupyter service" to "how would Jupyter fit in to our organisation and processes", eg to support teaching, research, or dissemination. The latter might also play to the concerns of publishers and research support staff: if researchers do make reproducible research artefacts available, how are these made runnable (who runs the BinderHub?) and how do institutional repositories support local archiving of these publications? (Organisational repos make local copies of papers openly available for reading, so should they also make eg notebooks openly available in a runnable sense, on their own BinderHubs? Your notebook on your organisational repo is not open to me as an executable document if I don't have access to a server/service environment to run it on.)

I'm reminded at this point of the old Mashed Library events where techies and librarians came together to try to make sense of each other's concerns and capabilities. I wonder if such a forum would be appropriate for a Jupyter Community event? (I should have thought about this earlier and maybe tried to take a lead on such an event... Maybe a satellite session at this event, if nothing else, to scope potential for a wider such event in the next round?)

KirstieJane commented 4 years ago

Thank you everyone SO much for these contributions - and to @trallard in particular for writing the application! I'm sitting down now for a few hours to add in a few details from ideas in my head and in this thread. I'll be in the Turing Way gitter channel if anyone wants to chat in real time ✨

KirstieJane commented 4 years ago

@psychemedia @trallard @hughshanahan @BertR @perllaghu @colombod @sgibson91 - are you available 21 & 22 May? Looking at the calendar that's by far the best time to have this event at the Turing.

Can you leave a 👍 or 👎 reaction on this post indicating whether you can save those dates or not?

Other folks on the thread, please respond too, but I think we have more redundancy across (eg) Turing folks who can represent the various projects so I'd like to prioritise the folks who are not already well linked to Turing work.....

perllaghu commented 4 years ago

Issue 1: End users may have requirements that they would like their service providers to provide; end users may be able to develop their own extensions, in which case how do they provide them back to their service provider and the wider community in an appropriate way (onboarding extension contributors?). How do the concerns of users who are far from the developer repos make their way to satisfied requirements/needs in terms of service development and offering? How do service providers share their customisations and learning back to the community?

Isn't this part of the perennial problem of any good service?

These are all challenges - and none of them are insurmountable.... just look at vscode, the linux kernel, open-office, heck - even git itself....

(Those of us who followed the EPrints software, will know the pains of trying to write an extensions manager :) )

trallard commented 4 years ago

Birthday workshop for me then 🎂 I can look at dates where the reactor is 3 days in a row free too

psychemedia commented 4 years ago

Isn't this part of the perennial problem of any good service?

Yes, but I think that in Jupyter's case there are lots of things that are quite easy for folk to share back (eg voila dashboard examples, simple magics and notebook extensions), things that are harder (anything in JupyterLab?!;-), and things on the server side: deployment guides, customisations, server development.

To the extent that the w/s is about trying to encourage technical engagement, we can make a virtue of the fact that Jupyter notebooks etc themselves work as an end user application development environment (folk using ipywidgets interact or stringing widgets together), and a medium that encourages simple package development (eg for magics, extensions), because these provide on-ramps for getting folk into a developer community. It may give folk enough confidence using GitHub as they manage their own code to contribute eg to docs maintenance on rather more technical projects that they would never feel comfortable contributing to at a technical level?

perllaghu commented 4 years ago

I see a definite lack of "middle ground" docs.... There's stuff on APIs and there's high-level overview stuff.... but little in the practicalities of actually doing stuff.

Like Zero to jupyterhub-k8s.... (and it's both hard & boring to write, I know)

KirstieJane commented 4 years ago

Woooo - ok - I've had a bash through @trallard's awesome application and added in a few more points.

Here's a PR with the version to be submitted: https://github.com/alan-turing-institute/the-turing-way/pull/773

KirstieJane commented 4 years ago

Submitted everyone!! Please save the dates 21 - 23 May and we'll let you know as soon as we know if the application was successful.

HUGE THANK YOU for all the great enthusiasm and support in such a short timeframe!

hughshanahan commented 4 years ago

Good luck !!


Hugh Shanahan, Professor of Open Science, Hugh.Shanahan@rhul.ac.uk, http://www.shanahanlab.org, @hughshanahan, Skype hugh_shanahan, Tel +44 (0)1784 443433, orcid.org/0000-0003-1374-6015
