Open layne-sadler opened 3 years ago
Thanks for working on this @layne-sadler !
I used powers of 3 as weights to simulate the difference between a day, a week, a month, and a few months.
{
"daily": "81x",
"weekly": "27x",
"monthly": "9x",
"every few months": "3x"
}
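As a rough sketch of how these weights could be applied when ranking pain points: the weight table below is copied from the mapping above, while the function name `weighted_total` and the sample responses are hypothetical and not part of the survey analysis itself.

```python
# Frequency weights from the mapping above, written as explicit powers of 3.
FREQUENCY_WEIGHTS = {
    "daily": 3 ** 4,             # 81x
    "weekly": 3 ** 3,            # 27x
    "monthly": 3 ** 2,           # 9x
    "every few months": 3 ** 1,  # 3x
}


def weighted_total(frequencies):
    """Sum the weights of a list of frequency answers (hypothetical helper)."""
    return sum(FREQUENCY_WEIGHTS[answer] for answer in frequencies)


# Hypothetical responses for a single pain point.
sample = ["daily", "weekly", "every few months"]
print(weighted_total(sample))
```

Under this scheme a single "daily" answer outweighs three "weekly" answers tied together only by equality (81 = 3 x 27), which is the consequence of choosing powers of 3.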
Users were asked if Jupyter satisfied their needs (yes, neutral, no) and then asked the same question for their alternative tools.
(jupyter & yes: +1x, jupyter & no: -1x)
subtract
(alternative & yes: +1x, alternative & no: -1x)
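A minimal sketch of this subtraction, under stated assumptions: "neutral" is scored as 0 (the rule above only gives values for yes and no), and the trailing "x" is read as an optional multiplier that defaults to 1. The names `SATISFACTION` and `satisfaction_gap` are hypothetical.

```python
# Score per answer. The value for "neutral" is an assumption; the rule above
# only specifies +1 for "yes" and -1 for "no".
SATISFACTION = {"yes": 1, "neutral": 0, "no": -1}


def satisfaction_gap(jupyter_answer, alternative_answer, weight=1):
    """Jupyter score minus alternative-tool score, scaled by an optional weight."""
    gap = SATISFACTION[jupyter_answer] - SATISFACTION[alternative_answer]
    return weight * gap


# Hypothetical (jupyter, alternative) answer pairs for one task.
pairs = [("yes", "no"), ("no", "yes"), ("neutral", "yes")]
total = sum(satisfaction_gap(j, a) for j, a in pairs)
print(total)
```

A negative total would indicate a task where respondents are better served by their alternative tools than by Jupyter.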
When compared with the previous chart, you can see that pipelines, documenting research, and writing tests need improvement.
Research is both an industrial and academic topic. There’s an entire scientific community waiting to blossom to its full potential here; Twitter meets arXiv with graph-based citation NLP. The network effect (quadratic multiplier) of a scientific community is how you beat VS Code & Google Colab.
I would like to make a point about ecosystem on the academic front by first showcasing some small features I miss:
I think that points 1 to 4 should be extensions; while there are some good extensions for ML/DL users (e.g. Elyra stack), it seems that fewer extensions exist for the typical scientific use cases. While JupyterHub-like environments (e.g. GenePattern, DNAnexus - commercial) sometimes include dedicated field-specific JupyterLab extensions, it seems like there is not really much of an ecosystem for extensions dedicated to academics.
Question: could we encourage sustainable development of new extensions for academic users of JupyterLab?
Problems:
There is a CORPORATE.md but not an ACADEMIC.md. Research labs are not corporations.
In the academic world, the person that cared for the puppies will, in no more than two years, be in a different lab, likely in a different country. How to keep them engaged?
Sub-questions:
Points 2 and 5 demonstrate a regression in the ecosystem: features available in the past are no longer maintained and no replacement is available. They also showcase areas where specialized alternatives (RStudio) invested in innovation, leaving Jupyter behind.
As for promoting Jupyter notebooks among academics, I think that pieces like this one do help, and a tutorial on features specific to JupyterLab (and relevant to academics) in F1000 or Some Journal Methods could be useful too.
Edit: To highlight, I have very little academic experience, and while this is the best picture of the situation that I have as a PhD student, it might turn out to be neither comprehensive nor accurate. I also have no insight into how the Jupyter project unfolded and managed to become such a great and successful platform - I just wish to support it and also have the nice features that would make my daily work easier.
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/participate-in-the-jupyter-2021-survey/7415/7
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/jupyter-annual-survey-results/7388/5
Here is an example of improving "documenting research" capabilities:
Similar to a table of contents, a notebook could have a table of figures. The figures are typically the best part of a paper/report (they contain the most insight), so people want to jump to them after reading the abstract.
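To make the idea concrete, here is a minimal sketch of collecting figures from a notebook, not an existing JupyterLab feature. It assumes the notebook is already loaded as a dict in the nbformat 4 JSON layout (code cells carry an `outputs` list whose entries may hold a `data` dict keyed by MIME type). The function name `table_of_figures` and the sample notebook are hypothetical.

```python
def table_of_figures(notebook):
    """List outputs that contain an image, with the index of their cell."""
    figures = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # Rich outputs store their payloads under "data", keyed by MIME type.
            data = output.get("data", {})
            image_types = [mime for mime in data if mime.startswith("image/")]
            if image_types:
                figures.append({
                    "number": len(figures) + 1,
                    "cell": cell_index,
                    "mime_types": image_types,
                })
    return figures


# Hypothetical notebook with one markdown cell and one plotting cell.
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Abstract"]},
        {
            "cell_type": "code",
            "source": ["plot()"],
            "outputs": [
                {"output_type": "display_data",
                 "data": {"image/png": "...", "text/plain": "<Figure>"}},
            ],
        },
    ],
}

for figure in table_of_figures(sample_notebook):
    print(f"Figure {figure['number']}: cell {figure['cell']}")
```

A real extension would also need captions and navigation targets; this only shows that the figure locations are recoverable from the notebook file.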
@krassowski I think you hit the nail on the head by identifying the core issue contributing to the lack of a robust ecosystem in academia, which is the difficulty of aligning the incentives of academics with developing better software tools.
Here are some of my scattered thoughts at the moment:
I just watched a great talk by @choldgraf on 2i2c; while all of the talk is relevant, the 10 minutes of discussion starting at 42:10 are very relevant to the discussion I started above (tackling the issue of academic culture of ownership, and the RStudio model).
Should we move the discussion on incentivising academic contributions to another thread?
(Am more than happy to chat about 2i2c in a different thread, it is definitely meant to help with some of the problems you correctly identify)
Background
In order to facilitate roadmapping next week, here is a quantified ranking of user pain points.
Most of the solutions to pain points are obvious, but some require engineering creativity to come up with the best solution. For example, how would you decide to solve the problem of "losing data due to kernel failure/restart"?
Ranked Pain Points
Weighting
It was easy to weight them because users were already asked to weight them as seen below: