layne-sadler commented 3 years ago

Background

In order to facilitate roadmapping next week, here is a quantified ranking of user pain points.

People only use a product if it solves a problem that they actually have; therefore, if what you are building isn't solving a problem... then why bother building it? Ask yourself 'what problem does this solve?' and 'who has this problem?'

Most of the solutions to pain points are obvious, but some require engineering creativity to find the best one. For example, how would you solve the problem of "losing data due to kernel failure/restart"?

Ranked Pain Points

ranked_weighted_pain

Weighting

Weighting was easy because the survey already asked users to weight each pain point, using the multipliers below:

{
    "critical": "4x",
    "major": "3x",
    "minor": "2x",
    "trivial": "1x"
}
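As a rough sketch of how these multipliers could turn into a ranking: the snippet below assumes each pain point's score is the sum of its response counts scaled by severity. That aggregation rule, the function name `weighted_pain`, and the sample counts are all hypothetical, not taken from the survey analysis itself.

```python
# Severity multipliers from the weighting above.
SEVERITY_WEIGHTS = {"critical": 4, "major": 3, "minor": 2, "trivial": 1}


def weighted_pain(counts):
    """Sum the response counts for one pain point, scaled by severity."""
    return sum(SEVERITY_WEIGHTS[severity] * n for severity, n in counts.items())


# Hypothetical response counts, for illustration only (not survey data).
responses = {
    "pain point A": {"critical": 10, "major": 5, "minor": 2, "trivial": 1},
    "pain point B": {"critical": 1, "major": 2, "minor": 10, "trivial": 20},
}

# Rank pain points from highest to lowest weighted score.
ranked = sorted(responses, key=lambda name: weighted_pain(responses[name]), reverse=True)
for name in ranked:
    print(name, weighted_pain(responses[name]))
```

With these made-up counts, a pain point with fewer but more severe reports can outrank one with many trivial reports, which is the point of weighting.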

goanpeca commented 3 years ago

Thanks for working on this @layne-sadler !

layne-sadler commented 3 years ago

use_case_frequency

Weighting

The weights are powers of 3, to approximate the gap between using something every day, every week, every month, and every few months.

{
    "daily": "81x",
    "weekly": "27x",
    "monthly": "9x",
    "every few months": "3x"
}
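The powers-of-3 scheme can be written down compactly. This is only an illustrative sketch: the weights match the table above, but the `use_case_score` helper and the sample counts are assumptions about how the weights might be applied.

```python
# Frequency labels ordered from rarest to most frequent.
FREQUENCIES = ["every few months", "monthly", "weekly", "daily"]

# Each step up in frequency multiplies the weight by 3: 3, 9, 27, 81.
FREQUENCY_WEIGHTS = {label: 3 ** (i + 1) for i, label in enumerate(FREQUENCIES)}


def use_case_score(counts):
    """Sum the response counts for one use case, scaled by frequency."""
    return sum(FREQUENCY_WEIGHTS[label] * n for label, n in counts.items())


# Hypothetical response counts, for illustration only (not survey data).
example_counts = {"daily": 2, "weekly": 1, "monthly": 0, "every few months": 3}
print(FREQUENCY_WEIGHTS)
print(use_case_score(example_counts))
```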

layne-sadler commented 3 years ago

jupyter_vs_alternatives

Weighting

Users were asked if Jupyter satisfied their needs (yes, neutral, no) and then asked the same question for their alternative tools.

(jupyter & yes: +1x,
jupyter & no: -1x)

minus

(alternative & yes: +1x,
alternative & no: -1x)

Compared with the previous chart, you can see that pipelines, documenting research, and writing tests need improvement.
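The Jupyter-versus-alternatives weighting could be sketched as follows. Scoring "neutral" as 0 is an assumption (the weighting only lists yes and no), and the function names and sample answers are hypothetical.

```python
# "yes" and "no" follow the weighting above; "neutral" = 0 is an assumption.
ANSWER_SCORES = {"yes": 1, "neutral": 0, "no": -1}


def satisfaction(answers):
    """Net satisfaction for one tool: +1 per "yes", -1 per "no"."""
    return sum(ANSWER_SCORES[answer] for answer in answers)


def jupyter_vs_alternative(jupyter_answers, alternative_answers):
    """Jupyter's net satisfaction minus the alternative's net satisfaction."""
    return satisfaction(jupyter_answers) - satisfaction(alternative_answers)


# Hypothetical answers for a single use case (not survey data).
jupyter_answers = ["yes", "yes", "no", "neutral"]
alternative_answers = ["yes", "no", "no", "neutral"]
print(jupyter_vs_alternative(jupyter_answers, alternative_answers))
```

Under this reading, a negative result for a use case means the alternatives satisfy users better than Jupyter does.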

layne-sadler commented 3 years ago

So what should we do?

Let's brainstorm about it

Research is both an industrial and academic topic. There's an entire scientific community waiting to blossom to its full potential here; Twitter meets arXiv with graph-based citation NLP. The network effect (quadratic multiplier) of a scientific community is how you beat VS Code & Google Colab.

krassowski commented 3 years ago

I would like to make a point about the ecosystem on the academic front by first showcasing some small features I miss:

1: tabular editor (csv/tsv files):
- jupyterlab-tabular-data-editor: not ported to Lab 3.0, looks like never fully released :(
2: citing software integration (e.g. zotero):
- there used to be an extension for Jupyter Notebook - Zotero integration: cite2c, not maintained for 3 years now, never ported to JupyterLab :(
- RStudio recently got nice Zotero integration
3: RMarkdown renderer: the format, which is also the alternative notebook format (developed in RStudio), is a de facto standard for many use cases (writing papers, presentations, etc.) in biomedical and biological sciences. It would be perfect to be able to at least render it in JupyterLab instead of having to switch to RStudio (and jump through hoops)
4: explorer of scientific symbols (there is latex in markdown but I still need to find the correct name of the symbol if it is one that I use less frequently)
5: mixing of variables with markdown (i.e. embedding variables in markdown)
- as recently pointed out, this has been on the wishlist for over 7 years now: https://github.com/ipython/ipython/issues/2958#issuecomment-701694482
- a solution for Jupyter Notebook exists (but for Python only): python-markdown
- this feature is available in RMarkdown
6: Slides/presentations
- RISE: https://github.com/damianavila/RISE/issues/270, at this pace the crowdfunding will get us to one month of RISE work in.. checks notes... six and a half years
- while JupyterLab does have an option to export to Reveal.js, the workflow is not great and RStudio has many more options available.

I think that points 1 to 4 should be extensions; while there are some good extensions for ML/DL users (e.g. Elyra stack), it seems that fewer extensions exist for the typical scientific use cases. While JupyterHub-like environments (e.g. GenePattern, DNAnexus - commercial) sometimes include dedicated field-specific JupyterLab extensions, it seems like there is not really much of an ecosystem for extensions dedicated to academics.

Question: could we encourage sustainable development of new extensions for academic users of JupyterLab?

Problems:

JupyterLab has CORPORATE.md but not ACADEMIC.md. Research labs are not corporations.
most scientists have never heard of TypeScript or Node.js; how can we encourage those who have to contribute?
many labs will not pay for developing software that slightly improves their productivity (especially if it is only beneficial on a global scale once multiplied by thousands of users, and not for the lab itself);
- large research labs and universities in general could however buy a product/support that benefits them (and this is how RStudio funding works);
there are Research Software Engineering teams, but those can have very different priorities, and my guess is that they will not act on such details unless the reward is substantial (i.e. a plugin that benefits a large userbase of a large university; the return would be lower for the more common small universities)
to use the analogy from CORPORATE.md: in the academic world, the person who cared for the puppies will, within two years at most, be in a different lab, likely in a different country. How do we keep them engaged?

Sub-questions:

what factors would potentially encourage individuals or research labs to contribute?
- should we seek to incentivise contributions from research labs or from individual researchers (as people often change labs)?
is advocating for RSE funding perhaps the right way forward?
is the CZI-like approach the way forward?
is using the academic currency of publications applicable? Would researchers be interested in having a publication unrelated to their field of research if it was about developing new features for a general-purpose, yet science-focused, editor?

Points 2 and 5 demonstrate a regression in the ecosystem: features available in the past are no longer maintained and no replacement is available. They also showcase areas where specialized alternatives (RStudio) invested in innovation, leaving Jupyter behind.

As for promoting Jupyter notebooks among academics, I think that pieces like this one do help, and a tutorial on features specific to JupyterLab (and relevant to academics) in F1000 or Some Journal Methods could be useful too.

Edit: To highlight, I have very little academic experience, and while this is the best picture of the situation that I have as a PhD student, it might turn out to be neither comprehensive nor accurate. I also have no insight into how the Jupyter project unfolded and managed to become such a great and successful platform - I just wish to support it and also have the nice features that would make my daily work easier.

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/participate-in-the-jupyter-2021-survey/7415/7

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jupyter-annual-survey-results/7388/5

layne-sadler commented 3 years ago

Here is an example of improving "documenting research" capabilities:

Similar to a table of contents, a notebook could have a table of figures. The figures are typically the best part of a paper/report (they contain the most insight), so people want to jump to them after reading the abstract.

https://www.nature.com/articles/s41587-020-0673-2

TK-21st commented 3 years ago

@krassowski I think you hit the nail on the head by identifying the core issue behind the lack of a robust ecosystem in academia, which is the difficulty of aligning the incentives of academics with developing better software tools.

Here are some of my scattered thoughts at the moment:

The first thing (and the most important in my opinion) is to recognize software development as a valid component of one's academic portfolio. Journals are getting better at recognizing such efforts (which I benefited from), but more needs to be done. Perhaps there is a way for Project Jupyter to make an effort to highlight some academic projects (extensions) and thus amplify these projects' legitimacy as genuine research efforts.
I also think that encouraging JupyterLab or JupyterHub as a collaborative working environment for research labs will be very important. Even in my lab, where we manage our own servers, we have reservations about using JupyterHub. One of the issues is that JupyterHub seems to require HTTP access, but we don't have enough know-how regarding web security to support that, so we rely completely on SSH. It would be wonderful if there were some alternative to HTTP for using JupyterHub (perhaps SSH port forwarding? not sure). This would do wonders for communicating results both internally within the lab and externally.
Lowering the barrier to entry is definitely going to be helpful. It would be fantastic if there were a better cookiecutter for some simpler widgets that target academic use cases, but I'm not sure how we would make this general to all academics, to be honest.

krassowski commented 3 years ago

I just watched a great talk by @choldgraf on 2i2c; while all of the talk is relevant, the 10 minutes of discussion starting at 42:10 are very relevant to the discussion I started above (tackling the issue of academic culture of ownership, and the RStudio model).

Should we move the discussion on incentivising academic contributions to another thread?

choldgraf commented 3 years ago

(Am more than happy to chat about 2i2c in a different thread, it is definitely meant to help with some of the problems you correctly identify)

jupyterlab / frontends-team-compass

Ranked pain points to help w roadmap #121
