jupyterlab / jupyterlab-data-explorer

First class datasets in JupyterLab
BSD 3-Clause "New" or "Revised" License

User stories for data registry #139

Closed ellisonbg closed 1 year ago

ellisonbg commented 4 years ago

We have been having conversations with different users and orgs about what the data registry would need to provide to be useful to them. Opening an issue to track this.

ellisonbg commented 4 years ago

Here is a fantastic blog post from the UK Met Office with relevant user stories:

https://medium.com/informatics-lab/what-do-we-want-from-a-dream-data-platform-as-a-service-c38558c25f29

tgeorgeux commented 4 years ago

Here's a quick summary:

Analyst:

Someone who wants to extract information from data, often very big data; for instance, a research scientist. They are proficient at coding within their domain but not necessarily up to speed on how to use and configure the myriad cloud services. They write code as a means to an end: the less time they spend thinking about code and computers, and the more time they spend thinking about their problem domain, the happier and more productive they will be.

1. “As an analyst, I want an easy point-and-click based user interface (UI) which allows me to access the power features (see below!) so that I can concentrate on gaining insight from my data and not spend time learning a new system/language/service/what cloud computing means.”

2. “As an analyst working with big data, I need to be able to scale my compute horizontally (more computers) and vertically (bigger/faster computers) so that I can perform my analysis in a reasonable timeframe.”

3. “As an analyst, I need to be able to quickly create and publish user friendly applications so that I can push new discoveries from research to applied science/operations/the real world.”

4. “As an analyst, I need to be able to share and publish notebooks so that I can work on problems with colleagues.”

5. “As an analyst, I need to be able to control the rate I spend my allocated funding/resource, so that I can choose to spend it effectively.”

6. “As an analyst, I need a way to easily browse, understand and load datasets into my analysis session so that I use the best data available and spend more time doing actual science.”

7. “As an analyst, I need to effortlessly work with various compute estates which are coupled to datasets, so that I do not have to change platform when I use a new dataset.”

8. “As an analyst, I need to be able to submit and monitor long running tasks to a robust compute service, so that I can complete analysis that takes too long to actively monitor.”

9. “As an analyst, I want to customise my Jupyter instance with the tools and extensions I find useful, so I can be most productive.”

10. “As an analyst, I want to customise my software stack so that I can access the most useful specialist tools.”

11. “As an analyst who provides information to others (be that scientific publication, or consultancy with decision makers), I need to be able to prove the chain of analysis, so that I can: evidence my conclusion; analyse my level of confidence; correct mistakes; apportion revenue up the chain.”

12. “As a research software engineer, I want to be able to seamlessly move between writing expressive interactive science analysis, and writing more traditional software so that I can develop high quality, powerful tools.”

Data generator:

Someone who produces data and datasets, which are often very large. They want people inside or outside the organization to find, understand, and use their data. They understand their data and know how to work with and manipulate it.

1. “As a data generator, I need to be able to publish my datasets so that the appropriate people can discover and use them.”

2. “As a data generator, I need to be able to evidence what my data is being used for, so I can justify my funding.”

3. “As a data generator, I need tools for publishing many meaningless chunks of data as a meaningful dataset which can be used by other humans.”

System administrator:

They design and maintain the system. They make sure it stays up, develop and improve it, make sure it’s safe, secure, cost-effective and fit for purpose. It is their business to know and understand how complex systems fit together and work. They want to understand the performance and cost of a system, and where these characteristics are coming from.

1. “As a system administrator, I want a back-stop on how much money my users spend, so that they don’t accidentally run up large bills.”

2. “As an enterprise company with a large and complex existing data estate, I need access to a series of well decoupled services, so that I can combine them with existing services.”

Thanks for bringing this up, @ellisonbg. Did you have any specific cases you wanted to highlight or discuss?