Create Submission Overview Dashboard using flexdashboard

tschaffter commented 3 years ago

Create a dashboard that includes statistics about the tools evaluated in the NLP Sandbox. Examples of metrics include:

General stats:

Number of unique submitters
Number of tasks / benchmarks open
Latest version of the NLP Sandbox schemas
Number of datasets / data sites
Number of unique tools (e.g. using the Tool.name)
Programming languages used by the tools (would need to add Tool.language)
Other

Notes:

The same metrics as above, but for specific tasks.
The dashboard is an HTML page that can be generated periodically on GitHub and hosted on GitHub Pages and on Synapse (if the page needs to be included in a Synapse Wiki page).
Another dashboard could be created for the Monument of NLP Heroes
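The per-task note above could be handled by grouping submissions by evaluation queue before counting. Below is a minimal Python sketch. It assumes each submission is a dict exported from the submission table and that the queue is stored under a hypothetical evaluationid column. The submitterid and tool_name names follow the columns proposed later in this thread. The sample rows are made up.

```python
from collections import defaultdict

def metrics_by_task(rows):
    """Count unique submitters and unique tools per evaluation queue (task).

    Assumes each row is a dict with 'evaluationid', 'submitterid' and
    'tool_name' keys; 'evaluationid' is a hypothetical column name.
    """
    submitters = defaultdict(set)
    tools = defaultdict(set)
    for row in rows:
        task = row["evaluationid"]
        submitters[task].add(row["submitterid"])
        tools[task].add(row["tool_name"])
    return {
        task: {"submitters": len(submitters[task]), "tools": len(tools[task])}
        for task in submitters
    }

# Made-up sample rows for illustration only.
rows = [
    {"evaluationid": "q1", "submitterid": "u1", "tool_name": "date-annotator"},
    {"evaluationid": "q1", "submitterid": "u2", "tool_name": "date-annotator"},
    {"evaluationid": "q2", "submitterid": "u1", "tool_name": "person-annotator"},
]
print(metrics_by_task(rows))
# {'q1': {'submitters': 2, 'tools': 1}, 'q2': {'submitters': 1, 'tools': 1}}
```

The number of keys in the result is also the number of distinct queues. This count is one candidate for the "tasks / benchmarks open" metric.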

References

GA4GH-DREAM Workflow Execution Challenge - Participation Overview

tschaffter commented 3 years ago

The dashboard will be developed in this repository.

tschaffter commented 3 years ago

@jiaxinmachine88 Could you please create a document to track what we want to include in this participation dashboard? Maybe a Google Doc with a section for each metric. Each section should provide at least the following information:

Name of the metric (e.g. Tools)
Description of the metric (e.g. Number of unique tools)
How to compute the metric value (e.g. from submission X, count the number of unique tool names)
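One way to keep every section of that document uniform is to mirror the three required fields in a small record type. This is a hypothetical Python sketch. The MetricSpec name and the example entry are illustrative and are not part of the NLP Sandbox code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricSpec:
    """One section of the metrics document (hypothetical structure)."""
    name: str         # e.g. "Tools"
    description: str  # e.g. "Number of unique tools"
    computation: str  # how the value is derived from the submission table

tools = MetricSpec(
    name="Tools",
    description="Number of unique tools",
    computation="Count the number of unique tool names among submissions",
)
print(tools.name, "-", tools.description)
# Tools - Number of unique tools
```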

andrewelamb commented 3 years ago

I'm going with these assumptions for now, feel free to correct:

"Number of unique submitters"

length(unique(submitterid))

"Number of tasks / benchmarks open"

Could you elaborate? Something to do with the status column?

"Latest version of the NLP Sandbox schemas"

Could you elaborate?

"Number of datasets / data sites"

length(unique(dataset_name)). Could you elaborate on data sites?

"Number of unique tools (e.g. using the Tool.name)"

length(unique(tool_name))

"Programming languages used by the tools (would need to add Tool.language)"

I assume this will be added?
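The unique(...) assumptions above can be sketched in Python with sets. The sketch assumes the submission table has been exported to CSV with the submitterid and tool_name columns named in this comment. The CSV export step itself is an assumption, and the sample rows are made up.

```python
import csv
import io

def count_unique(rows, column):
    """Number of distinct values in a column.

    Similar to length(unique(x)) in R, but empty cells are skipped.
    """
    return len({row[column] for row in rows if row.get(column)})

def general_stats(csv_text):
    """Compute the general stats from a CSV export of the submission table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "submitters": count_unique(rows, "submitterid"),
        "tools": count_unique(rows, "tool_name"),
    }

sample = """submitterid,tool_name
u1,date-annotator
u2,date-annotator
u1,person-annotator
"""
print(general_stats(sample))
# {'submitters': 2, 'tools': 2}
```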

tschaffter commented 3 years ago

@andrewelamb

Number of tasks / benchmarks open

We could use the number of unique evaluation IDs listed in this table (left side).

Latest version of the NLP Sandbox schemas

The services and tools of the NLP Sandbox are based on the NLP Sandbox Schemas. The latest version number could be retrieved from the GitHub API. Alternatively, I'm OK with adding a file .nlpsandbox-version to this repo that contains the schemas version x.y.z. Here is an example of this file hosted in another GitHub repo.
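Both options above can be sketched in Python. The first queries the GitHub REST API "latest release" endpoint of the schemas repository. The alternative reads the version from a .nlpsandbox-version file. Several details are assumptions not confirmed in this thread: the repository name nlpsandbox/nlpsandbox-schemas, and release tags of the form 1.2.0 or v1.2.0.

```python
import json
import re
import urllib.request

# Accepts "1.2.0" or "v1.2.0"; the tag format is an assumption.
VERSION_RE = re.compile(r"^v?(\d+\.\d+\.\d+)$")

def parse_version(text):
    """Extract x.y.z from a version string, ignoring surrounding whitespace."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an x.y.z version: {text!r}")
    return match.group(1)

def latest_release_version(owner="nlpsandbox", repo="nlpsandbox-schemas"):
    """Option 1: ask GitHub for the latest release (owner/repo are assumptions)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases/latest"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_version(json.load(response)["tag_name"])

def version_from_file(path=".nlpsandbox-version"):
    """Option 2: read the schemas version from a file committed to this repo."""
    with open(path, encoding="utf-8") as f:
        return parse_version(f.read())
```

The file-based option avoids a network call and GitHub API rate limits. The trade-off is that someone has to update the file when the schemas are released.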

"Number of datasets / data sites"

When an NLP developer submits a tool, the tool is evaluated on data hosted at different physical locations (data sites). Currently two data sites are enabled (Sage and Medical College of Wisconsin (MCW)). For now we can use a static value (2).

The number of datasets can be obtained from the above table as length(unique(dataset_name)).
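The two values described above can be sketched in Python. The number of data sites is hard-coded as suggested (Sage and MCW). The number of datasets mirrors length(unique(dataset_name)). The DATA_SITES name is hypothetical, and the sample rows are made up.

```python
# Static list of data sites for now (Sage and MCW); hypothetical constant name.
DATA_SITES = ("Sage", "Medical College of Wisconsin")

def dataset_stats(rows):
    """Return the number of distinct datasets and the static number of data sites."""
    datasets = {row["dataset_name"] for row in rows}
    return {"datasets": len(datasets), "data_sites": len(DATA_SITES)}

rows = [
    {"dataset_name": "dataset-a"},
    {"dataset_name": "dataset-a"},
    {"dataset_name": "dataset-b"},
]
print(dataset_stats(rows))
# {'datasets': 2, 'data_sites': 2}
```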

"Programming languages used by the tools (would need to add Tool.language)" I assume this will be added?

Yes
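Once a Tool.language field exists, the language breakdown becomes a frequency count over unique tools. The sketch below assumes the submission table gains a hypothetical tool_language column next to tool_name. Each tool is counted once, even if it was submitted several times, and the sample rows are made up.

```python
from collections import Counter

def languages_used(rows):
    """Count tools per programming language, counting each tool name once.

    'tool_language' is a hypothetical column; Tool.language does not exist yet.
    """
    language_by_tool = {}
    for row in rows:
        # Keep the first language seen for each tool name.
        language_by_tool.setdefault(row["tool_name"], row["tool_language"])
    return Counter(language_by_tool.values())

rows = [
    {"tool_name": "date-annotator", "tool_language": "Python"},
    {"tool_name": "date-annotator", "tool_language": "Python"},
    {"tool_name": "person-annotator", "tool_language": "Java"},
    {"tool_name": "address-annotator", "tool_language": "Python"},
]
print(languages_used(rows).most_common())
# [('Python', 2), ('Java', 1)]
```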

andrewelamb commented 3 years ago

[Screenshot of the draft dashboard, 2021-08-20 09:43:26]

andrewelamb commented 3 years ago

[Screenshot of the draft dashboard, 2021-08-20 10:05:54]

tschaffter commented 3 years ago

@andrewelamb The number of open tasks (i.e. evaluation queues) is incorrect. I count 6 evaluation queues on the table page. The rest looks good!

andrewelamb commented 3 years ago

@tschaffter Yep, I was grabbing the wrong column. That's been fixed.

andrewelamb commented 3 years ago

@tschaffter Would it be all right to use @thomasyu888's credentials for the Docker image? He would need read permission on the source table and edit permission on the directory where we want to store the HTML output file.

tschaffter commented 3 years ago

@andrewelamb Where is the Docker container going to run? We have a bot account that we can use for this task. I'll create the token and share it with you using our favorite password manager.

andrewelamb commented 3 years ago

@tschaffter, I don't have an answer to either of those questions. :) @thomasyu888 ?

thomasyu888 commented 3 years ago

@tschaffter It will run on our Kubernetes cluster, and the token will need to be a Synapse personal access token (PAT) with download permissions.
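Inside the container, the bot's PAT could be passed in as an environment variable (for example, populated from a Kubernetes Secret). The container would then use it to log in with the Synapse Python client. This is a minimal sketch. The SYNAPSE_AUTH_TOKEN variable name is an assumption. The synapseclient import sits inside login() so that only the login step depends on that package.

```python
import os

def get_synapse_token(env=None):
    """Read the bot's Synapse personal access token from the environment.

    SYNAPSE_AUTH_TOKEN is an assumed variable name.
    """
    env = os.environ if env is None else env
    token = env.get("SYNAPSE_AUTH_TOKEN", "").strip()
    if not token:
        raise RuntimeError("SYNAPSE_AUTH_TOKEN is not set")
    return token

def login(token):
    """Log in to Synapse with a personal access token (requires synapseclient)."""
    import synapseclient  # deferred import: only this step needs the package

    syn = synapseclient.Synapse()
    syn.login(authToken=token)
    return syn
```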

tschaffter commented 3 years ago

Sounds good!

gkowalski commented 3 years ago

See https://github.com/Sage-Bionetworks/SynapseWorkflowOrchestrator/issues/31

tschaffter commented 3 years ago

Closing this issue in favor of smaller issues to be created in the dashboard repository: https://github.com/nlpsandbox/participation-dashboard

nlpsandbox/nlpsandbox.io #152