softwaresaved / habeas-corpus

A corpus of research software used in COVID-19 research.
MIT License

Pull together the normalization information in a dataset ⛙ #3

Open sdruskat opened 3 years ago

sdruskat commented 3 years ago

What do we have?

The issue

We need some sort of dataset to count mentions according to #2.

What do we really need?

There are several ways this could look:

How can we achieve this?

olexandr-konovalov commented 3 years ago

+1 for Jupyter. We can have a fully automated and reproducible analysis that downloads the CSV file (or uses a refined dataset in the repository) and can be re-run on Binder: https://github.com/rse-standrewscs/python-binder-template
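A minimal sketch of what one cell of such a notebook could look like, assuming the dataset is a CSV with a hypothetical `software` column holding one normalized name per mention. The real column names and file location are not specified in this issue, so the sample data below is made up and read from memory rather than downloaded:

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the real dataset's columns are not defined in this issue.
SAMPLE_CSV = """paper_id,software
p1,R
p1,Python
p2,R
p3,Stata
p3,R
"""


def count_mentions(csv_file, column="software"):
    """Count how often each value of `column` appears in a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = count_mentions(io.StringIO(SAMPLE_CSV))

# Print names from most to least mentioned.
for name, n in counts.most_common():
    print(name, n)
```

In an actual notebook the `io.StringIO` object would be swapped for an open file or the body of a downloaded response; `count_mentions` only needs something that yields CSV lines.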

olexandr-konovalov commented 3 years ago

Still, some code should be in .py files: easier to keep under version control, test, etc.
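As an illustration of that split, a helper like the one below could live in a hypothetical `mentions.py` and be imported by the notebook. Both the module name and the normalization rule are assumptions made for this sketch, not the project's actual approach:

```python
def normalize_name(raw):
    """Collapse whitespace and case so variant spellings compare equal.

    Illustrative rule only; the project's real normalization is not
    described in this issue.
    """
    return " ".join(raw.split()).casefold()


def test_normalize_name():
    # Plain assert-based test; pytest would discover it by its name.
    assert normalize_name("  Jupyter   Notebook ") == "jupyter notebook"
    assert normalize_name("PYTHON") == normalize_name("python")


# Run the test directly so the sketch is self-checking without pytest.
test_normalize_name()
```

Keeping the function in a plain module means it shows up as an ordinary text diff in version control, and the notebook only needs an import.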

Obligatory reading is https://doi.org/10.1371/journal.pcbi.1007007

There is also a tool for diffing and merging Jupyter notebooks: https://nbdime.readthedocs.io/