Open sdruskat opened 3 years ago
+1 for Jupyter. We can have a fully automated and reproducible analysis that downloads the CSV file (or uses a refined dataset in the repository) and can be re-run on Binder: https://github.com/rse-standrewscs/python-binder-template
Still, some code should live in .py files, which are easier to keep under version control, test, etc.
Obligatory reading is https://doi.org/10.1371/journal.pcbi.1007007
There is also a tool for diffing and merging Jupyter notebooks: https://nbdime.readthedocs.io/
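To illustrate the point about keeping logic in .py files: a minimal sketch of a small module that a notebook could import, with a test next to it. The module name `mentions_io.py`, the function name, and the CSV columns `paper` and `mention` are all assumptions made for illustration; the actual dataset layout is not decided in this thread.

```python
# mentions_io.py (hypothetical module name)
import csv
import io


def read_mentions(csv_text):
    """Parse CSV text into (paper, mention) tuples.

    Assumes a header row with the columns 'paper' and 'mention';
    both column names are placeholders, not an agreed format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["paper"].strip(), row["mention"].strip()) for row in reader]


def test_read_mentions():
    # Small inline sample so the test needs no network access.
    sample = "paper,mention\nP1,Software1\nP1, SW 1 \n"
    assert read_mentions(sample) == [("P1", "Software1"), ("P1", "SW 1")]
```

The notebook would then only call `read_mentions(...)` and plot or tabulate the result, so the parsing logic can be diffed and tested like any other Python code.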
What do we have?
The issue
We need some sort of dataset to count mentions according to #2.
What do we really need?
There are several ways this could look:
the information that Software1 and Software2 are both mentioned in this paper, even if Software1 was actually mentioned as "software one" or "SW 1", and perhaps the count of each mention per paper
How can we achieve this?
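One possible shape for this, as a rough sketch only: map each surface form to a canonical name through a hand-maintained alias table, then count per paper. The alias table, the function names, and the `(paper, mention)` input format are assumptions for illustration, not something agreed on in this issue.

```python
from collections import Counter, defaultdict

# Hypothetical alias table: normalised surface form -> canonical name.
ALIASES = {
    "software1": "Software1",
    "software one": "Software1",
    "sw 1": "Software1",
    "software2": "Software2",
}


def canonical_name(mention):
    """Return the canonical software name, or None if the mention is unknown."""
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(mention.lower().split())
    return ALIASES.get(key)


def count_mentions(rows):
    """Count canonical software mentions per paper.

    `rows` is an iterable of (paper, mention) pairs; mentions that are
    not in the alias table are skipped.
    """
    counts = defaultdict(Counter)
    for paper, mention in rows:
        name = canonical_name(mention)
        if name is not None:
            counts[paper][name] += 1
    return counts


rows = [
    ("paper-a", "Software1"),
    ("paper-a", "software one"),
    ("paper-a", "SW 1"),
    ("paper-a", "Software2"),
]
counts = count_mentions(rows)
# counts["paper-a"] holds 3 mentions of Software1 and 1 of Software2.
```

A lookup table like this only covers aliases someone has already listed; fuzzy matching or a curated registry would be a separate decision.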