softwaresaved / habeas-corpus

A corpus of research software used in COVID-19 research.
MIT License

Find out if the seed data packages are publicly available, and annotate them accordingly #5

Open sdruskat opened 3 years ago

sdruskat commented 3 years ago

What do we have?

A seed dataset of n software package mentions.

The issue

The mentions alone aren't useful for most of our research questions.

What do we really need?

How can we achieve this?

  1. Crowdsourcing! Each of us takes a list of mentions and tries to find the public repository on, e.g., GitHub, GitLab, Bitbucket, or elsewhere.
  2. We annotate the dataset with this information.
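
To make the annotation step concrete, here is a minimal sketch in Python of what recording one search result could look like. The seed dataset's actual format isn't described here, so the `Mention` record, its field names (`repository_url`, `platform`, `public_repo_found`), and the `annotate` helper are all hypothetical and only meant as a starting point for discussion.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping from host name to platform label; extend as needed.
KNOWN_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "bitbucket.org": "Bitbucket",
}


@dataclass
class Mention:
    """One software mention from the seed dataset (hypothetical schema)."""
    name: str                                  # software name as mentioned
    repository_url: Optional[str] = None       # filled in by the annotator
    platform: Optional[str] = None             # derived from the URL host
    public_repo_found: Optional[bool] = None   # None means "not checked yet"


def annotate(mention: Mention, repository_url: Optional[str]) -> Mention:
    """Record the result of a manual repository search for one mention.

    Pass None as repository_url if no public repository could be found.
    """
    if repository_url is None:
        mention.public_repo_found = False
        return mention

    # Normalise the host so "www.github.com" and "GitHub.com" both match.
    host = (urlparse(repository_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    mention.repository_url = repository_url
    mention.platform = KNOWN_HOSTS.get(host, "other")
    mention.public_repo_found = True
    return mention
```

Each of us could then call `annotate(Mention("some-package"), "https://github.com/example/some-package")` for a hit (the URL here is a placeholder), or `annotate(Mention("some-package"), None)` when the search comes up empty. Keeping "not checked yet" (`None`) distinct from "checked, nothing found" (`False`) should make it easier to see how much of the list is still open.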