softwaresaved / habeas-corpus

A corpus of research software used in COVID-19 research.
MIT License

Decide on how to count software mentions 🧮 #2

Closed sdruskat closed 3 years ago

sdruskat commented 3 years ago

What do we have?

If #1 is finished, we should have a dataset with unified names (identifiers) for a software package that may have been used across many different papers.

The issue

We don't know how to count mentions, i.e.

For example, if these are the software mentions for a single paper: ['Ribo', 'Ribo', 'Ribo', 'Seq'] (cell I28)

What do we really need?

How can we achieve this?

Per discussion poll in this comment.

olexandr-konovalov commented 3 years ago

I suggest Per work. That matches what happens when one counts the number of citations of a paper: even if citation [n] appears many times in the text, this counts as one citation of the Nth item in the list of references.
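To make the difference concrete, here is a minimal sketch that contrasts counting every mention with counting per work. It assumes the dataset can be read as a mapping from a paper identifier to the list of unified software names mentioned in that paper. The mapping, the paper IDs, the second paper's mentions, and the function names are all hypothetical and are not taken from the repository.

```python
from collections import Counter

# Hypothetical input: paper ID -> unified software names mentioned in that paper.
# "paper-A" reuses the example list from this issue; "paper-B" is made up.
mentions_by_paper = {
    "paper-A": ["Ribo", "Ribo", "Ribo", "Seq"],
    "paper-B": ["Ribo", "Seq", "Seq"],
}


def count_total_mentions(mentions_by_paper):
    """Count every single mention, including repeats within one paper."""
    counts = Counter()
    for mentions in mentions_by_paper.values():
        counts.update(mentions)
    return counts


def count_mentions_per_work(mentions_by_paper):
    """Count each software package at most once per paper."""
    counts = Counter()
    for mentions in mentions_by_paper.values():
        # set() collapses repeated mentions within the same paper.
        counts.update(set(mentions))
    return counts


total = count_total_mentions(mentions_by_paper)
per_work = count_mentions_per_work(mentions_by_paper)

print("total:", dict(total))
print("per work:", dict(per_work))
```

With this toy input, 'Ribo' has 4 total mentions but a per-work count of 2, because it appears in two papers. The per-work count is the one analogous to a citation count.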

sdruskat commented 3 years ago

Let's do a poll!

You can react by clicking the smiley button in the upper right-hand corner of this comment.

npch commented 3 years ago

The decision is to count mentions per work. This has been added in 1b809fcc592b82d8687e31d3aa61927fad50320d