Decide on how to count software mentions 🧮

sdruskat commented 3 years ago

What do we have?

If #1 is finished, we should have a dataset with unified names (identifiers) for a software package that may have been used across many different papers.

The issue

We don't know how to count mentions, i.e.

- do we count software usage per work (software is counted once per paper), or
- do we count software usage per mention (software is counted however many times it is mentioned per paper).

For example, if these are the software mentions for a single paper: ['Ribo', 'Ribo', 'Ribo', 'Seq'] (cell I28)

- is Ribo counted once? n = 1
- is Ribo counted thrice? n = 3
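
To make the two options concrete, here is a minimal sketch in Python that uses the example list above. The variable names (`mentions`, `per_mention`, `per_work`) are illustrative assumptions and not taken from the project's code.

```python
from collections import Counter

# Software mentions extracted from a single paper (the example above).
mentions = ["Ribo", "Ribo", "Ribo", "Seq"]

# Option 2: count every single mention.
per_mention = Counter(mentions)

# Option 1: count each software at most once per paper by
# de-duplicating the mentions before counting.
per_work = Counter(set(mentions))

print(per_mention["Ribo"])  # 3
print(per_work["Ribo"])     # 1
```

The only difference between the two is whether the mentions are de-duplicated per paper before they are counted.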

What do we really need?

[ ] A decision on how to count.

How can we achieve this?

Per ~~discussion~~ poll in this comment.

olexandr-konovalov commented 3 years ago

I suggest per work. That matches how citations of a paper are counted: even if citation [n] appears many times in the text, it still counts as one citation of the nth item in the list of references.

sdruskat commented 3 years ago

Let's do a poll!

If you think that we should count mentions per work, react to this comment with :+1:. (R, R, R = 1)
If you think we should count every single mention, react to this comment with :-1:. (R, R, R = 3)

You can react by clicking the smiley button in the upper right-hand corner of this comment.

npch commented 3 years ago

The decision is mentions per work - this has been added in 1b809fcc592b82d8687e31d3aa61927fad50320d
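
For illustration only, counting per work across several papers could look like the sketch below. This is not the code from the commit above; the function name `count_per_work`, the paper keys, and the second paper's mentions are hypothetical.

```python
from collections import Counter

def count_per_work(papers):
    """Count, for each software, the number of papers that mention it.

    `papers` maps a paper identifier to its list of software mentions
    (a hypothetical structure, assumed for this sketch).
    """
    counts = Counter()
    for mentions in papers.values():
        # set() collapses repeated mentions within one paper,
        # so each paper contributes at most 1 per software.
        counts.update(set(mentions))
    return counts

# Hypothetical input: the first list is the example from this issue,
# the second is made up for illustration.
papers = {
    "paper_a": ["Ribo", "Ribo", "Ribo", "Seq"],
    "paper_b": ["Ribo", "R", "R"],
}
counts = count_per_work(papers)
print(counts["Ribo"])  # 2
```

Here `Ribo` is counted twice because it appears in two papers, regardless of how often each paper mentions it.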

softwaresaved / habeas-corpus