ourresearch / depsy

Track the impact of research software.
http://depsy.org
MIT License

Question: how to improve guessing (research software or not)? #104

Closed pawamoy closed 5 years ago

pawamoy commented 7 years ago

Here are some simple things I thought of that might help a package get guessed as research software. Please tell us whether they actually affect the guess (and add any others you know or have thought of :smile:):

Note: I couldn't find any docs on how Depsy is guessing that, but maybe I missed something?

andim commented 7 years ago

To give a concrete example, Depsy does not classify Noisyopt (https://github.com/andim/noisyopt) as research software: http://depsy.org/package/python/noisyopt

And this despite the Intended Audience :: Science/Research tag on PyPI and a clearly scientific aim.

If anyone has an idea of what is going on, let me know!

pawamoy commented 7 years ago

Alright, I'm taking a look at the source code.

The Intended Audience :: Science/Research classifier should work. If it doesn't (as we see with your Noisyopt package), I suspect this line

if package_obj.intended_audience == "Science/Research":

evaluates to False, either because of a trailing newline (maybe) or because the intended_audience property returns None:

try:
    pypi_classifiers = self.api_raw["info"]["classifiers"]
except KeyError:
    return None
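To illustrate both suspected failure modes, here is a minimal sketch. It is not Depsy's actual code: `intended_audiences` and `is_research_audience` are hypothetical helpers, and the `api_raw` dicts are made-up stand-ins for a PyPI JSON response. The idea is to strip whitespace before comparing and to treat missing classifiers as "no audience" instead of returning None.

```python
def intended_audiences(api_raw):
    """Return the set of 'Intended Audience' values found in the classifiers.

    Hypothetical helper: a missing "info" or "classifiers" key gives an
    empty set instead of None.
    """
    classifiers = api_raw.get("info", {}).get("classifiers") or []
    audiences = set()
    for classifier in classifiers:
        # "Intended Audience :: Science/Research\n" -> ["Intended Audience", "Science/Research"]
        parts = [part.strip() for part in classifier.split("::")]
        if len(parts) >= 2 and parts[0] == "Intended Audience":
            audiences.add(parts[1])
    return audiences


def is_research_audience(api_raw):
    """True if any classifier declares the Science/Research audience."""
    return "Science/Research" in intended_audiences(api_raw)


# Made-up inputs covering the cases discussed above.
clean = {"info": {"classifiers": ["Intended Audience :: Science/Research"]}}
trailing_newline = {"info": {"classifiers": ["Intended Audience :: Science/Research\n"]}}
missing = {"info": {}}
```

With this sketch, a trailing newline no longer breaks the comparison, and a package that lists several audiences is still matched, because the check is a set membership test and not an equality test on a single string.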

I also see in the code that a list of 'sciency' keywords is used to tag a package as academic, see the list here: https://github.com/Impactstory/depsy/blob/master/models/academic.py#L29.

However, these keywords are tested only against the first 100 characters of the package description, because the summary is truncated:

self.summary = truncate(self.api_raw["info"]["description"])

They are also tested against the package's name and against its tags, which come from the keywords and classifiers given in the package's setup information.
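Here is a rough sketch of why the truncation matters. Everything in it is an assumption for illustration: `SCIENCY_KEYWORDS` is a made-up two-item list (the real one is in academic.py), `truncate` is a stand-in that simply keeps the first 100 characters, and `matches_keywords` is a hypothetical helper, not Depsy's function.

```python
# Made-up keyword list for illustration; the real list lives in models/academic.py.
SCIENCY_KEYWORDS = ["optimization", "bioinformatics"]


def truncate(text, length=100):
    """Hypothetical stand-in for Depsy's truncate: keep the first `length` characters."""
    return text[:length]


def matches_keywords(name, summary, tags, keywords=SCIENCY_KEYWORDS):
    """True if any keyword appears in the name, the summary, or the tags."""
    haystack = " ".join([name, summary] + list(tags)).lower()
    return any(keyword in haystack for keyword in keywords)


# A description whose only 'sciency' word appears after character 100.
description = "A Python package. " * 6 + "Robust optimization of noisy functions."

found_in_full_text = matches_keywords("noisyopt", description, [])
found_in_truncated = matches_keywords("noisyopt", truncate(description), [])
found_via_tags = matches_keywords("noisyopt", truncate(description), ["optimization"])
```

In this sketch the keyword is found in the full description and through the tags, but not in the truncated summary. If Depsy behaves similarly, putting the scientific terms at the very start of the description, or in the setup keywords, should help.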

So here is the updated list of criteria:

The unchecked criteria could serve as a TODO-list :smile:

pawamoy commented 5 years ago

Closing because the project is no longer maintained.