openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

[8] Automated Tagging and Validation of tags for all packages in NPM ecosystem #1268

Closed rootAvish closed 6 years ago

rootAvish commented 7 years ago

Epic: #1248

User Story

The PGM will be able to get automated tags for NPM packages so that it can be trained to show alternate and companion recommendations for the NPM ecosystem.

Description

To show recommendations for the NPM ecosystem, we need to train our current recommendation engine (the PGM) using a package tag map and a set of reference manifest files (reference package.json files) for that ecosystem. To create the package tag map, we need tags for the packages of the ecosystem on which we are training the PGM.
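As an illustration only, here is a minimal sketch of how a package tag map and a reference package.json could fit together. The map's shape (package name to list of tags), the helper names, the sample tags, and the choice to read both `dependencies` and `devDependencies` are all assumptions made for the example, not the format the PGM actually consumes.

```python
import json
from collections import Counter


def load_manifest_dependencies(manifest_text):
    """Return the sorted dependency names declared in a package.json string."""
    manifest = json.loads(manifest_text)
    names = set(manifest.get("dependencies", {}))
    # Assumption: devDependencies are also of interest for training.
    names.update(manifest.get("devDependencies", {}))
    return sorted(names)


def tags_for_manifest(manifest_text, package_tag_map):
    """Count how often each tag occurs across a manifest's dependencies."""
    counts = Counter()
    for name in load_manifest_dependencies(manifest_text):
        # Packages missing from the map contribute no tags.
        counts.update(package_tag_map.get(name, []))
    return counts


# Hypothetical package tag map: package name -> list of tags.
package_tag_map = {
    "express": ["web-framework", "http"],
    "request": ["http", "client"],
    "mocha": ["testing"],
}

# Hypothetical reference manifest.
manifest_text = json.dumps({
    "name": "example-app",
    "dependencies": {"express": "^4.16.0", "request": "^2.83.0"},
    "devDependencies": {"mocha": "^4.0.0"},
})

manifest_tags = tags_for_manifest(manifest_text, package_tag_map)
```

A package with no entry in the map simply adds nothing to the counts, which is one reason tag coverage across the whole ecosystem matters before training.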

This task continues the work done in the previous sprint on tagging NPM packages automatically in order to generate the package tag map. We will containerize the tagging script and push it to prod to collect tags for all the NPM packages for which we had previously collected the GitHub readme/project description page.

We will continue this process iteratively, evaluating the accuracy of the tags and re-running the tagger with improvements until we attain optimal accuracy.
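The issue does not say how tag accuracy is measured. One possible approach, sketched here under that caveat, is to compare the automated tags against a small hand-labelled sample and compute micro-averaged precision and recall. The function name and the sample data are hypothetical.

```python
def tag_precision_recall(predicted, reference):
    """Micro-averaged precision and recall of predicted tags against reference tags.

    Both arguments map a package name to a list of tags. Only packages present
    in the reference sample are scored.
    """
    true_pos = pred_total = ref_total = 0
    for package, ref_tags in reference.items():
        pred = set(predicted.get(package, []))
        ref = set(ref_tags)
        true_pos += len(pred & ref)
        pred_total += len(pred)
        ref_total += len(ref)
    precision = true_pos / pred_total if pred_total else 0.0
    recall = true_pos / ref_total if ref_total else 0.0
    return precision, recall


# Hypothetical hand-labelled sample and tagger output.
reference = {"express": ["web-framework", "http"], "mocha": ["testing"]}
predicted = {"express": ["http", "server"], "mocha": ["testing"]}

precision, recall = tag_precision_recall(predicted, reference)
```

Tracking both numbers from run to run would show whether a tagger change adds correct tags (recall) or merely adds more tags (precision drops).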

Acceptance Criteria

Task list

rootAvish commented 6 years ago

https://docs.google.com/a/redhat.com/spreadsheets/d/1bGq1z4NpzGdPcDFl0MZ-i3qNxECej-aWe2ZtLYFwK8k/edit?usp=sharing <-- numbers from first tagging run

rootAvish commented 6 years ago

Future scope: Add parallelization logic to the tagger and use it for subsequent tagging runs.
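A rough sketch of what such parallelization could look like, assuming the tagger can process each package's readme independently. `tag_readme` below is a placeholder keyword matcher, not the real tagging script, and all names and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def tag_readme(readme_text, vocabulary):
    """Placeholder tagger: return the vocabulary words that appear in the readme."""
    words = set(readme_text.lower().split())
    return sorted(words & vocabulary)


def tag_packages_parallel(readmes, vocabulary, max_workers=4):
    """Tag every package's readme concurrently and return name -> tags."""
    names = list(readmes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `names`.
        results = pool.map(lambda name: tag_readme(readmes[name], vocabulary), names)
        return dict(zip(names, results))


# Hypothetical readme snippets and tag vocabulary.
readmes = {
    "express": "Fast minimalist web framework for node",
    "mocha": "Simple flexible testing framework",
}
vocabulary = {"web", "testing", "framework"}

tags = tag_packages_parallel(readmes, vocabulary)
```

Threads suit a tagger that spends most of its time on I/O such as fetching readmes. If the tagging itself is CPU-bound, `concurrent.futures.ProcessPoolExecutor` offers the same interface with separate processes, although the worker would then need to be a picklable top-level function rather than a lambda.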