Closed rootAvish closed 6 years ago
https://docs.google.com/a/redhat.com/spreadsheets/d/1bGq1z4NpzGdPcDFl0MZ-i3qNxECej-aWe2ZtLYFwK8k/edit?usp=sharing <-- numbers from first tagging run
Future scope: Add parallelization logic and perform subsequent runs of the tagger.
Epic: #1248
User Story
The PGM will be able to get automated tags for NPM packages so that it can be trained to show alternate and companion recommendations for the NPM ecosystem.
Description
To show recommendations for the NPM ecosystem, we need to train our current recommendation engine (the PGM) using a package tag map and a set of reference manifest files (reference package.json) of that ecosystem. To create the package tag map, we need tags for the packages of the ecosystem for which we are training the PGM. This task involves furthering the work done in the previous sprint to tag NPM packages automatically in order to generate the package tag map. We will containerize the tagging script and push it to prod to collect tags for all the NPM packages for which we had previously collected the GitHub readme/project description page.
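As a rough illustration of what the package tag map could look like, here is a minimal Python sketch. The function name build_package_tag_map, the input shape (package name mapped to a list of raw tags), and the normalization rules are assumptions made for this example; the issue does not specify the tagger's real output format.

```python
import json


def build_package_tag_map(raw_tags):
    """Normalize raw tagger output into a package -> sorted unique tags map.

    `raw_tags` is a hypothetical dict of package name -> list of tag strings.
    Packages that end up with no tags are dropped from the map.
    """
    tag_map = {}
    for package, tags in raw_tags.items():
        # Lowercase and strip each tag, then deduplicate via a set.
        cleaned = {tag.strip().lower() for tag in tags if tag.strip()}
        if cleaned:
            tag_map[package] = sorted(cleaned)
    return tag_map


if __name__ == "__main__":
    # Illustrative input only; these are not real tagger results.
    raw = {"express": ["Web", "server ", "web"], "left-pad": []}
    print(json.dumps(build_package_tag_map(raw), indent=2))
```

Sorting and deduplicating the tags keeps the serialized map stable between tagger runs, which should make it easier to compare the output of successive runs.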
We will continue this process iteratively by evaluating the accuracy of the tags and re-running the tagger with improvements till we attain optimal accuracy.
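The issue does not say how tag accuracy is measured. One possible approach, sketched below purely as an assumption, is to compare the tagger's output against a small hand-labelled reference set using the mean Jaccard overlap per package. The function name mean_tag_overlap and both input dicts are hypothetical.

```python
def mean_tag_overlap(predicted, reference):
    """Return the mean Jaccard overlap between predicted and reference tags.

    Both arguments are hypothetical dicts of package name -> list of tags.
    Only packages present in `reference` are scored; a package missing from
    `predicted` scores 0.0.
    """
    if not reference:
        return 0.0
    total = 0.0
    for package, ref_tags in reference.items():
        ref = set(ref_tags)
        pred = set(predicted.get(package, []))
        union = ref | pred
        # Jaccard index: size of intersection over size of union.
        total += len(ref & pred) / len(union) if union else 0.0
    return total / len(reference)


if __name__ == "__main__":
    # Illustrative data only; these are not results from the tagging run.
    predicted = {"a": ["x", "y"], "b": ["z"]}
    reference = {"a": ["x"], "b": ["z"]}
    print(mean_tag_overlap(predicted, reference))
```

Tracking a single score like this across runs would give each iteration of the tagger a comparable number, though the appropriate metric and threshold are for the team to decide.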
Acceptance Criteria
json in the corresponding S3 bucket
Task list