openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

Periodic tags/keywords gathering #724

Closed (fridex closed this 6 years ago)

fridex commented 7 years ago

Description

In order to create a fully automatic tagging pipeline, we would like to automatically collect tags for various ecosystems. Some of these collectors are already implemented in tagger, and others are planned (e.g. https://github.com/openshiftio/openshift.io/issues/712, https://github.com/openshiftio/openshift.io/issues/710).

These collectors should run periodically and automatically gather tags from external sources. A good starting point would probably be to create tag-gathering tasks that use the tagger library API to call the tag/keyword collectors. These tasks could be grouped into a Selinon flow that the jobs service runs periodically, based on a YAML configuration.
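The task/flow split described above can be sketched in plain Python. This is not the Selinon API or the tagger API: `GatherTagsTask`, `GatheringFlow`, `GatherResult` and the dummy collectors are hypothetical stand-ins, and the periodic trigger (the jobs service reading a YAML configuration) is left out. It only shows the intended shape: each task wraps one collector for one ecosystem, and the flow merges their output per ecosystem for the later aggregation step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical collector signature: a zero-argument callable returning the raw
# keywords gathered from one external source. Real tagger collectors may differ.
Collector = Callable[[], List[str]]


@dataclass
class GatherResult:
    ecosystem: str
    collector_name: str
    keywords: List[str]


@dataclass
class GatherTagsTask:
    """One gathering task per (ecosystem, collector) pair, like one node in a flow."""
    ecosystem: str
    collector_name: str
    collector: Collector

    def run(self) -> GatherResult:
        # Call the wrapped collector and tag its output with ecosystem/collector.
        return GatherResult(self.ecosystem, self.collector_name, list(self.collector()))


@dataclass
class GatheringFlow:
    """Groups gathering tasks; an external scheduler would call run() periodically."""
    tasks: List[GatherTagsTask] = field(default_factory=list)

    def run(self) -> Dict[str, List[str]]:
        # Merge keywords per ecosystem so that later aggregation tasks can consume them.
        collected: Dict[str, List[str]] = {}
        for task in self.tasks:
            result = task.run()
            collected.setdefault(result.ecosystem, []).extend(result.keywords)
        return collected


# Usage with dummy collectors standing in for real ones.
flow = GatheringFlow(tasks=[
    GatherTagsTask("pypi", "dummy_pypi", lambda: ["web", "http"]),
    GatherTagsTask("npm", "dummy_npm", lambda: ["cli"]),
    GatherTagsTask("pypi", "dummy_pypi_extra", lambda: ["flask"]),
])
print(flow.run())  # {'pypi': ['web', 'http', 'flask'], 'npm': ['cli']}
```

In the proposed setup, each `GatherTagsTask` would presumably map to one Selinon task and `GatheringFlow` to one Selinon flow definition, with scheduling handled outside the code.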

Topics should be gathered and stored either in the GitHub tags repository or on S3 so that they can be fed to the PGM model automatically. Topics are currently stored on GitHub, which is probably not well suited for the tagging pipeline; we should discuss moving them to S3 (implementing an appropriate topics S3 adapter in analytics core).
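Whichever backend is chosen, keeping the PGM-feeding side behind a small storage interface means a move from GitHub to S3 only requires a new adapter. A minimal sketch, assuming one JSON document per ecosystem; `TopicsStore`, `InMemoryS3LikeStore` and the `<ecosystem>/topics.json` key layout are hypothetical and do not reflect the existing analytics core adapters. An in-memory dict stands in for an S3 bucket so the example runs without credentials.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List


class TopicsStore(ABC):
    """Hypothetical storage interface for gathered topics, one document per ecosystem."""

    @abstractmethod
    def store(self, ecosystem: str, topics: List[str]) -> None: ...

    @abstractmethod
    def retrieve(self, ecosystem: str) -> List[str]: ...


class InMemoryS3LikeStore(TopicsStore):
    """Stand-in for an S3 adapter: objects are JSON blobs addressed by key.

    A real adapter would write the same bytes to a bucket instead of a dict.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    @staticmethod
    def _key(ecosystem: str) -> str:
        # Hypothetical key layout: one object per ecosystem.
        return f"{ecosystem}/topics.json"

    def store(self, ecosystem: str, topics: List[str]) -> None:
        # Deduplicate and sort so stored documents are stable between runs.
        payload = json.dumps(sorted(set(topics))).encode("utf-8")
        self._objects[self._key(ecosystem)] = payload

    def retrieve(self, ecosystem: str) -> List[str]:
        return json.loads(self._objects[self._key(ecosystem)].decode("utf-8"))


store = InMemoryS3LikeStore()
store.store("pypi", ["web", "http", "web"])
print(store.retrieve("pypi"))  # ['http', 'web']
```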

One of the final tasks in the topics-gathering job should aggregate topics to compute synonyms and drop less relevant keywords. This should be done via a tagger library API call, and its parameters should be transparently configurable at the source-code level, since we expect to tweak them based on overall tagging results. Topic aggregation should be done per ecosystem (ecosystem-specific tags), with each ecosystem-specific aggregation running in a separate task.
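A minimal sketch of a per-ecosystem aggregation step with source-level tunables. It does not reproduce the tagger library's synonym computation: as a stand-in, keywords are grouped by a naive normalization (case and `-`, `_`, space separators ignored) and groups seen fewer than `min_occurrences` times are dropped. `AggregationConfig`, `normalize` and `aggregate_ecosystem` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class AggregationConfig:
    """Hypothetical tunables, kept in source so they can be adjusted after evaluation."""
    min_occurrences: int = 2  # drop keyword groups seen fewer times than this


def normalize(keyword: str) -> str:
    # Naive synonym detection: ignore case and '-', '_', ' ' separators,
    # so "Machine-Learning" and "machine_learning" collapse to one form.
    lowered = keyword.strip().lower()
    for sep in ("-", "_", " "):
        lowered = lowered.replace(sep, "")
    return lowered


def aggregate_ecosystem(keywords: List[str], config: AggregationConfig) -> Dict[str, List[str]]:
    """Aggregate raw keywords for ONE ecosystem (intended to run as its own task).

    Returns a mapping from the normalized form to its observed variants,
    keeping only groups that occur at least config.min_occurrences times.
    """
    counts: Counter = Counter()
    variants: Dict[str, Set[str]] = {}
    for kw in keywords:
        key = normalize(kw)
        if not key:
            continue  # skip empty or separator-only keywords
        counts[key] += 1
        variants.setdefault(key, set()).add(kw.strip())
    return {
        key: sorted(variants[key])
        for key, count in counts.items()
        if count >= config.min_occurrences
    }


gathered = {
    "pypi": ["Machine-Learning", "machine_learning", "web", "flask", "Flask"],
    "npm": ["cli", "CLI", "tool"],
}
config = AggregationConfig(min_occurrences=2)
# One call per ecosystem; in the pipeline each would be a separate task.
aggregated = {eco: aggregate_ecosystem(kws, config) for eco, kws in gathered.items()}
print(aggregated["npm"])  # {'cli': ['CLI', 'cli']}
```

Keeping the thresholds in a frozen dataclass makes every tweak visible in code review, and calling `aggregate_ecosystem` once per ecosystem keeps the aggregations independent, so each maps naturally onto its own task.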

Acceptance criteria

miteshvp commented 6 years ago

Not relevant. Closing.