openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

Splitting/Refactoring of stack-analysis repository #1652

Open pkajaba opened 6 years ago

pkajaba commented 6 years ago

This task was created after an agreement on the Mattermost channel. @rootAvish, @sara-02 and I agreed that there is way too much functionality inside the https://github.com/fabric8-analytics/fabric8-analytics-stack-analysis repository.

My suggestion would be to 1) outline what components are inside, 2) consider whether those components are still relevant for us, 3) rearrange relevant components and 4) refactor these components.

I will be thankful for any input, especially from @rootAvish and @sara-02, since they are the specialists for this repository right now.

rootAvish commented 6 years ago

As far as the relevancy of the components goes, there is very little dead code.

pkajaba commented 6 years ago

Sounds really good, but:

> Alternate recommendation - Are being moved outside for reasons other than refactoring

What are those reasons?

> companion recommendation - needs to stay, driven by the same component as outliers

Can't we extract it into a library, so that outliers and companion can just reuse it?

I have one question: what actually is Kronos? Is it alternate recommendation + outlier recommendation + companion recommendation together? Some diagrams of how the analytics pipeline currently runs would be great.

Btw, I have another proposal. We could unify the naming, because at this point the folder structure is analytics_platform/kronos, and inside this path there are apollo, gnosis, pgm and softnet, plus a src folder.

I am sure that you can map these codenames to real components, but we should write better documentation or just rename them. I prefer creating documentation, since those names are cool :-).

rootAvish commented 6 years ago

> what are those reasons?

Alternate recommendations don't actually use the PGM structure; they are based on a Jaccard distance metric computed over the package tags. However, right now we can't increase the tag count for the similarity metric because the PGM cannot accommodate that many tags on a single package. By moving it outside the PGM, we'll draw from the unfiltered package tag map (containing more than four tags per package), enabling better calculation of the similarity score and, in turn, of alternates.
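To make the tag-based metric concrete, here is a minimal sketch of Jaccard similarity over package tag sets, and of ranking alternates with it. The function names, the `tag_map` layout and the example tags are assumptions made for illustration only; they are not taken from the stack-analysis code.

```python
def jaccard_similarity(tags_a, tags_b):
    """Size of the tag intersection divided by the size of the tag union."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0  # convention chosen here for two untagged packages
    return len(a & b) / len(a | b)


def jaccard_distance(tags_a, tags_b):
    """Distance form of the same metric: 0 means identical tag sets."""
    return 1.0 - jaccard_similarity(tags_a, tags_b)


def rank_alternates(package, tag_map, top_n=3):
    """Rank every other package by tag similarity to `package`.

    `tag_map` is a hypothetical dict of package name -> iterable of tags,
    standing in for the unfiltered package tag map mentioned above.
    """
    scores = [
        (other, jaccard_similarity(tag_map[package], tags))
        for other, tags in tag_map.items()
        if other != package
    ]
    # Highest similarity first; ties broken by name for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]


# Hypothetical tag data, for illustration only.
example_tag_map = {
    "flask": {"web", "http", "wsgi", "micro", "routing"},
    "bottle": {"web", "http", "wsgi", "micro"},
    "numpy": {"math", "array"},
}
alternates = rank_alternates("flask", example_tag_map)
```

With only a handful of tags per package, the intersection and union are both tiny, so the score can take only a few distinct values. Drawing from a larger, unfiltered tag set gives the metric more resolution.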

> What actually Kronos is? Alternate recommendation + Outlier recommendation + companion recommendation together? Some diagrams how analytics pipeline is currently running would be great.

Yes.

gnosis - This is the generation of the reference architecture.

softnet - The packages are added as leaf nodes to the reference architecture graph. It also contains something called a similarity dict, used to drive outliers.

pgm - The actual pomegranate model. It is trained using what we generated in gnosis and softnet.
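As a rough picture of the order in which these three pieces feed each other, the data flow can be sketched as below. Everything in it is a hypothetical stand-in: the `ReferenceGraph` class, the function names and the "topic" nodes are invented for illustration and do not mirror the actual gnosis, softnet or pgm code, and no pomegranate call is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ReferenceGraph:
    """Hypothetical parent -> children adjacency for a reference architecture."""
    children: Dict[str, Set[str]] = field(default_factory=dict)

    def add_leaf(self, parent: str, package: str) -> None:
        self.children.setdefault(parent, set()).add(package)
        self.children.setdefault(package, set())

    def leaves(self) -> Set[str]:
        # Nodes without children; a topic with no packages would also appear.
        return {node for node, kids in self.children.items() if not kids}


def build_reference_architecture(topics: List[str]) -> ReferenceGraph:
    """Stand-in for the gnosis step: produce the upper part of the graph."""
    return ReferenceGraph({topic: set() for topic in topics})


def attach_packages(graph: ReferenceGraph,
                    package_topics: Dict[str, List[str]]) -> ReferenceGraph:
    """Stand-in for the softnet step: hang packages on the graph as leaves."""
    for package, topics in package_topics.items():
        for topic in topics:
            graph.add_leaf(topic, package)
    return graph


def training_edges(graph: ReferenceGraph) -> List[Tuple[str, str]]:
    """Stand-in for what the pgm step would consume: the graph's edge list."""
    return sorted(
        (parent, child)
        for parent, kids in graph.children.items()
        for child in kids
    )


# Hypothetical input, for illustration only.
graph = attach_packages(
    build_reference_architecture(["web"]),
    {"flask": ["web"]},
)
edges = training_edges(graph)
```

The point of the sketch is only the ordering: the graph is built first, packages are attached second, and the model is trained on the combined result last.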

rootAvish commented 6 years ago

There's some documentation that I wrote as part of the knowledge transfer sessions here: https://docs.google.com/document/d/1f6dgwvf44kTTbZ1ascvbcxZoTvC4wVq7K5e6PkeHWJo/edit

EDIT: Actually, the diagram in it is not all that accurate. I need to fix it.

pkajaba commented 6 years ago

@rootAvish Great resource! Thank you. However, let's migrate the information from the document into the repository, so that everything is in one place.