[5] Identify bottlenecks in the component analysis

tisnik commented 7 years ago

Description

Update the current logging/tooling so that we can identify bottlenecks in the component analysis, both for known components and for unknown (non-existing) components.

Acceptance criteria

[ ] perform an API call for component analysis for a known component and ecosystem, optionally with a debug/verbose parameter
[ ] durations for (all) processes and steps can be retrieved via Kibana or via the stack report itself
[ ] ditto for an unknown component
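
One possible way to collect per-step durations is sketched below. It is only an illustration: `timed_step`, `analyse_component`, the step names and the `_durations` key are hypothetical and not taken from the existing code base, and the `time.sleep` calls merely stand in for the real graph query and flow scheduling. Each step emits one JSON log line (which a log store such as Kibana could index), and the same numbers are returned with the response.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("component_analysis.timing")


@contextmanager
def timed_step(step, durations, **context):
    """Measure one step and record its duration in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        durations[step] = elapsed
        # One JSON document per step is easy to filter in a log store.
        record = {"step": step, "duration_s": round(elapsed, 6)}
        record.update(context)
        logger.info(json.dumps(record))


def analyse_component(ecosystem, name, version):
    """Hypothetical request handler; the sleeps stand in for real work."""
    durations = {}
    ctx = {"ecosystem": ecosystem, "component": name, "version": version}
    with timed_step("graph_query", durations, **ctx):
        time.sleep(0.01)   # placeholder for the graph query
    with timed_step("schedule_flow", durations, **ctx):
        time.sleep(0.005)  # placeholder for scheduling a flow
    # Durations are returned only when a debug/verbose mode is wanted.
    return {"result": None, "_durations": durations}
```

Because the durations are recorded in a `finally` clause, a step that raises is still measured, which matters for the unknown-component path.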

Related issues

Task for the epic #1144

msrb commented 7 years ago

I am already looking into this.

msrb commented 6 years ago

My findings and observations:

component-analysis level

requests are mostly dependent on graph response times and our ability to schedule flows quickly
graph is doing mostly fine, with only occasional response times > 1 second (proper load test is yet to be performed though)
- see Kibana logs for more details: http://red.ht/2yDqlxX
we run the "bookkeeping" flow for each successful graph query; this is questionable
we automatically schedule analysis of all unknown components; again, this is questionable

There was one issue where we were initializing Selinon and Celery over and over again, in basically every user request. This could sometimes take multiple seconds and therefore prolong the response time very significantly. I partially fixed the issue, but we still need to initialize Celery in each request; this needs further investigation.
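
The general idea of initializing once per process instead of once per request could look roughly like the sketch below. It deliberately does not call any real Selinon or Celery API: `_expensive_init`, `get_dispatcher`, `handle_request` and `INIT_STATS` are hypothetical names, and the placeholder initializer only counts how often it runs.

```python
import threading

_lock = threading.Lock()
_dispatcher = None
INIT_STATS = {"calls": 0}  # only here to make the behaviour observable


def _expensive_init():
    """Placeholder for the costly Selinon/Celery setup (hypothetical)."""
    INIT_STATS["calls"] += 1
    return {"ready": True}


def get_dispatcher():
    """Return the shared dispatcher, creating it on first use only."""
    global _dispatcher
    if _dispatcher is None:
        with _lock:
            # Re-check under the lock so concurrent requests do not
            # both run the expensive initialization.
            if _dispatcher is None:
                _dispatcher = _expensive_init()
    return _dispatcher


def handle_request(payload):
    """Hypothetical request handler that reuses the shared dispatcher."""
    dispatcher = get_dispatcher()
    return {"scheduled": dispatcher["ready"], "payload": payload}
```

With this shape only the first request in a process pays the setup cost. Whether the real Celery setup can safely be shared this way is exactly the part that still needs investigation.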

We are scheduling "bookkeeping" flow for each successful request. It runs asynchronously and it doesn't block the request. But if the only purpose of this flow is to store the bookkeeping data in RDS, then doing this directly in the context of the request could save us some precious worker resources that can be used for stack analyses instead. Talking to RDS directly could also be less time-consuming than scheduling flows.
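
A minimal sketch of writing the bookkeeping record directly in the request, and measuring what that write costs, is shown below. It makes several assumptions: an in-memory `sqlite3` database stands in for RDS, and the `component_requests` table, its columns and the function names are hypothetical.

```python
import sqlite3
import time


def init_store(conn):
    """Create the hypothetical bookkeeping table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS component_requests ("
        "ecosystem TEXT, name TEXT, version TEXT, requested_at REAL)"
    )


def record_request(conn, ecosystem, name, version):
    """Store one bookkeeping row and return how long the write took."""
    start = time.perf_counter()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO component_requests VALUES (?, ?, ?, ?)",
            (ecosystem, name, version, time.time()),
        )
    return time.perf_counter() - start


conn = sqlite3.connect(":memory:")  # stand-in for the RDS connection
init_store(conn)
write_time = record_request(conn, "npm", "serve-static", "1.7.1")
```

Returning the write duration makes it possible to compare the direct write with the time spent scheduling a flow, using real measurements against RDS rather than this local stand-in.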

TL;DR:

mostly OK, but proper load test needs to be performed
we need to rethink bookkeeping flow and how we handle unknown components
- if we decide to schedule flows in requests, we need to make sure we can do that quickly (without initializing Celery all the time)

miteshvp commented 6 years ago

We are scheduling "bookkeeping" flow for each successful request.

@msrb - +1. During our very initial discussion with Slavek, it was more leaning towards separation of server and worker, but I guess we should re-think that now, since stack-analyses gets the data directly from RDS, and I see no reason why it can't directly persist the same in RDS as book-keeping. cc @samuzzal-choudhury

samuzzal-choudhury commented 6 years ago

@msrb @miteshvp a database operation will add to the total response time. That was the reason for doing the bookkeeping asynchronously.
