Server revamp - Githubissues

li-wu commented 5 years ago

I saw a lots of coding format issue (spacing, unused imports, line length etc.) on PyCharm. Before we have lint in CI job, I think we should probably pay attention to it in each PR.

li-wu commented 5 years ago

We should also change related docs to reflect the changes: http://127.0.0.1:4000/SETUP.html http://splunk.github.io/eventgen/ARCHITECTURE.html

jmeixensperger commented 5 years ago

I saw a lots of coding format issue (spacing, unused imports, line length etc.) on PyCharm. Before we have lint in CI job, I think we should probably pay attention to it in each PR.

Not sure I agree here. Doing manual linting is very error-prone and it will slow down our development process.

li-wu commented 5 years ago

I saw a lots of coding format issue (spacing, unused imports, line length etc.) on PyCharm. Before we have lint in CI job, I think we should probably pay attention to it in each PR.

Not sure I agree here. Doing manual linting is very error-prone and it will slow down our development process.

we do have make format command to do this before committing. You do not need to to it manually.

ryanfaircloth commented 5 years ago

There is something wrong with CI, while checks report passed on the PR 22 failures occurred if you inspect the job on circle

ryanfaircloth commented 5 years ago

It looks like the docker image is redis based, normal practice with docker is "single thing" why are we not using redis in its own container and building eventgen as a separate container

li-wu commented 5 years ago

I push one commit to fix the failed test cases.

li-wu commented 5 years ago

@arctan5x , either commit be3ca13 or 804bd6f introduced the three test cases failed: https://circleci.com/gh/splunk/eventgen/693

li-wu commented 5 years ago

There is something wrong with CI, while checks report passed on the PR 22 failures occurred if you inspect the job on circle

CircleCI works like if you have a dedicated test step which returns none-zero code it will report FAILED status. Since the whole test cases are running in a container and the test flow always return zero status even it has failed test cases. We may need to find a better to deal with this.

arctan5x commented 5 years ago

At this point, all tests are fixed except for 2 failures which seem like a real bug, so I will investigate it tomorrow.

YifengMao commented 5 years ago

tested the latest code, the process# issue is fixed. but the performance(20 prcoesses) on es bundle is likely 1/3 compares to 6.3.4. So the performance issue is not related to number of processes.

That is to say, before the process# fix, the performance (80 processes in this code base) is 70% of 20 processes in 6.3.4 on the same hardware, if we reduce the process# to 20, the performance is event worse.

the performance issue is not related to the server change, because it can be reproduced in CLI mode. and I saw plenty of logs from httpevent outputter: 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'}

it might be the root cause of why we have such critical throughput loss

arctan5x commented 5 years ago

@YifengMao I think it is a good point. Let's check the http outputter.

li-wu commented 5 years ago

@arctan5x you mentioned in last sync up meeting that "Missing the logic where Eventgen Controller needs to know dropped servers", does it covered now?

arctan5x commented 5 years ago

@li-wu We don't have it still, but that feature only affects /index endpoint which is very low impact and no functional effect. So going to release this then add it as a backlog.

splunk / eventgen

Server revamp #266