Closed arctan5x closed 5 years ago
We should also change related docs to reflect the changes: http://127.0.0.1:4000/SETUP.html http://splunk.github.io/eventgen/ARCHITECTURE.html
I saw a lots of coding format issue (spacing, unused imports, line length etc.) on
PyCharm
. Before we havelint
in CI job, I think we should probably pay attention to it in each PR.
Not sure I agree here. Doing manual linting is very error-prone and it will slow down our development process.
I saw a lots of coding format issue (spacing, unused imports, line length etc.) on
PyCharm
. Before we havelint
in CI job, I think we should probably pay attention to it in each PR.Not sure I agree here. Doing manual linting is very error-prone and it will slow down our development process.
we do have make format
command to do this before committing. You do not need to to it manually.
There is something wrong with CI, while checks report passed on the PR 22 failures occurred if you inspect the job on circle
It looks like the docker image is redis based, normal practice with docker is "single thing" why are we not using redis in its own container and building eventgen as a separate container
I push one commit to fix the failed test cases.
@arctan5x , either commit be3ca13
or 804bd6f
introduced the three test cases failed: https://circleci.com/gh/splunk/eventgen/693
There is something wrong with CI, while checks report passed on the PR 22 failures occurred if you inspect the job on circle
CircleCI works like if you have a dedicated test
step which returns none-zero code it will report FAILED
status. Since the whole test cases are running in a container and the test flow always return zero status even it has failed test cases. We may need to find a better to deal with this.
At this point, all tests are fixed except for 2 failures which seem like a real bug, so I will investigate it tomorrow.
tested the latest code, the process# issue is fixed. but the performance(20 prcoesses) on es bundle is likely 1/3 compares to 6.3.4. So the performance issue is not related to number of processes.
That is to say, before the process# fix, the performance (80 processes in this code base) is 70% of 20 processes in 6.3.4 on the same hardware, if we reduce the process# to 20, the performance is event worse.
the performance issue is not related to the server change, because it can be reproduced in CLI mode.
and I saw plenty of logs from httpevent outputter:
2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'} 2019-07-24 02:13:18 eventgen ERROR Process-15 {'event': 'failure outputting event, does not contain _raw'}
it might be the root cause of why we have such critical throughput loss
@YifengMao I think it is a good point. Let's check the http outputter.
@arctan5x you mentioned in last sync up meeting that "Missing the logic where Eventgen Controller needs to know dropped servers", does it covered now?
@li-wu We don't have it still, but that feature only affects /index endpoint which is very low impact and no functional effect. So going to release this then add it as a backlog.
I saw a lots of coding format issue (spacing, unused imports, line length etc.) on
PyCharm
. Before we havelint
in CI job, I think we should probably pay attention to it in each PR.