Closed marioferh closed 2 years ago
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: marioferh
The full list of commands accepted by this bot can be found here.
The pull request process is described here
/lgtm again extremely terse commit message (for example: reference the PR updating the e2e data; declaring that thanks to that we can now re-enable the tests), but good enough for now.
Did you rebase on top of https://github.com/openshift-kni/debug-tools/pull/59 ? I see one test failing
Did you rebase on top of #59 ? I see one test failing
Yep, rebased it, but its failing, checking it
New changes are detected. LGTM label has been removed.
there is some info in the machineinfo arch dependent, no all date comes from /proc /sys
there is some info in the machineinfo arch dependent, no all date comes from /proc /sys
which infos?
I can see many differences between run in the same machine as generated json and in the ci:
e.g.:
- "memory_by_type": map[string]interface{}{},
+ "memory_by_type": map[string]interface{}{
+ "Unbuffered-DDR4": map[string]interface{}{"capacity": float64(1.03079215104e+11), "dimm_count": float64(12)},
+ },
I can see many differences between run in the same machine as generated json and in the ci:
e.g.:
- "memory_by_type": map[string]interface{}{}, + "memory_by_type": map[string]interface{}{ + "Unbuffered-DDR4": map[string]interface{}{"capacity": float64(1.03079215104e+11), "dimm_count": float64(12)}, + },
this data is collected from sysinfo: https://github.com/google/cadvisor/blob/master/machine/machine.go#L157 and https://github.com/google/cadvisor/blob/master/machine/info.go#L77 . Let's make sure we collect this data ourselves (in gather-sysinfo
, though).
or, if the data is not relevant for us, let's make sure we never collect it: https://github.com/openshift-kni/debug-tools/blob/main/pkg/machineinformer/informer.go#L49
or, if the data is not relevant for us, let's make sure we never collect it: https://github.com/openshift-kni/debug-tools/blob/main/pkg/machineinformer/informer.go#L49
I have to investigate which data is relevant
/test ci-e2e-test
@fromanirh there are three differences in machineinfo.json "vendor_id": "AuthenticAMD", "num_cores": 104, this can be hardcoded
And the topology that is different in local cluster and ci cluster.
I don't know if can be hard-coded or not
typo: should read
machineinfo
sorry, where?
typo: should read
machineinfo
sorry, where?
in the commit message proper
@fromanirh there are three differences in machineinfo.json "vendor_id": "AuthenticAMD", "num_cores": 104, this can be hardcoded
And the topology that is different in local cluster and ci cluster.
I don't know if can be hard-coded or not
the number of cores and the architecture should totally be read from procfs and sysfs so they should work there. We need to dig deeper to avoid hiding bugs.
Additionally, it would be nice to have docs to explain how to collect the machineinfo for the e2e tests, or next time we need to update we will end up in this stage again
ok, let me check why is taking this data from system and no from local data
before you submit the final version, note you still have the macineinfo
typo in the commit message
before you submit the final version, note you still have the
macineinfo
typo in the commit message
already fixed
@marioferh: The following test failed, say /retest
to rerun all failed tests or /retest-required
to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
ci/prow/ci | 892fda62310dd3511909020b24d6060b315efd3e | link | true | /test ci |
Full PR test history. Your PR dashboard.
@fromanirh fixed it. The initial issue was: tests launched in local extract info and then delete it, when I generate the output json with local command it was taking the local sys, not the uncompressed one from the tar :S The second was, CPUVendorID was not cleaned with the --clean-procfs-info. I've already added that cleanup. The command has -P ./test/data/dell_2_numa/proc then I though that proc data comes from local folder not from system proc, that confused me.
ok, this is more complex than we initially thought. We can pause this work and resume later when there is more bandwidth and we can sort out and tackle the dependencies.
Re-enable machineinfo test with new updated data of #59 Add clean of CPUVendorID to cleanProcfs to avoid system dependence Remove -P from machineinfo command. Not used procfs in machineinfotest and can lead to confusion