iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE

Understand why SX Pods in the AF are so much worse than in River #52

Closed gordonwatts closed 6 months ago

gordonwatts commented 6 months ago

I saw a big difference in pod CPU efficiency between River and AF: AF pods were running at about 8% CPU, while River pods were seeing 45% CPU. Could that be networking? Not known.

This was seen in #49.

robrwg commented 6 months ago

Could be. Gordon, you might be getting taxed by running on a system where others are doing real work. We also ask a portion of the workers to serve as Ceph pool nodes. Have you tried restricting SX transforms to use only 'pure' c-nodes?
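For illustration only, here is a rough sketch of what "restricting to pure c-nodes" could look like at the Kubernetes level, using a standard `nodeAffinity` rule built as a plain Python dict. The label key `example.org/pure-compute` and the helper names are hypothetical; the thread does not say how the AF nodes are actually labeled, or how such a rule would be wired into the SX transformer pods.

```python
import json

# Hypothetical label: the real cluster may label (or taint) its nodes differently.
PURE_COMPUTE_LABEL = "example.org/pure-compute"


def pure_node_affinity(label_key=PURE_COMPUTE_LABEL, label_value="true"):
    """Build a pod-spec `affinity` fragment that only allows labeled nodes."""
    return {
        "nodeAffinity": {
            # Hard requirement: the scheduler will not place the pod elsewhere.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": label_key,
                                "operator": "In",
                                "values": [label_value],
                            }
                        ]
                    }
                ]
            }
        }
    }


def with_affinity(pod_spec, affinity):
    """Return a copy of a pod spec dict with the affinity fragment attached."""
    patched = dict(pod_spec)
    patched["affinity"] = affinity
    return patched


if __name__ == "__main__":
    # Placeholder pod spec; container name and image are made up.
    spec = {"containers": [{"name": "transformer", "image": "placeholder"}]}
    print(json.dumps(with_affinity(spec, pure_node_affinity()), indent=2))
```

Nodes that also serve as Ceph pool nodes would simply not carry the label, so pods with this rule would never be scheduled on them.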

gordonwatts commented 6 months ago

Thanks for the idea. I have not tried it, and I don't know how to (I assume I would work with either @ivukotic or someone else).

gordonwatts commented 6 months ago

We experimented with this further today, with @ivukotic allocating nodes on AF and River. Both seemed to be running quite well. The only change I know of was a shift to the nameserver infrastructure, which was part of what was causing problems last time. The good news is that, performance-wise, we could not tell the difference between the nodes in today's tests (see #68).