Hello. :)
I'm runnning multiple versions of trankit in docker containers (each container is assigned a rtx3090 or a rtx2080Ti), with 3 python instances/workers per GPU/container.
I'm seeing performance throughput drop off beyond about 3 gpus on a dual 2697 v3 machine (dual 16 core processors, single thread passmark about 2000, multi 20k per cpu), and for a single gpu, performance is about 15% lower than on a 5950x machine (16 cores, single thread passmark about 3500).
I'm still doing some tests, but, seems like trankit likes fast cpu cores (seems like 4-5 per gpu) to run well?
Hello. :) I'm runnning multiple versions of trankit in docker containers (each container is assigned a rtx3090 or a rtx2080Ti), with 3 python instances/workers per GPU/container.
I'm seeing performance throughput drop off beyond about 3 gpus on a dual 2697 v3 machine (dual 16 core processors, single thread passmark about 2000, multi 20k per cpu), and for a single gpu, performance is about 15% lower than on a 5950x machine (16 cores, single thread passmark about 3500).
I'm still doing some tests, but, seems like trankit likes fast cpu cores (seems like 4-5 per gpu) to run well?