Open caldeirav opened 2 years ago
Mikhail is working on this.
Still need to implement the GPU dashboard, but it won't use the Kafka event-driven approach. Still in progress.
ETA 30-Nov
@redmikhail to implement the week of 12-Dec
In progress: resolving some issues where CL1 and CL2 are in different states, so that the two can be kept in sync. Will need PRs approved (Ryan is here until the 23rd; Eric is here until the 22nd).
Installed 2 dashboards on CL1, but fixes are still required, and the dashboards need to be added to Operate First in order to deploy to CL2. Still need to update NVIDI
There is a need for better monitoring and management of GPU usage for the two sets of GPUs we are planning to use. We should look into building a pipeline from the AWS monitoring data and integrating it into Kafka running on the cluster.
This could be a good technical POC for the event-driven data ingestion pattern we have discussed with LSEG, and something a new team member with event-driven architecture knowledge could pick up fairly easily.
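
As a rough illustration of the pipeline proposed above, here is a minimal sketch of the step that turns monitoring datapoints into Kafka-ready events. Everything in it is an assumption made for illustration: the datapoint shape is modelled loosely on what AWS monitoring returns (`Timestamp`, `Average`, `Unit`), and the function names, the `gpu-metrics` topic, and the `gpu_utilization` metric name are hypothetical. The producer is passed in as a plain `send` callable so the sketch does not depend on any particular Kafka client.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Iterable, Tuple


def datapoint_to_event(instance_id: str, metric_name: str, datapoint: dict) -> Tuple[bytes, bytes]:
    """Convert one monitoring datapoint into a (key, value) pair of bytes.

    The key is the instance id, so all events for one instance would land
    on the same partition. The value is a JSON document.
    """
    ts = datapoint["Timestamp"]
    if isinstance(ts, datetime):
        # Normalise to UTC and serialise as ISO 8601 (assumes a tz-aware datetime).
        ts = ts.astimezone(timezone.utc).isoformat()
    payload = {
        "instance_id": instance_id,
        "metric": metric_name,
        "timestamp": ts,
        "value": datapoint["Average"],
        "unit": datapoint.get("Unit", "None"),
    }
    key = instance_id.encode("utf-8")
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value


def publish_datapoints(
    send: Callable[[str, bytes, bytes], object],
    topic: str,
    instance_id: str,
    metric_name: str,
    datapoints: Iterable[dict],
) -> int:
    """Publish datapoints in timestamp order; return how many were sent.

    `send` is a hypothetical adapter taking (topic, key, value).
    """
    count = 0
    for dp in sorted(datapoints, key=lambda d: d["Timestamp"]):
        key, value = datapoint_to_event(instance_id, metric_name, dp)
        send(topic, key, value)
        count += 1
    return count
```

With a client such as kafka-python, `send` could be a small wrapper around `KafkaProducer.send(topic, key=key, value=value)`; fetching the datapoints from AWS is left out here because the exact metric source for the GPU nodes is not specified in this issue.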