tattle-made / services-infrastructure


Add 1 or 2 high-memory EC2s to our Kubernetes cluster #8

Open dennyabrain opened 2 years ago

dennyabrain commented 2 years ago

We anticipate deploying some high-memory (4 GB) ML models for the ogbv project. The machines will only serve models for inference tasks; they will not be used to train the ML models. The models will be exposed via a REST API. My preliminary research suggests that we pick an EC2 instance from the r4 or r5 family.

The unknowns right now are:

  1. Are these EC2s well suited to our task, or should we choose something else?
  2. How do we add these machine(s) to our k8s cluster? Ideally they would sit in the same cluster so we don't have to manage them independently.
  3. What is the estimated cost of deploying 1 such machine, and of deploying more than 1? I think some redundancy, if affordable, should be built into this.
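For question 3, a rough way to compare one machine against several is to multiply an on-demand hourly rate by the hours in a month. The sketch below assumes on-demand pricing and roughly 730 hours per month; the function name and the `0.10` rate are placeholders for illustration, not real AWS prices, and storage and data transfer are ignored.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(hourly_usd: float, instance_count: int = 1) -> float:
    """Estimate the monthly on-demand cost in USD for identical instances."""
    if hourly_usd < 0 or instance_count < 0:
        raise ValueError("rate and instance count must be non-negative")
    return round(hourly_usd * HOURS_PER_MONTH * instance_count, 2)


if __name__ == "__main__":
    # Placeholder rate, NOT a real AWS price: look up the current rate
    # for the chosen instance type and region before relying on this.
    placeholder_rate = 0.10
    for count in (1, 2):
        print(f"{count} instance(s): ${monthly_cost(placeholder_rate, count)}/month")
```

Running it with the real hourly rate for each candidate instance type gives the single-machine and redundant-pair figures side by side.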

@rn-v and @mahalakshmijinadoss will be able to chime in on any specific questions about the model itself. My understanding is that they should have a dummy model ready in a week or two, and then they'll get busy developing the actual model. @tarunima tagging you here so you can keep an eye on any cost-related discussion.

whymath commented 2 years ago

Tested adding a high-memory node to the Dev cluster, and it seems like it should be possible. Didn't deploy any containers on that node, since I wasn't sure which image to use, but we can test that out in the next step in any case.

As far as the pricing goes, the various options are attached here: Tattle_AWS_InstanceTypes_v0.1.xlsx

dennyabrain commented 2 years ago

I am paraphrasing requirements that came from Arnav. CCing @mahalakshmijinadoss. They think 4 GB of RAM is what we need, and 8 GB is a good upper limit to account for any REST API or other services we might want to run on the same machine. We can start with 8 GB and downgrade to 4 GB if it's underutilized.
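One way to express the 4 GB / 8 GB figures inside the cluster is as a memory request and limit on the pod, pinned to the high-memory node with a `nodeSelector`. This is only a sketch: the pod name, image and instance type passed in are hypothetical, and it assumes our nodes carry the well-known `node.kubernetes.io/instance-type` label. It builds the manifest as a Python dict and prints JSON, which `kubectl apply -f` accepts.

```python
import json


def inference_pod_manifest(name, image, instance_type, request_gib=4, limit_gib=8):
    """Build a Pod manifest that requests request_gib and caps memory at limit_gib."""
    if request_gib > limit_gib:
        raise ValueError("memory request cannot exceed the limit")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Assumption: nodes are labelled with their EC2 instance type.
            "nodeSelector": {"node.kubernetes.io/instance-type": instance_type},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        "requests": {"memory": f"{request_gib}Gi"},
                        "limits": {"memory": f"{limit_gib}Gi"},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name, image and instance type, for illustration only.
    manifest = inference_pod_manifest("ogbv-model", "example/ogbv-model:dummy", "r5.large")
    print(json.dumps(manifest, indent=2))
```

If the pod turns out to be underutilized, lowering `limit_gib` is a smaller change than swapping the instance type.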

@whymath is there a reason you did not consider r4.large, r4.xlarge and r4.2xlarge? Attaching a comparison sheet I had created for reference: costs

Also, good catch about the CPU architecture. I was not considering it. @mahalakshmijinadoss do you have a sense of whether the libraries the model uses are compatible with ARM as well as Intel (x86)?
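A quick way to answer the ARM question empirically might be to run a small check inside a container on each node type: report the CPU architecture and try importing each dependency. This is a sketch only; the function names are made up, and the module list is a placeholder because the model's actual dependencies are not listed in this thread.

```python
import importlib
import platform

ARM_NAMES = {"aarch64", "arm64"}
X86_NAMES = {"x86_64", "amd64"}


def normalize_arch(machine: str) -> str:
    """Map a platform.machine() string to 'arm64', 'x86_64' or 'unknown'."""
    name = machine.lower()
    if name in ARM_NAMES:
        return "arm64"
    if name in X86_NAMES:
        return "x86_64"
    return "unknown"


def check_imports(modules):
    """Return {module_name: True/False} depending on whether it imports here."""
    results = {}
    for mod in modules:
        try:
            importlib.import_module(mod)
            results[mod] = True
        except Exception:
            # Native extensions can fail with errors other than ImportError
            # (for example when a shared library is missing for this CPU).
            results[mod] = False
    return results


if __name__ == "__main__":
    print("architecture:", normalize_arch(platform.machine()))
    # Placeholder list: replace with the model's real dependencies.
    print(check_imports(["json", "math"]))
```

A successful import does not guarantee that inference works on ARM, so a smoke test with the dummy model would still be worth doing.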