tattle-made / DAU

MCA Tipline for Deepfakes
GNU General Public License v3.0

Finalize feluda operator system requirement #29

Closed dennyabrain closed 5 months ago

dennyabrain commented 5 months ago

Overview

Acceptance Criteria

dennyabrain commented 5 months ago

@aatmanvaidya @duggalsu Here's the list of EC2 instance types offered by AWS: https://aws.amazon.com/ec2/pricing/on-demand/ It lets you choose between memory optimized, storage optimized, and compute optimized instances, as well as between core counts and RAM. Review it and make a list for me of which instance types are worth evaluating.

dennyabrain commented 5 months ago

I had an issue setting up feluda on my machine.

```
requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 13) and urllib3==2.0.7 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested urllib3==2.0.7
    botocore 1.34.19 depends on urllib3<1.27 and >=1.25.4; python_version < "3.10"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
```

Flagging this as something that might trip us up.

Python 3.9.18 on Ubuntu 20.04.2 LTS, using commit c35079.

duggalsu commented 5 months ago

This shouldn't have happened, but it is happening because we have not upgraded boto3 (and other packages) to their latest versions in feluda core. That causes dependency mismatches when generating requirements.txt for the operators and core, and I've been manually downgrading botocore across the requirements files and regenerating them to maintain compatibility.
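
As a concrete illustration of the first suggestion pip makes above (a hypothetical excerpt, not the repo's actual pins), loosening the urllib3 pin to the range botocore 1.34.19 accepts would let the resolver succeed:

```text
botocore==1.34.19
urllib3>=1.25.4,<1.27   # loosened from urllib3==2.0.7, per botocore 1.34.19's constraint on Python < 3.10
```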

dennyabrain commented 5 months ago

A note on trying to push the operator to its limit. I removed the check around file size and tried processing a 1-hour-long video that was 800 MB in size. The Python process exits after 10 or so seconds. My rudimentary observation of htop tells me that the 12 cores don't run at full capacity, but memory usage increases with time and eventually the process runs out of memory. So I think in the short run, keeping the file size limit might be useful to prevent large files from causing a crash.

dennyabrain commented 5 months ago

A neat thing: I eventually got the operator to run on this 1-hour video without hitting the out-of-memory error! Caveat: I only got the operator to run; I can't say anything yet about the implications for search results.

I figured out that the cause of the out-of-memory error was

https://github.com/tattle-made/feluda/blob/c350792db1ffcb9c53149ea42f68adf5f4b0cd07/src/api/core/operators/vid_vec_rep_resnet.py#L157C9-L168C26

```python
def extract_frames(self, v):
    # print("extracting frames")
    images = []
    for i in range(self.n_frames):
        success, image = v.read()
        if image is None:
            continue
        else:
            if i % self.sampling_rate == 0:
                images.append(Image.fromarray(image))
    # print("extracted frames")
    return images
```

All of the sampled frames of the video are held in the images list at once, hence the out-of-memory error on long videos.

I tried a rudimentary trick: convert this into a generator that processes the video in chunks of 100 frame reads at a time:

```python
def extract_frames(self, v):
    # print("extracting frames")
    # yield frames in chunks so only one chunk's images are held in memory at a time
    for chunk_start in range(0, self.n_frames, 100):
        images = []
        for offset in range(100):
            success, image = v.read()
            if image is None:
                continue
            # use the absolute frame index so sampling behaves like the original version
            if (chunk_start + offset) % self.sampling_rate == 0:
                images.append(Image.fromarray(image))
        yield images
    # print("extracted frames")
```
and the corresponding change in the analyze function to consume this generator:

```python
def analyze(self, video):
    # print("analyzing video")
    for frames in self.extract_frames(video):
        feature_matrix = self.extract_features(frames)
        # note: these are overwritten on every chunk, so only the last chunk's
        # keyframes survive -- the search-result implications are untested
        self.keyframe_indices = self.find_keyframes(feature_matrix)
        self.keyframe_features = feature_matrix[:, self.keyframe_indices]
    # print("analysed video")
```

Result: the function took 625.5453 seconds to run.

dennyabrain commented 5 months ago

Current status: we know RAM usage depends on the length of the video file. Given my proof of concept above, it looks like we can process long files by chunking the frame processing and get a decent upper bound on RAM consumption. Since the priority for the next milestone is to support video files that are a few minutes long, and we don't want to support really long files right now anyway, we can assume the file length is bounded and hence the RAM usage is bounded too.
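
To make the "RAM scales with video length" point concrete, here is a rough back-of-envelope; the resolution, fps, and sampling rate are illustrative assumptions, not values taken from the operator:

```python
# Back-of-envelope: why holding all sampled frames in memory blows up on long videos
frame_bytes = 1920 * 1080 * 3               # one uncompressed RGB frame, ~6.2 MB
fps, duration_s = 30, 3600                  # a hypothetical 1-hour, 30 fps video
sampling_rate = 10                          # hypothetical; the operator's actual value may differ

sampled_frames = (fps * duration_s) // sampling_rate
print(f"all sampled frames at once: ~{sampled_frames * frame_bytes / 1e9:.0f} GB")    # ~67 GB here

chunk_reads = 100                           # reads per chunk in the generator above
per_chunk = (chunk_reads // sampling_rate) * frame_bytes
print(f"per 100-read chunk: ~{per_chunk / 1e6:.0f} MB, independent of video length")  # ~62 MB here
```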

In today's call Aurora mentioned that, looking at the code we use for inference, trying out a GPU won't be worth it either. So we are parking all GPU-related tests for later as well.

This leaves compute-optimized EC2s as the category of instances to try. One thing we can also check is whether, since our cores and memory aren't used at full capacity, Kubernetes can successfully schedule multiple pods on the same node, getting us more value for money from every node we provision.

aatmanvaidya commented 5 months ago

Documentation on memory and CPU profiling is here: https://github.com/tattle-made/feluda/wiki/Optimization

dennyabrain commented 5 months ago

I've selected some EC2 instances for the first round of tests. I've included the hourly and daily cost because we might scale the nodes up and down and might not need a large node to stay up throughout.

| EC2 type | vCPU | Memory (GiB) | Hourly (USD) | Hourly (INR) | Daily (INR) | Monthly (INR) |
| --- | --- | --- | --- | --- | --- | --- |
| c7g.large | 2 | 4 | 0.0491 | 4.078246 | 97.877904 | 2936.33712 |
| c7g.xlarge | 4 | 8 | 0.1445 | 12.00217 | 288.05208 | 8641.5624 |
| c7g.2xlarge | 8 | 16 | 0.289 | 24.00434 | 576.10416 | 17283.1248 |
| c7g.4xlarge | 16 | 32 | 0.3926 | 32.609356 | 782.624544 | 23478.73632 |
| c7g.16xlarge | 64 | 128 | 1.5706 | 130.454036 | 3130.896864 | 93926.90592 |
| r7g.large | 2 | 16 | 0.0751 | 6.237806 | 149.707344 | 4491.22032 |
| r7g.xlarge | 4 | 32 | 0.1502 | 12.475612 | 299.414688 | 8982.44064 |
| r7g.4xlarge | 16 | 128 | 0.704 | 58.47424 | 1403.38176 | 42101.4528 |
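
For reference, the INR columns follow mechanically from the hourly USD price; the exchange rate (~83.06 INR/USD) is inferred from the table itself, and a 30-day month is assumed:

```python
# Reproduce the c7g.large row from its hourly USD price
usd_per_hour = 0.0491
inr_per_usd = 83.06                       # inferred from the table, not an official rate
hourly_inr = usd_per_hour * inr_per_usd   # 4.078246
daily_inr = hourly_inr * 24               # 97.877904
monthly_inr = daily_inr * 30              # 2936.33712
print(hourly_inr, daily_inr, monthly_inr)
```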
dennyabrain commented 5 months ago

@aatmanvaidya @duggalsu When we deploy the container to Kubernetes, we can specify the command it should run when launched. For this test, I was thinking we can create scripts inside the /benchmark folder. The caveat is that the script should not exit but stay alive (think infinite loop), so that Kubernetes does not kill the container. The other reason I need the container to keep running is that after the test I'd like to SSH into it to get the output files.

So I was thinking our benchmark scripts could be something like this:

script1.sh

```sh
python test.py
tail -f /dev/null     # keep the container alive after the test finishes
```

script2.sh

```sh
python3 -m memray run -o vid_vec_rep_resnet.bin vid_vec_rep_resnet.py
tail -f /dev/null     # keep the container alive so the .bin output can be copied out
```

So let's create appropriate scripts like these. Then we can deploy the container, change the command that's executed on container start, and run these tests in the cluster.
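
Once the memray run in script2.sh finishes, one possible way to pull the profile out of the pod and render it locally is sketched below (the pod name and the /benchmark path inside the container are assumptions):

```sh
# copy the memray profile out of the running pod, then render an HTML flamegraph locally
kubectl cp <namespace>/<pod-name>:/benchmark/vid_vec_rep_resnet.bin ./vid_vec_rep_resnet.bin
python3 -m memray flamegraph vid_vec_rep_resnet.bin   # writes an HTML flamegraph alongside the .bin
```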

dennyabrain commented 5 months ago

Sharing the Kubernetes deployment file for reference. We'll simply change the replica count and command to run different containers.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: feluda-operator-vidvec
  labels:
    app.kubernetes.io/name: feluda-operator-vidvec
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: feluda-operator-vidvec
  template:
    metadata:
      labels:
        app.kubernetes.io/name: feluda-operator-vidvec
    spec:
      containers:
        - name: feluda-operator-vidvec
          image: tattletech/feluda-operator-vid-vec:f6bb56c
          imagePullPolicy: Always
          command: ["python"]
          args: ["test.py"]
          # resource requests/limits belong on the container, not on the Deployment spec
          resources:
            requests:
              cpu: "1000m"
              memory: "4000Mi"
            limits:
              cpu: "4000m"
              memory: "8000Mi"
```
dennyabrain commented 5 months ago

We'll rely on GitHub Actions to push new Docker images of our operators to Docker Hub. Reference implementation: https://github.com/tattle-made/feluda/blob/9f425587f93e02005554b496c059144c90e19f74/.github/workflows/prod-deploy.yml#L44-L50

dennyabrain commented 5 months ago

Workflow:

  1. Denny will provision the EC2 instance we want to test on.
  2. Aatman and Aurora make changes to the operator and push to GitHub, which triggers (manually or automatically) a workflow that builds a Docker image customized for the appropriate operator and pushes it to Docker Hub. Each such push tags the image on Docker Hub with the commit id, so new versions of the image are uniquely identifiable.
  3. Denny will copy the commit id into the Kubernetes deployment manifest above. He'll also change the replica count if appropriate, to force more than one container to run on each node. When deployed, this should run our test and leave the container idle.
  4. Denny will SSH into the containers and download the output files (see the sketch after this section).
  5. Aatman and Aurora will analyze the results.

We are charged hourly for EC2 instance usage, so once an instance is spun up there's no reason to shut it down immediately. We can run a few tests in one go within that hour and learn all we need before shutting it down.

We then repeat these steps for every EC2 instance we care to test.
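
As a rough sketch of steps 3 and 4, the commands below use kubectl (the manifest filename and pod name are placeholders, and kubectl exec stands in for the SSH access described in step 4):

```sh
# apply the updated manifest, find the pod, and open a shell to grab the output files
kubectl apply -f feluda-operator-vidvec.yaml
kubectl get pods -l app.kubernetes.io/name=feluda-operator-vidvec
kubectl exec -it <pod-name> -- /bin/bash
```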

duggalsu commented 5 months ago

image-operator  1.3GB
video-operator  1.66GB

dennyabrain commented 5 months ago

Some trivial feedback on using --no-cache-dir as an argument to pip install: a quick try brought the video-operator size down to 1.35 GB.

I also notice that the largest thing in the Docker image is the torch library, around 800 MB. It doesn't seem like there's much we can do to reduce it. What is your opinion?
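
For reference, the flag is applied at image build time; a hypothetical install line (the actual Dockerfile may differ):

```sh
pip install --no-cache-dir -r requirements.txt   # don't keep pip's download cache inside the image layer
```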

duggalsu commented 5 months ago

Further optimized Dockerfiles

- With the pip no-cache optimization, and with torch, torchvision, vim, and curl removed from feluda core:
feluda-indexer  470MB
feluda-reporter 470MB
feluda-api      470MB
image-operator  1.08GB
video-operator  1.35GB
dennyabrain commented 5 months ago

Scenario Planning:

Goal: Offer an acceptable response time (let's assume < 5 minutes for now) for every possible scenario.

Scenarios:

  1. Consistent traffic through the day:
     a. low (1000 messages a day)
     b. medium (50,000 messages a day)
     c. high (1,00,000 messages a day)
  2. Traffic surge on a known day (pre-provisioned infrastructure):
     a. low (1000 messages in a minute)
     b. medium (50,000 messages in a minute)
     c. high (1,00,000 messages in a minute)
  3. Unexpected traffic surge on a day (provisioning will happen post facto):
     a. low (1000 messages in a minute)
     b. medium (50,000 messages in a minute)
     c. high (1,00,000 messages in a minute)
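
One way to translate these scenarios into rough instance counts is Little's law (concurrent work ≈ arrival rate × processing time). The numbers below are purely illustrative; the per-message processing time has to come from the benchmarks above:

```python
import math

# Illustrative capacity estimate; the 30 s per-message figure is an assumption
def workers_needed(messages: int, window_s: float, seconds_per_message: float) -> int:
    concurrent = messages * seconds_per_message / window_s   # average messages in flight
    return max(1, math.ceil(concurrent))

print(workers_needed(50_000, 24 * 3600, 30))   # scenario 1b, spread over a day    -> 18
print(workers_needed(50_000, 60, 30))          # scenario 2b, surge within a minute -> 25000
```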
dennyabrain commented 5 months ago

Question to focus on: Why do we get slow performance on multicore Intel machines (the c7i* family) when we increase the number of pod replicas (containers), especially when the core count is > 4?
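
One hypothesis worth ruling out (an assumption on my part, not a confirmed cause): PyTorch defaults to using every visible core for intra-op parallelism, so several replicas on one node can oversubscribe the CPU. Capping the thread count per container would test this:

```python
import torch

# cap intra-op parallelism so N replicas on one node don't all try to use every core;
# setting OMP_NUM_THREADS in the container environment is the equivalent knob
torch.set_num_threads(2)
```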

dennyabrain commented 5 months ago

Tasks

dennyabrain commented 5 months ago

(image attachment)