Open cshyjak opened 4 months ago
Testing out RayLLM and running into an issue where the model loads and runs fine initially but starts throwing errors after roughly an hour of running. This happens with multiple types of models. The example below uses the model config from this repo.
RayService Configuration:
```yaml
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: laivly-ml
  namespace: sidd-platform
spec:
  serviceUnhealthySecondThreshold: 1200 # Config for the health check threshold for service. Default value is 60.
  deploymentUnhealthySecondThreshold: 1200 # Config for the health check threshold for deployments. Default value is 60.
  serveConfigV2: |
    applications:
      - name: router
        import_path: rayllm.backend:router_application
        route_prefix: /llm
        args:
          models:
            - ./models/continuous_batching/quantization/TheBloke--Llama-2-7B-chat-AWQ.yaml
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 2}"'
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
            - name: ray-head
              image: anyscale/ray-llm:0.5.0
              resources:
                limits:
                  cpu: 2
                  memory: 8Gi
                requests:
                  cpu: 2
                  memory: 4Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
          nodeSelector:
            kubernetes.io/arch: amd64
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 0
        maxReplicas: 4
        groupName: a10-gpu
        rayStartParams:
          resources: '"{\"accelerator_type_cpu\": 46, \"accelerator_type_a10\": 4}"'
        template:
          spec:
            containers:
              - name: llm
                image: anyscale/ray-llm:0.5.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh", "-c", "ray stop"]
                resources:
                  limits:
                    cpu: "46"
                    memory: "190G"
                    nvidia.com/gpu: 4
                  requests:
                    cpu: "2"
                    memory: "4G"
                    nvidia.com/gpu: 4
                ports:
                  - containerPort: 8000
                    name: serve
            nodeSelector:
              karpenter.k8s.aws/instance-family: g5
```
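For context on when the errors start appearing, this is a minimal sketch of how the deployed router is queried. It assumes the Serve `route_prefix: /llm` from the config above plus an OpenAI-style chat-completions path, and the host, port, and model id are placeholders, not confirmed values from this deployment:

```python
import json
import urllib.request


def build_request(host="http://localhost:8000"):
    """Build the HTTP request used to probe the model endpoint.

    The path combines the route_prefix from serveConfigV2 with an
    OpenAI-compatible completions path (assumed, not verified here).
    """
    url = f"{host}/llm/v1/chat/completions"
    payload = {
        # Model id is a guess derived from the YAML filename above.
        "model": "TheBloke/Llama-2-7B-chat-AWQ",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request()
print(req.full_url)  # endpoint that starts failing after ~1 hr
```

Requests like this succeed for about the first hour after deployment and then begin returning errors.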