olivetom opened this issue 1 month ago
@oandreeva-nv Hi Olga, should I rephrase this issue to get help?
Hi @olivetom, thanks for your question. There are certain limitations for Jetson: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/jetson.md
The Python backend does not support GPU Tensors and Async BLS.
CUDA IPC (shared memory) is not supported. System shared memory however is supported.
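For reference, system shared memory can still be exercised from the client side on Jetson. A minimal sketch with the tritonclient HTTP API (the region name, key, byte size, and the "multiensemble"/"images" names are illustrative, not taken from your setup):

import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.shared_memory as shm

client = httpclient.InferenceServerClient(url="localhost:8000")
input_data = np.zeros((1, 3, 405, 720), dtype=np.float32)
byte_size = input_data.size * input_data.itemsize

# Create a system shared memory region, copy the input into it, and register it with the server.
shm_handle = shm.create_shared_memory_region("images_shm", "/images_shm", byte_size)
shm.set_shared_memory_region(shm_handle, [input_data])
client.register_system_shared_memory("images_shm", "/images_shm", byte_size)

# Point the inference input at the registered region instead of sending the data inline.
infer_input = httpclient.InferInput("images", list(input_data.shape), "FP32")
infer_input.set_shared_memory("images_shm", byte_size)
response = client.infer("multiensemble", inputs=[infer_input])

# Clean up when done.
client.unregister_system_shared_memory("images_shm")
shm.destroy_shared_memory_region(shm_handle)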
I believe you are hitting one of those. Could you please share your model.py
for easier debugging?
Could you please also provide some steps on how you are building Triton? Also, circling back to:
do I have to build python backend with GPU support?
Yes, I would recommend trying this as well. We also publish NGC iGPU containers. Is this something you would potentially consider?
Hi @oandreeva-nv,
Here is my model.py, which runs perfectly on x86_64 but fails on Jetson Orin. Below I have also shared my ensemble steps.
from typing import List, Tuple
import numpy as np
import numpy.typing as npt
import torch
import triton_python_backend_utils as pb_utils
from torch.nn.functional import pad
class TritonPythonModel:
def initialize(self, args) -> None:
self.logger = pb_utils.Logger
self.cuda = torch.cuda.is_available()
self.logger.log_info(f": initialize: CUDA available: {self.cuda}")
self.logger.log_info(f": initialize: multiensemble model args: {args}")
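        # NOTE: self.config (score thresholds, target sizes, emotions_max_batch_size,
        # etc.) is assumed to be populated elsewhere (e.g. parsed from
        # args["model_config"]); that part is not shown in this snippet.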
self.logger.log_info(f": initialize: Config: {self.config}")
if not pb_utils.is_model_ready(
model_name="multiemotion", model_version="1"
): # this loads model latest version as well
# Load the model from the model repository
pb_utils.load_model(model_name="multiemotion")
def execute(self, requests): # type: ignore
responses: List[pb_utils.InferenceResponse] = []
for request in requests:
images = pb_utils.get_input_tensor_by_name(request, "images").as_numpy()
face_bboxes = pb_utils.get_input_tensor_by_name(
request, "face_bboxes"
).as_numpy()
face_scores = pb_utils.get_input_tensor_by_name(
request, "face_scores"
).as_numpy()
face_keypoints = pb_utils.get_input_tensor_by_name(
request, "face_keypoints"
).as_numpy()
face_classifications = pb_utils.get_input_tensor_by_name(
request, "face_classifications"
).as_numpy()
request_id: str = request.request_id()
self.logger.log_info(
f": execute: 'images' input shape={images.shape}"
)
self.logger.log_info(
f": execute: 'face_bboxes' input shape={face_bboxes.shape}"
)
self.logger.log_info(
f": execute: 'face_scores' input shape={face_scores.shape}"
)
self.logger.log_info(
f": execute: 'face_keypoints' input shape={face_keypoints.shape}"
)
self.logger.log_info(
f": execute: 'face_classifications' input shape={face_classifications.shape}"
)
self.logger.log_info(f": execute: REQUEST_ID={request_id}")
self.batch_size: int = images.shape[0]
self.logger.log_info(f": execute: Batch size={self.batch_size}")
            image_tensor: torch.Tensor = (
torch.tensor(images, dtype=torch.float32)
if self.cuda
else torch.tensor(images, dtype=torch.float32).cpu()
)
            bounding_boxes: torch.Tensor = (
torch.tensor(face_bboxes, dtype=torch.float32)
if self.cuda
else torch.tensor(face_bboxes, dtype=torch.float32).cpu()
)
            scores: torch.Tensor = (
torch.tensor(face_scores, dtype=torch.float32)
if self.cuda
else torch.tensor(face_scores, dtype=torch.float32).cpu()
)
            keypoints: torch.Tensor = (
torch.tensor(face_keypoints, dtype=torch.float32)
if self.cuda
else torch.tensor(face_keypoints, dtype=torch.float32).cpu()
)
            classifications: torch.Tensor = (
torch.tensor(face_classifications, dtype=torch.int32)
if self.cuda
else torch.tensor(face_classifications, dtype=torch.int32).cpu()
)
(
filtered_face_bboxes,
filtered_face_scores,
filtered_face_keypoints,
filtered_face_classifications,
filtered_face_auc,
filtered_face_bec,
filtered_face_valence,
filtered_face_arousal,
) = self.process_faces(
image_tensor=image_tensor,
bounding_boxes=bounding_boxes,
scores=scores,
score_threshold=self.config.multitask_face_score_threshold,
min_size=self.config.multitask_min_face_size,
target_size=(self.config.emotions_width, self.config.emotions_height),
save_dir=None,
request_id=request_id,
keypoints=keypoints,
classifications=classifications,
)
self.logger.log_info(
f": execute: filtered_face_bboxes shape={filtered_face_bboxes.shape}"
)
self.logger.log_info(
f": execute: filtered_face_scores shape={filtered_face_scores.shape}"
)
self.logger.log_info(
f": execute: filtered_face_keypoints shape={filtered_face_keypoints.shape}"
)
self.logger.log_info(
f": execute: filtered_face_classifications shape={filtered_face_classifications.shape}"
)
self.logger.log_info(
f": execute: filtered_face_aucs shape={filtered_face_auc.shape}"
)
self.logger.log_info(
f": execute: filtered_face_becs shape={filtered_face_bec.shape}"
)
self.logger.log_info(
f": execute: filtered_face_valences shape={filtered_face_valence.shape}"
)
self.logger.log_info(
f": execute: filtered_face_arousals shape={filtered_face_arousal.shape}"
)
filtered_face_bboxes = pb_utils.Tensor(
"filtered_face_bboxes", filtered_face_bboxes.numpy()
)
filtered_face_scores = pb_utils.Tensor(
"filtered_face_scores", filtered_face_scores.numpy()
)
filtered_face_keypoints = pb_utils.Tensor(
"filtered_face_keypoints", filtered_face_keypoints.numpy()
)
filtered_face_classifications = pb_utils.Tensor(
"filtered_face_classifications", filtered_face_classifications.numpy()
)
filtered_face_aucs = pb_utils.Tensor(
"filtered_face_aucs", filtered_face_auc.numpy()
)
filtered_face_becs = pb_utils.Tensor(
"filtered_face_becs", filtered_face_bec.numpy()
)
filtered_face_valences = pb_utils.Tensor(
"filtered_face_valences", filtered_face_valence.numpy()
)
filtered_face_arousals = pb_utils.Tensor(
"filtered_face_arousals", filtered_face_arousal.numpy()
)
multiemotion_response = pb_utils.InferenceResponse(
output_tensors=[
filtered_face_bboxes,
filtered_face_scores,
filtered_face_keypoints,
filtered_face_classifications,
filtered_face_aucs,
filtered_face_becs,
filtered_face_valences,
filtered_face_arousals,
]
)
responses.append(multiemotion_response)
self.logger.log_info(f": Number of responses: {len(responses)}.")
return responses
def process_faces(
self,
image_tensor: torch.Tensor, # [N, C, H, W] FP32
bounding_boxes: torch.Tensor, # [N, num_boxes, 4] in (x1, y1, x2, y2) format
scores: torch.Tensor, # [N, num_boxes] FP32
score_threshold: float = 0.95,
min_size: int = 75,
target_size: Tuple[int, int] = (100, 100),
save_dir: str = "/tmp/saved_faces", # Directory to save the images
request_id: str = "unique_id",
keypoints: torch.Tensor = None,
classifications: torch.Tensor = None,
) -> Tuple[torch.Tensor, ...]:
N, C, H, W = image_tensor.shape
all_filtered_bboxes = [[] for _ in range(N)]
all_filtered_scores = [[] for _ in range(N)]
all_filtered_keypoints = [[] for _ in range(N)]
all_filtered_classifications = [[] for _ in range(N)]
all_filtered_auc = [[] for _ in range(N)]
all_filtered_bec = [[] for _ in range(N)]
all_filtered_valence = [[] for _ in range(N)]
all_filtered_arousal = [[] for _ in range(N)]
all_cropped_faces = [[] for _ in range(N)]
self.logger.log_info(
f": process_faces: image_tensor shape={image_tensor.shape}"
)
self.logger.log_info(
f": process_faces: bounding_bboxes shape={bounding_boxes.shape}"
)
self.logger.log_info(f": process_faces: scores shape={scores.shape}")
self.logger.log_info(
f": process_faces: keypoints shape={keypoints.shape}"
)
self.logger.log_info(
f": process_faces: classifications shape={classifications.shape}"
)
total_cropped_faces = 0
for i in range(N):
image = image_tensor[i] # [C, H, W]
max_faces = bounding_boxes.shape[1]
boxes = bounding_boxes[i] # [num_boxes, 4]
scores_i = scores[i].squeeze(-1) # [num_boxes]
keypoints_i = keypoints[i] # [num_boxes, 5, 3]
classifications_i = classifications[i] # [num_boxes]
# 1. Filter out bounding boxes based only on score and size
score_mask = scores_i >= score_threshold
widths = boxes[:, 2] - boxes[:, 0]
heights = boxes[:, 3] - boxes[:, 1]
size_mask = (widths >= min_size) & (heights >= min_size)
final_mask = score_mask & size_mask
boxes = boxes[final_mask]
scores_i = scores_i[final_mask].unsqueeze(-1) # [num_boxes, 1]
keypoints_i = keypoints_i[final_mask]
classifications_i = classifications_i[final_mask]
# 2. Clip bounding boxes to image boundaries
boxes[:, [0, 2]] = boxes[:, [0, 2]].clamp(0, W)
boxes[:, [1, 3]] = boxes[:, [1, 3]].clamp(0, H)
# 3. Crop the faces from the images based on the bounding boxes
for box in boxes:
x1, y1, x2, y2 = box.int()
face = image[:, y1:y2, x1:x2] # Crop the face
# 4. Resize the cropped faces to a target size with padding
face_resized = self.resize_with_padding(face, target_size)
all_cropped_faces[i].append(face_resized)
total_cropped_faces += 1
# Stack the processed faces for the current image
if all_cropped_faces[i]:
self.logger.log_info(
f": process_faces: cropped_faces length={len(all_cropped_faces[i])}"
)
                faces_tensor: torch.Tensor = (
torch.stack(all_cropped_faces[i])
if self.cuda
else torch.stack(all_cropped_faces[i]).cpu()
)
self.logger.log_info(
f": process_faces: faces_tensor shape={faces_tensor.shape}"
)
auc, bec, valence, arousal = self.get_emotion_tensors(
faces_tensor=faces_tensor,
request_id=request_id,
max_batch_size=self.config.emotions_max_batch_size,
)
# Append the results to the lists
all_filtered_bboxes[i].extend(boxes)
all_filtered_scores[i].extend(scores_i)
all_filtered_keypoints[i].extend(keypoints_i)
all_filtered_classifications[i].extend(classifications_i)
all_filtered_auc[i].extend(auc)
all_filtered_bec[i].extend(bec)
all_filtered_valence[i].extend(valence)
all_filtered_arousal[i].extend(arousal)
current_faces = len(all_cropped_faces[i])
current_bboxes = len(all_filtered_bboxes[i])
assert (
current_faces == current_bboxes
), f"current_faces={current_faces} != current_bboxes={current_bboxes}"
self.logger.log_info(
f": process_faces: current_faces={current_faces}"
)
padding = max_faces - current_faces
do_padding = padding < max_faces and padding > 0
self.logger.log_info(f": process_faces: padding={padding}")
self.logger.log_info(f": process_faces: do_padding={do_padding}")
if do_padding:
self.logger.log_info(
f": process_faces: all_filtered_bboxes[i] len= {len(all_filtered_bboxes[i])}"
)
self.logger.log_info(
f": process_faces: all_filtered_bboxes[i] = {all_filtered_bboxes[i]}"
)
all_filtered_bboxes[i] = torch.stack(
[_ for _ in all_filtered_bboxes[i]]
)
self.logger.log_info(
f": process_faces: tensor all_filtered_bboxes[i] size={all_filtered_bboxes[i].size()}"
)
self.logger.log_info(
f": process_faces: all_filtered_scores[i] = {all_filtered_scores[i]}"
)
all_filtered_scores[i] = torch.stack(
[_ for _ in all_filtered_scores[i]]
)
self.logger.log_info(
f": process_faces: tensor all_filtered_scores[i] size={all_filtered_scores[i].size()}"
)
self.logger.log_info(
f": process_faces: all_filtered_keypoints[i] = {all_filtered_keypoints[i]}"
)
all_filtered_keypoints[i] = torch.stack(
[_ for _ in all_filtered_keypoints[i]]
)
self.logger.log_info(
f": process_faces: tensor all_filtered_keypoints[i] size={all_filtered_keypoints[i].size()}"
)
self.logger.log_info(
f": process_faces: all_filtered_classifications[i] = {all_filtered_classifications[i]}"
)
all_filtered_classifications[i] = torch.stack(
[_ for _ in all_filtered_classifications[i]]
)
self.logger.log_info(
f": process_faces: tensor all_filtered_classifications[i] size={all_filtered_classifications[i].size()}"
)
all_filtered_auc[i] = torch.tensor(np.stack(all_filtered_auc[i]))
all_filtered_bec[i] = torch.tensor(np.stack(all_filtered_bec[i]))
all_filtered_valence[i] = torch.tensor(
np.stack(all_filtered_valence[i])
)
all_filtered_arousal[i] = torch.tensor(
np.stack(all_filtered_arousal[i])
)
all_filtered_bboxes[i] = pad(
all_filtered_bboxes[i], (0, 0, 0, padding), value=0.0
)
all_filtered_scores[i] = pad(
all_filtered_scores[i], (0, 0, 0, padding), value=0.0
)
all_filtered_keypoints[i] = pad(
all_filtered_keypoints[i], (0, 0, 0, 0, 0, padding), value=0.0
)
all_filtered_classifications[i] = pad(
all_filtered_classifications[i], (0, padding), value=0
)
all_filtered_auc[i] = pad(
all_filtered_auc[i], (0, 0, 0, padding), value=0.0
)
all_filtered_bec[i] = pad(
all_filtered_bec[i], (0, 0, 0, padding), value=0.0
)
all_filtered_valence[i] = pad(
all_filtered_valence[i], (0, 0, 0, padding), value=0.0
)
all_filtered_arousal[i] = pad(
all_filtered_arousal[i], (0, 0, 0, padding), value=0.0
)
else:
all_filtered_bboxes[i] = torch.zeros(
(max_faces, 4), dtype=torch.float32
)
all_filtered_scores[i] = torch.zeros(
(max_faces, 1), dtype=torch.float32
)
all_filtered_keypoints[i] = torch.zeros(
(max_faces, 5, 3), dtype=torch.float32
)
all_filtered_classifications[i] = torch.zeros(
(max_faces), dtype=torch.int32
)
all_filtered_auc[i] = torch.zeros((max_faces, 8), dtype=torch.float32)
all_filtered_bec[i] = torch.zeros((max_faces, 7), dtype=torch.float32)
all_filtered_valence[i] = torch.zeros(
(max_faces, 1), dtype=torch.float32
)
all_filtered_arousal[i] = torch.zeros(
(max_faces, 1), dtype=torch.float32
)
[
self.logger.log_info(
f": process_faces: all_filtered_bboxes[{i}] shape={all_filtered_bboxes[i].shape}"
)
for i in range(N)
]
[
self.logger.log_info(
f": process_faces: all_filtered_scores[{i}] shape={all_filtered_scores[i].shape}"
)
for i in range(N)
]
[
self.logger.log_info(
f": process_faces: all_filtered_keypoints[{i}] shape={all_filtered_keypoints[i].shape}"
)
for i in range(N)
]
[
self.logger.log_info(
f": process_faces: all_filtered_classifications[{i}] shape={all_filtered_classifications[i].shape}"
)
for i in range(N)
]
[
self.logger.log_info(
f": process_faces: all_filtered_auc[{i}] shape={all_filtered_auc[i].shape}"
)
for i in range(N)
]
[
self.logger.log_info(
f": process_faces: all_filtered_bec[{i}] shape={all_filtered_bec[i].shape}"
)
for i in range(N)
]
[
self.logger.log_info(
f": process_faces: all_filtered_valence[{i}] shape={all_filtered_valence[i].shape}"
)
for i in range(N)
]
[
self.logger.log_info(
f": process_faces: all_filtered_arousal[{i}] shape={all_filtered_arousal[i].shape}"
)
for i in range(N)
]
padded_bboxes = torch.stack(all_filtered_bboxes)
padded_scores = torch.stack(all_filtered_scores)
padded_keypoints = torch.stack(all_filtered_keypoints)
padded_classifications = torch.stack(all_filtered_classifications)
padded_auc = torch.stack(all_filtered_auc)
padded_bec = torch.stack(all_filtered_bec)
padded_valence = torch.stack(all_filtered_valence)
padded_arousal = torch.stack(all_filtered_arousal)
# all_filtered* is a list of lists where each sublist contains the bboxes, etc for a single image in the batch
# all_filtered_bboxes = [[bboxes11, bboxes12, ...], [bboxes21, bboxes22, bboxes23, ...], ...]
# all_filtered_scores = [[scores11, scores12, ...], [scores21, scores22, scores23, ...], ...]
# an so on for all 6 other lists (keypoints, classifications, auc, bec, valence, arousal)
# we should pad each sublist to the same length to create 8 tensors with shapes
# for bboxes: (N, 128, 4)
# for scores: (N, 128, 1)
# for keypoints: (N, 128, 5, 3)
# for classifications: (N, 128, 1) of int32
# for auc: (N, 128, 8)
# for bec: (N, 128, 7)
# for valence: (N, 128, 1)
# for arousal: (N, 128, 1)
# where N is the batch size and 128 is the maximum number of faces in each image of the batch
# Ensure the padded shapes match the expected dimensions
assert padded_bboxes.shape == (
N,
max_faces,
4,
), f"padded_bboxes.shape={padded_bboxes.shape} != (N, 128, 4)"
assert padded_scores.shape == (
N,
max_faces,
1,
), f"padded_scores.shape={padded_scores.shape} != (N, 128, 1)"
assert padded_keypoints.shape == (
N,
max_faces,
5,
3,
), f"padded_keypoints.shape={padded_keypoints.shape} != (N, 128, 5, 3)"
assert padded_classifications.shape == (
N,
max_faces,
), f"padded_classifications.shape={padded_classifications.shape} != (N, 128, 1)"
assert padded_auc.shape == (
N,
max_faces,
8,
), f"padded_auc.shape={padded_auc.shape} != (N, 128, 8)"
assert padded_bec.shape == (
N,
max_faces,
7,
), f"padded_bec.shape={padded_bec.shape} != (N, 128, 7)"
assert padded_valence.shape == (
N,
max_faces,
1,
), f"padded_valence.shape={padded_valence.shape} != (N, 128, 1)"
assert padded_arousal.shape == (
N,
max_faces,
1,
), f"padded_arousal.shape={padded_arousal.shape} != (N, 128, 1)"
return (
padded_bboxes,
padded_scores,
padded_keypoints,
padded_classifications,
padded_auc,
padded_bec,
padded_valence,
padded_arousal,
)
def get_emotion_tensors(
self,
faces_tensor: torch.Tensor,
max_batch_size: int = 12,
request_id: str = "unique_id",
    ) -> Tuple[npt.NDArray[np.float32], ...]:
num_faces = faces_tensor.shape[0]
self.logger.log_info(
f": get_emotion_tensors: faces_tensor shape={faces_tensor.shape}"
)
self.logger.log_info(
f": get_emotion_tensors: faces_tensor device={faces_tensor.device}"
)
        auc_output: npt.NDArray[np.float32] = np.zeros((num_faces, 8))
bec_output: npt.NDArray[np.float32] = np.zeros((num_faces, 7))
valence_output: npt.NDArray[np.float32] = np.zeros((num_faces, 1))
arousal_output: npt.NDArray[np.float32] = np.zeros((num_faces, 1))
# Process the faces in batches, honoring the max_batch_size
for i, start_idx in enumerate(range(0, num_faces, max_batch_size)):
end_idx = min(start_idx + max_batch_size, num_faces)
inference_request = pb_utils.InferenceRequest(
request_id=request_id + str(i),
model_name="multiemotion",
requested_output_names=[
"action_units_classifications",
"basic_emotions_classifications",
"valence",
"arousal",
],
inputs=[
pb_utils.Tensor("images", faces_tensor[start_idx:end_idx].numpy())
],
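                # Ask Triton to place this BLS call's output tensors in CPU
                # memory so they can be read below with as_numpy().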
preferred_memory=pb_utils.PreferredMemory(
pb_utils.TRITONSERVER_MEMORY_CPU, 0
),
)
inference_response: pb_utils.InferenceResponse
inference_response = inference_request.exec()
# Check if the inference response has an error
if inference_response.has_error():
raise pb_utils.TritonModelException(
inference_response.error().message()
)
# accumulate results for all requested outputs
auc_output[start_idx:end_idx] = pb_utils.get_output_tensor_by_name(
inference_response, "action_units_classifications"
).as_numpy()
bec_output[start_idx:end_idx] = pb_utils.get_output_tensor_by_name(
inference_response, "basic_emotions_classifications"
).as_numpy()
valence_output[start_idx:end_idx] = pb_utils.get_output_tensor_by_name(
inference_response, "valence"
).as_numpy()
arousal_output[start_idx:end_idx] = pb_utils.get_output_tensor_by_name(
inference_response, "arousal"
).as_numpy()
self.logger.log_info(
f": get_emotion_tensors: AUC output shape={auc_output.shape}"
)
self.logger.log_info(
f": get_emotion_tensors: BEC output shape={bec_output.shape}"
)
self.logger.log_info(
f": get_emotion_tensors: Valence output shape={valence_output.shape}"
)
self.logger.log_info(
f": get_emotion_tensors: Arousal output shape={arousal_output.shape}"
)
return auc_output, bec_output, valence_output, arousal_output
And here are my ensemble steps:
name: "multiensemble"
platform: "ensemble"
max_batch_size: 12
parameters [
{
key: "FORCE_CPU_ONLY_INPUT_TENSORS"
value: {string_value: "no"}
}
]
# multiensemble input = multitask input
input [
{
name: "images"
data_type: TYPE_FP32
dims: [3, 405, 720]
}
]
output [
{
name: "filtered_face_bboxes",
data_type: TYPE_FP32,
dims: [128, 4],
},
{
name: "filtered_face_scores",
data_type: TYPE_FP32,
dims: [128, 1],
},
{
name: "filtered_face_keypoints",
data_type: TYPE_FP32,
dims: [128, 5, 3],
},
{
name: "filtered_face_classifications",
data_type: TYPE_INT32,
dims: [128],
},
{
name: "filtered_face_aucs",
data_type: TYPE_FP32,
dims: [128, 8],
},
{
name: "filtered_face_becs",
data_type: TYPE_FP32,
dims: [128, 7],
},
{
name: "filtered_face_valences",
data_type: TYPE_FP32,
dims: [128, 1],
},
{
name: "filtered_face_arousals",
data_type: TYPE_FP32,
dims: [128, 1],
},
{
name: "body_bboxes",
data_type: TYPE_FP32,
dims: [128, 4],
},
{
name: "body_joints_keypoints",
data_type: TYPE_FP32,
dims: [128, 12, 3],
},
{
name: "body_scores",
data_type: TYPE_FP32,
dims: [128, 1],
},
{
name: "body_action_scores",
data_type: TYPE_FP32,
dims: [128, 5],
},
{
name: "body_fall_scores",
data_type: TYPE_FP32,
dims: [128, 2],
},
{
name: "body_classifications",
data_type: TYPE_INT32,
dims: [128],
},
{
name: "body_type_scores",
data_type: TYPE_FP32,
dims: [128, 3],
},
{
name: "furniture_bboxes",
data_type: TYPE_FP32,
dims: [128, 4],
},
{
name: "furniture_keypoints",
data_type: TYPE_FP32,
dims: [128, 8, 3],
},
{
name: "furniture_scores",
data_type: TYPE_FP32,
dims: [128, 2],
},
{
name: "furniture_classifications",
data_type: TYPE_INT32,
dims: [128],
}
]
ensemble_scheduling {
step [
# multitask
{
model_name: "multitask"
model_version: -1
input_map {
key: "images"
value: "images"
}
output_map {
key: "face_bboxes"
value: "face_bboxes"
}
output_map {
key: "face_scores"
value: "face_scores"
}
output_map {
key: "face_keypoints"
value: "face_keypoints"
}
output_map {
key: "face_classifications"
value: "face_classifications"
}
output_map {
key: "body_bboxes"
value: "body_bboxes"
}
output_map {
key: "body_joints_keypoints"
value: "body_joints_keypoints"
}
output_map {
key: "body_scores"
value: "body_scores"
}
output_map {
key: "body_action_scores"
value: "body_action_scores"
}
output_map {
key: "body_fall_scores"
value: "body_fall_scores"
}
output_map {
key: "body_classifications"
value: "body_classifications"
}
output_map {
key: "body_type_scores"
value: "body_type_scores"
},
output_map {
key: "furniture_bboxes"
value: "furniture_bboxes"
},
output_map {
key: "furniture_keypoints"
value: "furniture_keypoints"
}
output_map {
key: "furniture_scores"
value: "furniture_scores"
}
output_map {
key: "furniture_classifications"
value: "furniture_classifications"
}
},
# multiemotion_preprocess
{
model_name: "multiemotion_preprocess"
model_version: -1
input_map {
key: "images"
value: "images"
}
input_map {
key: "face_bboxes"
value: "face_bboxes"
}
input_map {
key: "face_scores"
value: "face_scores"
}
input_map {
key: "face_keypoints"
value: "face_keypoints"
}
input_map {
key: "face_classifications"
value: "face_classifications"
}
#_________ OUTPUTS ___________#
output_map {
key: "filtered_face_bboxes"
value: "filtered_face_bboxes"
}
output_map {
key: "filtered_face_scores"
value: "filtered_face_scores"
}
output_map {
key: "filtered_face_keypoints"
value: "filtered_face_keypoints"
}
output_map {
key: "filtered_face_classifications"
value: "filtered_face_classifications"
}
output_map {
key: "filtered_face_aucs"
value: "filtered_face_aucs"
}
output_map {
key: "filtered_face_becs"
value: "filtered_face_becs"
}
output_map {
key: "filtered_face_arousals"
value: "filtered_face_arousals"
}
output_map {
key: "filtered_face_valences"
value: "filtered_face_valences"
}
}
]
}
Could you please also provide some steps on how you are building Triton? Also, circling back to:
do I have to build python backend with GPU support?
Yes, I would recommend trying this as well. We also publish NGC iGPU containers. Is this something you would potentially consider?
I'm using the vanilla DeepStream Triton NGC container nvcr.io/nvidia/deepstream:6.4-triton-multiarch. So I guess this Triton should support Jetson out-of-the-box...
I'm not sure what the deepstream container supports, to be honest. Could you try nvcr.io/nvidia/tritonserver:24.09-py3-igpu?
@oandreeva-nv
what if I use the Python backend's DLPack support, as described here: https://github.com/triton-inference-server/python_backend/blob/main/README.md#interoperability-and-gpu-support, instead of my current implementation:
images = pb_utils.get_input_tensor_by_name(request, "images").as_numpy()
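Roughly, I mean something like the DLPack example from that README (an untested sketch on my side; result stands in for whatever torch tensor I would return):

import torch
from torch.utils.dlpack import from_dlpack, to_dlpack
import triton_python_backend_utils as pb_utils

# Inside execute(), per request: zero-copy view of the input wherever Triton placed it.
images_pb = pb_utils.get_input_tensor_by_name(request, "images")
images = from_dlpack(images_pb.to_dlpack())  # torch.Tensor on the same device (CPU or GPU)

# ... face filtering / emotion BLS as in my current model.py ...

# Build an output tensor from a torch tensor without an extra copy.
out_pb = pb_utils.Tensor.from_dlpack("filtered_face_bboxes", to_dlpack(result))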
I'm not sure what the deepstream container supports, to be honest. Could you try nvcr.io/nvidia/tritonserver:24.09-py3-igpu?
Will try and let you know the results. Thanks
Hi @oandreeva-nv,
Using nvcr.io/nvidia/tritonserver:24.09-py3-igpu throws the same error: input_0: try to use CUDA copy while GPU is not supported
So, effectively, the Triton Python backend on Jetson doesn't support GPU tensors (which is hard to swallow). But the following link says otherwise, unless it only applies to x86_64: https://github.com/triton-inference-server/python_backend/blob/main/README.md#input-tensor-device-placement
Sadly, it's confusing... at least for me... :cry:
Hi @olivetom, apologies for the confusion. We'll try our best to update the documentation. For Jetson we have a dedicated page: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/jetson.md
@oandreeva-nv these Jetson docs refer to JetPack 5.0 from August 2022... Is there an updated version for JetPack 6.0?
I believe everything still stands for JP 6.0 as well, but @nv-kmcgill53 may correct me if I'm wrong.
My setup is:
input_0: try to use CUDA copy while GPU is not supported
This is somewhat similar to the following issues, which are still open:
#4772 #5578
So my questions are: do I have to build the Python backend with GPU support? And what's the easiest way to check whether my current Python backend on Jetson supports an ensemble of this kind?
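In case it helps to clarify what I mean by "check": this is the kind of minimal probe I was thinking of dropping into execute() (sketch only, relying on pb_utils.Tensor.is_cpu()):

# At the top of execute(), inside the `for request in requests:` loop of my model.py:
images_pb = pb_utils.get_input_tensor_by_name(request, "images")
self.logger.log_info(f": probe: backend CUDA available: {torch.cuda.is_available()}")
self.logger.log_info(f": probe: 'images' arrived in CPU memory: {images_pb.is_cpu()}")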
Thanks in advance,