open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

average inference time per frame/per video increases for htc object detection and segmentation model #9299

Open nikky4D opened 1 year ago

nikky4D commented 1 year ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection

Environment

Pytorch installed via conda {'sys.platform': 'linux', 'Python': '3.9.13 (main, Oct 13 2022, 21:15:33) [GCC 11.2.0]', 'CUDA available': True, 'GPU 0,1,2,3': 'NVIDIA RTX A6000', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Cuda compilation tools, release 11.1, V11.1.74', 'GCC': 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0', 'PyTorch': '1.12.1', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201402\n - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.3\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.3.2 (built against CUDA 11.5)\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls 
-Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n', 'TorchVision': '0.13.1', 'OpenCV': '4.6.0', 'MMCV': '1.6.2', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': '11.3', 'MMDetection': '2.25.3+'}

Reproduces the problem - code sample

###########################################################
proc_api.py
###########################################################
from abc import abstractmethod, ABC

class BaseDetector(ABC):
    ## Initialize the model. 
    ## Load the model from configs, Set on evaluations
    def __init__(self): 
        pass

    ## Preprocessing a single image
    @abstractmethod
    def image_preprocess(self, img_name):
        pass

    ## Preprocessing a batch of images
    @abstractmethod
    def image_batch_preprocess(self, img_list):
        pass

    ## Detecting on a batch of images
    @abstractmethod
    def images_detection(self, imgs, orig_dim_list):
        pass

    ## Detecting on a single image
    @abstractmethod
    def detect_one_img(self, img_name):
        pass

    ## Model and Loading all in one function useful for systems that have processing bundled etc
    def all_in_one(self, img_name):
        pass

################################################################################
## mmdet_api.py
################################################################################
import numpy as np
# import torch
from proc_api import BaseDetector
from mmdet.apis import inference_detector, init_detector
from IPython.display import display
from matplotlib import pyplot as plt
import PIL.Image as Image

class MMSegDetector(BaseDetector):
    ## Initialize the model. 
    ## Load the model from configs, Set on evaluations
    def __init__(self,  det_config=None, det_checkpoint=None, device=None): 

        self.det_config = det_config #'configs/htc_x101_64x4d_fpn_dconv_c3-c5_mstrain_400_1400_16x1_20e_coco.py'
        self.det_checkpoint = det_checkpoint #'checkpoints/htc_x101_64x4d_fpn_dconv_c3-c5_mstrain_400_1400_16x1_20e_coco_20200312-946fd751.pth'
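        self.device = device if device is not None else 'cuda:0'  # honor the requested device, default to the first GPU
        # init a detector
        self.det_model = init_detector(self.det_config, self.det_checkpoint, device=self.device)
        self.det_results = None
        self.mask_results = None  # set by detect_one_img, read by show_results
        self.img_results = None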
        self.classnames = self.det_model.CLASSES
        self.batch_size = 32

    ## Preprocessing a single image

    def image_preprocess(self, img_name):
        # mmdet_results = inference_detector(self.det_model, str(img_name) )
        pass

    ## Preprocessing a batch of images
    def image_batch_preprocess(self, img_list):
        pass

    ## Detecting on a batch of images with their filenames
    def images_detection(self, imgs, orig_dim_list):
        pass

    ## Detecting on a single image
    def detect_one_img(self, img, thres=0.3, cat_id = 1):
        # img_name = str(img_name)
        mmdet_results = inference_detector(self.det_model, img) ## Returns 2 items--bbox, segm. For segm, returns a list of 80 items
        self.det_results, self.mask_results = self.process_mmdet_results(mmdet_results, thres=thres, cat_id = cat_id)
        return {'detection':self.det_results, 'segmentation': self.mask_results} 

    ## Draw results
    def show_results(self, img_name, det_results=None, mask_results = None):
        if type(img_name) is str:
            img= np.asarray(Image.open(img_name),dtype=float)  # np.float was removed in NumPy 1.24; builtin float is equivalent
        else:
            img = img_name
        # print(np.max(img[:]))
        img_final = np.zeros(img.shape, dtype = float)
        if mask_results is None:
            mask_results = self.mask_results
        for i in range(len(mask_results)):
            # img_final += (mask_results[i] * img)
            img_final[mask_results[i].squeeze()] = img[mask_results[i].squeeze()]

        max_img_final = np.max(img_final[:])
        img_final = img_final * 1.0 /max_img_final
        plt.figure(figsize = (12, 10))
        plt.imshow(img_final  )
        plt.show()

        return 

    ## Model and Loading all in one function useful for systems that have processing bundled etc
    def all_in_one(self, list_of_images_numpy):
        len_input = len(list_of_images_numpy)
        assert len_input <= self.batch_size
        mmdet_results = inference_detector(self.det_model, list_of_images_numpy)
        processed_results = [self.process_mmdet_results(mmdet_results[i]) for i in range(len_input)]
        seg_results = [processed_results[i][1] for i in range(len_input)]
        return seg_results

    ## Process the result of detection to get mask. Bbox is returned but it is not tested.
    def process_mmdet_results(self,mmdet_results, thres=0.3, cat_id = 1):
        """Process mmdet results, and return the kept bboxes and segm masks.

        Args:
            mmdet_results (list|tuple): mmdet results.
            thres (float): minimum bbox score to keep (default: 0.3)
            cat_id (int): category id (default: 1 for human)

        Returns:
            person_results (np.ndarray): detected bounding boxes, one row per box
            mask_results (np.ndarray): segm masks for the kept boxes
        """

        if isinstance(mmdet_results, tuple):
            det_results = mmdet_results[0]
            segm_results = mmdet_results[1] ## List of N items, each for a label
        else:
            det_results = mmdet_results
            segm_results = None

        ## Take out humans
        bboxes = det_results[cat_id - 1]
        # print("Boxes shape",bboxes.shape)
        masks = None
        if segm_results is not None: ## segm_results is list with N results, N = number of classes
            masks = segm_results[cat_id - 1]  # list object with M items of height x width for M detected humans

        ## Take out segmentations that are less than thres
        final_bboxes = bboxes
        final_segm = np.asarray(masks) if masks is not None else None  ## avoid wrapping None in a 0-d array
        if thres > 0:
            scores = bboxes[:, -1]  ## Number of boxes should match number of segmentation masks
            # print(scores)
            # print(thres)
            keep_inds = scores >= thres
            final_bboxes = bboxes[keep_inds,:]
            # print("Thre boxes shape", final_bboxes.shape)
            # print("Before keep, len masks", len(masks))
            # print("Before keep, shape masks", masks[0].shape)
            # print(keep_inds)
            final_segm = np.asarray(masks)[keep_inds] if masks is not None else None
            # print("Seg shape keep", final_segm.shape)

        ## Move over to lists -- Not required
        person_results = []
        # print(
        for bbox in final_bboxes:
            person = bbox
            person_results.append(person)

        ## Move over to lists -- Not really required
        mask_results = []
        # print("preparing masks")
        if final_segm is not None:
            for mask in final_segm:
                # print(mask.shape)
                person_mask = mask[:,:,np.newaxis]
                mask_results.append(person_mask)

        # print("Shape person_results", np.asarray(person_results).shape)

        return np.asarray(person_results), np.asarray(mask_results)

################################################################################
## Actual code being run processVideoFolderSequential.py
#################################################################################
## For videos testing
import numpy as np
import mmcv
import glob
from pathlib import Path
import os
import PIL.Image
import tqdm
import argparse
from mmdet_api import MMSegDetector
import cv2
import tqdm.notebook
from matplotlib import pyplot as plt
import csv
import sys
import inspect
import torch.nn as nn
import torch
import decord
import time

## Taken from fast.ai github get_files function
def _get_files(p, fs, extensions=None):
    p = Path(p)
    res = [p/f for f in fs if not f.startswith('.')
           and ((not extensions) or f'.{f.split(".")[-1].lower()}' in extensions)]
    return res

# Cell
def get_files(path, extensions=None, recurse=True, folders=None, followlinks=True):
    "Get all the files in `path` with optional `extensions`, optionally with `recurse`, only in `folders`, if specified."
    path = Path(path)
    print(path, path.exists())
    folders= list(folders) if folders is not None else []
    extensions = set(extensions) if extensions is not None else []
    extensions = {e.lower() for e in extensions}
    if recurse:
        res = []
        for i,(p,d,f) in enumerate(os.walk(path, followlinks=followlinks)): # returns (dirpath, dirnames, filenames)
            if len(folders) !=0 and i==0: d[:] = [o for o in d if o in folders]
            else:                         d[:] = [o for o in d if not o.startswith('.')]
            if len(folders) !=0 and i==0 and '.' not in folders: continue
            res += _get_files(p, f, extensions)
    else:
        f = [o.name for o in os.scandir(path) if o.is_file()]
        res = _get_files(path, f, extensions)
    return list(res)

def single_img_process(numpy_img, model, thresh =0.3):

    ## Process the image through the model
    output = model.detect_one_img(img = numpy_img, thres=thresh, cat_id = 1)           

    ## Extract out the masks
    det = output['detection']
    masks = output['segmentation']

    return det, masks

def parse_args():
    parser = argparse.ArgumentParser(description='MMDetection segmentation mask extraction')
    parser.add_argument('--video_folder',type=str, default = 'test_videos', help='Video folder path. Either videos or images')
    parser.add_argument('--image_folder',type=str, default = '', help='Image folder path. Either videos or images')
    parser.add_argument('--config', type=str,default = '../configs/htc_x101_64x4d_fpn_dconv_c3-c5_mstrain_400_1400_16x1_20e_coco.py', help='Config file')
    parser.add_argument('--checkpoint', type=str,default = '../checkpoints/htc_x101_64x4d_fpn_dconv_c3-c5_mstrain_400_1400_16x1_20e_coco_20200312-946fd751.pth', help='Checkpoint file')
    parser.add_argument('--score_thr', type=float, default=0.2, help='Bbox score threshold')
    parser.add_argument('--face_thr', type=float, default = 0.7, help = 'Face detection threshold')
    parser.add_argument('--face_config', type=str, default = '../configs/scrfd/scrfd_10g.py', help='Face config file')
    parser.add_argument('--face_checkpoint', type=str, default = '../scrfd_10g.pth', help='Face checkpoint file')
    parser.add_argument('--output_folder', type=str, default = "processed", help='Output folder for saving results')
    parser.add_argument('--device', type = str, default='cuda:0', help='Device used for inference')
    args, unknown = parser.parse_known_args()#parse_args()
    return args

def single_video_process(video_path, output_dir, csv_output_dir, model, score_thres=0.2, face_model = None, face_thres=0.2, resize_width = None, resize_height = None):

    ## Get the video -- sourceVidPath
    targeted_width = 1980
    if resize_width is not None:
        video = decord.VideoReader(str(video_path), width = resize_width, height = resize_height, ctx = decord.cpu(0))     
        frame = video[0].asnumpy()
        height, width, _ =  frame.shape
        asp_ratio = width/height
        targeted_width = int(np.ceil(1080 * asp_ratio))
        video = decord.VideoReader(str(video_path), width = targeted_width, height = 1080, ctx = decord.cpu(0))  
    else:
        video = mmcv.VideoReader(str(video_path))
        frame = np.asarray(mmcv.image.bgr2rgb(video[0]))
        height, width, _ =  frame.shape
        asp_ratio = width/height
        targeted_width = int(np.ceil( 1080 * asp_ratio ))

    ## Get the length
    video_len = len(video)

    ## Make the video folder -- makes up the segmentationPath
    video_output_path = Path(str(output_dir))#/Path(Path(video_path).stem)
    print("Outputting to : ", video_output_path)
    video_output_path.mkdir(exist_ok=True, parents=True)

    ## Starting point
    start_point = 0
    skip_frames = 5

    ## For each frame in the video
    print(f"Processing video: {Path(video_path).stem}") 
    print("frame shape: ", video[0].shape, " with video length ", len(video))
    tstart = time.time()
    tmodel = 0.0
    tframe = 0.0
    tsave = 0.0
    for frame_count in range(start_point, video_len): #tqdm.trange(0, video_len):# in video:
        ## Get the frame
        tfr_start = time.time()
        if isinstance(video, decord.VideoReader):
            frame = video[frame_count].asnumpy()
        else:
            frameR = cv2.resize(video[frame_count], (targeted_width, 1080))
            frame = mmcv.image.bgr2rgb(frameR)
        tfr_end = time.time()
        tframe += ( tfr_end - tfr_start)
        ## Resize frame
        height, width, _ =  frame.shape
        ## SegmentationPath for saving jpg. Masks can do with more compression

        output_path = f"{str(video_output_path)}/{str(frame_count).zfill(4)}_mask"
        if Path(output_path).exists():
            continue

        ## Process the frame
        tm_start = time.time()
        det, mask = single_img_process(frame, model, thresh = score_thres)
        tm_end = time.time()
        tmodel += (tm_end - tm_start)
        # print(det.shape, mask.shape)

        ## Save the result -- return the segmentation path -- NU Mod. 
        ## REturned mask_path should be same as output_path
        mask_paths = []
        bbox_paths = []
        tsave_start = time.time()
        for i in range(len(det)):
            x1, y1, x2, y2, conf = det[i]
            if conf > score_thres:  ## use the configured threshold instead of a hard-coded 0.2

                ## DO masks
                try:
                    mask_i = mask[i]
                    mask_path = output_path + "_" + str(i) + ".jpg"
                    # save_mask(frame,[mask_i], Path(mask_path))
                    maskimg= PIL.Image.fromarray((mask_i*255).astype(np.uint8).squeeze(), 'L')
                    maskimg.save(mask_path, "JPEG")
                except Exception:  ## do not swallow KeyboardInterrupt/SystemExit
                    mask_path = "Exception in saving. No mask. N/A"
                    # print('exception')
                mask_paths.append(mask_path)

        tsave_end = time.time()
        tsave += tsave_end - tsave_start

    ## NOTE: every value printed below is in minutes (seconds / 60)
    print("Total time for video processing of this video: ", (time.time() - tstart ) / 60)
    print("Average time for model processing per frame: ", (tmodel/ video_len)/60)
    print("Average time for save processing per frame: ", (tsave/ video_len)/60)
    print("Average time for frame loading processing per frame: ", (tframe /video_len)/60)

    return

#process_videos(paths_to_process, output_dir, csv_output_path, csvname, csvname_faces, segDetector, score_thres = args.score_thr, faceDetector, face_thres)
def process_videos(folder_to_process, paths_to_process, output_dir="./", csv_output_path = "./csv_output_path/", model = None, score_thres = 0.2, face_model = None, face_thres=0.2,resize_width = None, resize_height = None):

    for i in tqdm.tqdm(range(len(paths_to_process)), desc = "Videos Progress"):
        video_folder_addr = str(Path(paths_to_process[i])).split(folder_to_process)[1]
        output_dir_final = Path(output_dir).joinpath( video_folder_addr[:-4] )
        print("Input to single_video_process: ", paths_to_process[i], " to be saved at ", output_dir_final)
        csv_output_dir = Path(csv_output_path).joinpath( video_folder_addr[:-4] )
        single_video_process(paths_to_process[i], output_dir_final, csv_output_dir = csv_output_dir, 
                             model=model,score_thres=score_thres,
                             face_model=face_model, face_thres=face_thres,resize_width = resize_width,resize_height = resize_height)

    return 

def main():

    args = parse_args()

    ## Setup body detector
    segDetector = MMSegDetector( det_config=args.config, det_checkpoint=args.checkpoint, device=args.device)
    img_locs = ""

    ## Setup face detector and variables
    face_thres = args.face_thr
    faceDetector = None #load_face_model('10g', device = args.device)

    ## Make the CSV for saving the data and the folders for results

    ## Make a folder for output
    output_dir = Path(args.output_folder)
    output_dir.mkdir(exist_ok=True, parents=True)
    csv_output_path = output_dir.joinpath("csv_results")

    ## Run through image folder
    if args.image_folder != "":                
            return

    if args.video_folder != "":

        paths_to_process = get_files(args.video_folder)  ## use the CLI argument instead of a hard-coded folder
        print("We are processing: ", len(paths_to_process) , " videos")
        folder_to_process = args.video_folder
        process_videos(folder_to_process, paths_to_process, output_dir, csv_output_path, segDetector, args.score_thr, faceDetector, face_thres)

        return

    return

if __name__ == "__main__":
    main()

Reproduces the problem - command or script

python processVideoFolderSequential.py --video_folder 'test_videos'

Reproduces the problem - error message

The error is that the average time to process keeps increasing. For instance, here is a log output showing the processing for a video list:


We are processing:  39  videos
Videos Progress:   0%|          | 0/39 [00:00<?, ?it/s]
frame shape:  (1536, 2048, 3)  with video length  2970
Total time for video processing of this video:  13.740543262163799
Average time for model processing per frame:  0.004153024821169047
Average time for save processing per frame:  0.0003246144547339374
Average time for frame loading processing per frame:  0.00014598919352564613

Videos Progress:   3%|▎         | 1/39 [13:45<8:42:54, 825.65s/it]
frame shape:  (1536, 2048, 3)  with video length  2940
Total time for video processing of this video:  13.342071727911632
Average time for model processing per frame:  0.004083394715034503
Average time for save processing per frame:  0.0003131555348567141
Average time for frame loading processing per frame:  0.0001385582263777856

Videos Progress:   5%|▌         | 2/39 [27:07<8:20:34, 811.74s/it]
frame shape:  (1080, 1920, 3)  with video length  2970
Total time for video processing of this video:  14.188103183110554
Average time for model processing per frame:  0.004311154850671855
Average time for save processing per frame:  0.0003628664345853658
Average time for frame loading processing per frame:  0.00010017747161899485

Videos Progress:   8%|▊         | 3/39 [41:19<8:18:06, 830.19s/it]
frame shape:  (1080, 1920, 3)  with video length  2940
Total time for video processing of this video:  15.374861669540405
Average time for model processing per frame:  0.004700920431792331
Average time for save processing per frame:  0.00043057866918256763
Average time for frame loading processing per frame:  9.523972091761307e-05

Videos Progress:  10%|█         | 4/39 [56:42<8:25:40, 866.87s/it]
frame shape:  (1080, 1920, 3)  with video length  2970
Total time for video processing of this video:  29.291638139883677
Average time for model processing per frame:  0.009464694850238754
Average time for save processing per frame:  0.00029415066796119766
Average time for frame loading processing per frame:  0.00010110642372157035

Videos Progress:  13%|█▎        | 5/39 [1:26:00<11:13:20, 1188.25s/it]
frame shape:  (1536, 2048, 3)  with video length  2970
Total time for video processing of this video:  93.31125243902207
Average time for model processing per frame:  0.030968653992504112
Average time for save processing per frame:  0.0002959531645983558
Average time for frame loading processing per frame:  0.00015056785108264446

Videos Progress:  15%|█▌        | 6/39 [2:59:20<24:38:28, 2688.15s/it]
frame shape:  (1080, 1920, 3)  with video length  2970
Total time for video processing of this video:  40.9526007493337
Average time for model processing per frame:  0.013326975695880842
Average time for save processing per frame:  0.0003515281915397088
Average time for frame loading processing per frame:  0.00010719429496696634

Videos Progress:  18%|█▊        | 7/39 [3:40:18<23:13:34, 2612.94s/it]
Processing video: G00003_set2_struct_1622143510858_4087db76
frame shape:  (1080, 1920, 3)  with video length  2940
Total time for video processing of this video:  40.012428319454195
Average time for model processing per frame:  0.013105404120183585
Average time for save processing per frame:  0.00036640154261167356
Average time for frame loading processing per frame:  0.0001344353366060322

Videos Progress:  21%|██        | 8/39 [4:20:20<21:55:20, 2545.82s/it]
frame shape:  (1080, 1920, 3)  with video length  2970
Total time for video processing of this video:  44.05377306540807
Average time for model processing per frame:  0.014114095264278544
Average time for save processing per frame:  0.0006310871594682687
Average time for frame loading processing per frame:  8.499694726817668e-05

The total processing time increases dramatically even though the number of frames and the frame shape are not really different. In addition, I'm resizing to a height of 1080, with a width between 1920 and 1980.
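A note on reading the log: the script prints `(total / video_len) / 60`, so each "average per frame" value is in minutes, not seconds. The short sketch below converts two of the logged model times to seconds per frame and compares them. The dictionary keys and the helper name are illustrative only.

```python
# Values copied from the log above (1st and 6th video blocks).
# The script divides by 60, so these are minutes per frame.
logged_model_minutes = {
    "1st video": 0.004153024821169047,
    "6th video": 0.030968653992504112,
}


def minutes_to_seconds(minutes):
    """Convert a duration in minutes to seconds."""
    return minutes * 60.0


model_seconds = {
    name: minutes_to_seconds(value)
    for name, value in logged_model_minutes.items()
}

# Ratio of the slow video's per-frame model time to the fast one's.
slowdown = model_seconds["6th video"] / model_seconds["1st video"]

for name, secs in model_seconds.items():
    print(f"{name}: {secs:.3f} s per frame")
print(f"slowdown: {slowdown:.1f}x")
```

In other words, the model time per frame goes from roughly a quarter of a second to almost two seconds between these two videos, although both have 2970 frames.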

Additional information

Expected Result

Processing time to be the same for each video, on average. The videos have one or two human objects in the scene and that is what I'm predicting on.

nikky4D commented 1 year ago

One question: Is there a way to improve the processing time for a video? Currently a long video (about 3K frames at roughly 1980x1080) takes about 8-10 minutes without this memory issue. Could you give me suggestions on how to improve the code to go faster?
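One option that the posted code already hints at is batching: `all_in_one` passes a list of frames to `inference_detector` in a single call. The sketch below shows only the chunking logic around such a call. `chunked` and `detect_in_batches` are hypothetical helper names, and whether batching actually speeds up HTC on this hardware is an assumption that would need to be measured.

```python
def chunked(frames, batch_size):
    """Yield consecutive slices of at most batch_size frames."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(frames), batch_size):
        yield frames[start:start + batch_size]


def detect_in_batches(frames, detect_batch, batch_size=8):
    """Run detect_batch on each chunk and return results in frame order.

    detect_batch is any callable that takes a list of frames and returns
    one result per frame, in the same order.
    """
    results = []
    for batch in chunked(frames, batch_size):
        results.extend(detect_batch(batch))
    return results
```

With the class from this issue, a call could look like `detect_in_batches(frames, segDetector.all_in_one, batch_size=8)`, keeping `batch_size` at or below the `self.batch_size` limit of 32 that `all_in_one` asserts.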

RangiLyu commented 1 year ago

One of the best ways for video inference is using NVIDIA DeepStream. You can refer to https://developer.nvidia.com/deepstream-sdk

nikky4D commented 1 year ago

Thanks I'll check it out. Would you have any insight into why the model processing takes longer on some videos, even with same resolution, than on others?

ZwwWayne commented 1 year ago

Because the videos have different lengths?

RangiLyu commented 1 year ago

Because CUDA computation is asynchronous, the correct way to measure the inference time is like this: https://github.com/open-mmlab/mmdetection/blob/e71b499608e9c3ccd4211e7c815fa20eeedf18a2/tools/analysis_tools/benchmark.py#L107-L114

You must run torch.cuda.synchronize() before counting time.
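A minimal sketch of that timing pattern, in the spirit of the linked benchmark script but not copied from it: synchronize, start the clock, run the call, synchronize again, then read the clock. `cuda_sync` and `timed_call` are hypothetical helper names, and the `try`/`except` around the import is only there so the sketch also runs on a machine without PyTorch.

```python
import time

try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None


def cuda_sync():
    """Block until all queued CUDA work has finished; no-op without a GPU."""
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()


def timed_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for fn(*args, **kwargs)."""
    cuda_sync()  # drain work that was queued before the timer starts
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    cuda_sync()  # wait for the kernels launched by fn
    return result, time.perf_counter() - start
```

In the loop from the issue this could be used as `(det, mask), dt = timed_call(single_img_process, frame, model, thresh=score_thres)` followed by `tmodel += dt`.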

nikky4D commented 1 year ago

@ZwwWayne Thanks. I checked this earlier. The videos have different lengths, but not to the point where the processing time jumps to almost 40 minutes. In addition, I have one human object in the scene, so I don't expect a great many more detections from video to video.

@RangiLyu Thank you, I'll add that to the time counter. Is there a way to use multiprocessing, maybe across GPUs, to process this faster, or even multiprocessing on one GPU? My GPU has about 48GB, so it could hold multiple copies of the model if needed for processing. Is there a sample you could point me to?
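One generic way to approach this, sketched here with the Python standard library only and not taken from mmdetection, is to split the video list across GPUs and run one process per GPU. All names below (`shard_round_robin`, `gpu_worker`, `run_on_gpus`, `process_shard`) are hypothetical. `process_shard` stands for a module-level function that builds its own detector once and then loops over its videos, for example by calling `single_video_process` for each path.

```python
import multiprocessing as mp
import os


def shard_round_robin(items, num_shards):
    """Split items into num_shards lists, assigning them round-robin."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return [list(items[i::num_shards]) for i in range(num_shards)]


def gpu_worker(gpu_id, video_paths, process_shard):
    """Pin this process to one GPU, then handle its share of the videos."""
    # Must be set before CUDA is initialised in this process. Afterwards,
    # 'cuda:0' inside the process refers to the physical GPU gpu_id.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    process_shard(video_paths)


def run_on_gpus(video_paths, gpu_ids, process_shard):
    """Start one spawned process per GPU and wait for all of them."""
    # "spawn" avoids inheriting an initialised CUDA context from the parent.
    ctx = mp.get_context("spawn")
    shards = shard_round_robin(list(video_paths), len(gpu_ids))
    procs = [
        ctx.Process(target=gpu_worker, args=(gpu, shard, process_shard))
        for gpu, shard in zip(gpu_ids, shards)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

On the four-GPU machine from the environment dump, the call would be `run_on_gpus(paths_to_process, [0, 1, 2, 3], process_shard)`, made from inside the `if __name__ == "__main__":` block because the spawn start method re-imports the main module. Listing the same GPU id more than once would put several model copies on one card, though whether that improves throughput is untested here.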