ArghyaChatterjee opened this issue 2 years ago
Hi, thanks for your interest in our work.
Which of our pretrained models are you using? We have models that take in small, medium, or large crop sizes (see https://github.com/mseg-dataset/mseg-semantic#how-fast-can-your-models-run).
Are you using single-scale or multi-scale inference?
The numbers reported in the readme here are for single-scale prediction with our lightweight model. You would need to use the following script instead:
https://github.com/mseg-dataset/mseg-semantic/blob/master/mseg_semantic/tool/universal_demo_batched.py
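For reference, single- versus multi-scale inference is controlled by the scales key in the test config. A sketch (scales: [1.0] matches the config dump below; the multi-scale values shown are typical semantic-segmentation settings, not necessarily the exact values in the repo's ms configs):
scales: [1.0]  # single-scale (ss configs)
scales: [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]  # multi-scale (ms configs); predictions averaged over scales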
Hi @johnwlambert, thanks for the reply. I will discuss the inference time further in this thread, but I need help running this universal_demo_batched.py file. Here is what I am running and here is what I am getting:
python3 -u ~/mseg-semantic/mseg_semantic/tool/universal_demo_batched.py --config=/home/arghya/mseg-semantic/mseg_semantic/config/test/720/default_config_batched_ss.yaml model_name mseg-3m-720p model_path ~/Downloads/mseg-3m-720p.pth input_file ~/Downloads/Urban
Namespace(config='/home/arghya/mseg-semantic/mseg_semantic/config/test/720/default_config_batched_ss.yaml', file_save='default', opts=['model_name', 'mseg-3m-720p', 'model_path', '/home/arghya/Downloads/mseg-3m-720p.pth', 'input_file', '/home/arghya/Downloads/Urban'])
arch: hrnet
base_size: None
batch_size_val: 1
dataset: Urban
has_prediction: False
ignore_label: 255
img_name_unique: False
index_start: 0
index_step: 0
input_file: /home/arghya/Downloads/Urban
layers: 50
model_name: mseg-3m-720p
model_path: /home/arghya/Downloads/mseg-3m-720p.pth
native_img_h: -1
native_img_w: -1
network_name: None
save_folder: default
scales: [1.0]
small: True
split: val
test_gpu: [0]
test_h: 593
test_w: 593
vis_freq: 20
workers: 16
zoom_factor: 8
[2021-07-29 19:00:01,013 INFO universal_demo_batched.py line 68 3756] arch: hrnet
base_size: None
batch_size_val: 1
dataset: Urban
has_prediction: False
ignore_label: 255
img_name_unique: True
index_start: 0
index_step: 0
input_file: /home/arghya/Downloads/Urban
layers: 50
model_name: mseg-3m-720p
model_path: /home/arghya/Downloads/mseg-3m-720p.pth
native_img_h: -1
native_img_w: -1
network_name: None
print_freq: 10
save_folder: default
scales: [1.0]
small: True
split: test
test_gpu: [0]
test_h: 593
test_w: 593
u_classes: ['backpack', 'umbrella', 'bag', 'tie', 'suitcase', 'case', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'animal_other', 'microwave', 'radiator', 'oven', 'toaster', 'storage_tank', 'conveyor_belt', 'sink', 'refrigerator', 'washer_dryer', 'fan', 'dishwasher', 'toilet', 'bathtub', 'shower', 'tunnel', 'bridge', 'pier_wharf', 'tent', 'building', 'ceiling', 'laptop', 'keyboard', 'mouse', 'remote', 'cell phone', 'television', 'floor', 'stage', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot_dog', 'pizza', 'donut', 'cake', 'fruit_other', 'food_other', 'chair_other', 'armchair', 'swivel_chair', 'stool', 'seat', 'couch', 'trash_can', 'potted_plant', 'nightstand', 'bed', 'table', 'pool_table', 'barrel', 'desk', 'ottoman', 'wardrobe', 'crib', 'basket', 'chest_of_drawers', 'bookshelf', 'counter_other', 'bathroom_counter', 'kitchen_island', 'door', 'light_other', 'lamp', 'sconce', 'chandelier', 'mirror', 'whiteboard', 'shelf', 'stairs', 'escalator', 'cabinet', 'fireplace', 'stove', 'arcade_machine', 'gravel', 'platform', 'playingfield', 'railroad', 'road', 'snow', 'sidewalk_pavement', 'runway', 'terrain', 'book', 'box', 'clock', 'vase', 'scissors', 'plaything_other', 'teddy_bear', 'hair_dryer', 'toothbrush', 'painting', 'poster', 'bulletin_board', 'bottle', 'cup', 'wine_glass', 'knife', 'fork', 'spoon', 'bowl', 'tray', 'range_hood', 'plate', 'person', 'rider_other', 'bicyclist', 'motorcyclist', 'paper', 'streetlight', 'road_barrier', 'mailbox', 'cctv_camera', 'junction_box', 'traffic_sign', 'traffic_light', 'fire_hydrant', 'parking_meter', 'bench', 'bike_rack', 'billboard', 'sky', 'pole', 'fence', 'railing_banister', 'guard_rail', 'mountain_hill', 'rock', 'frisbee', 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', 'net', 'base', 'sculpture', 'column', 'fountain', 'awning', 'apparel', 'banner', 'flag', 'blanket', 'curtain_other', 'shower_curtain', 'pillow', 'towel', 'rug_floormat', 'vegetation', 'bicycle', 'car', 'autorickshaw', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'trailer', 'boat_ship', 'slow_wheeled_object', 'river_lake', 'sea', 'water_other', 'swimming_pool', 'waterfall', 'wall', 'window', 'window_blind']
vis_freq: 20
workers: 16
zoom_factor: 8
[2021-07-29 19:00:01,013 INFO universal_demo_batched.py line 69 3756] => creating model ...
[2021-07-29 19:00:04,156 INFO inference_task.py line 307 3756] => loading checkpoint '/home/arghya/Downloads/mseg-3m-720p.pth'
[2021-07-29 19:00:04,691 INFO inference_task.py line 313 3756] => loaded checkpoint '/home/arghya/Downloads/mseg-3m-720p.pth'
[2021-07-29 19:00:04,696 INFO batched_inference_task.py line 65 3756] >>>>>>>>>>>>>> Start inference task >>>>>>>>>>>>>
Totally 4 samples in test set.
Checking image&label pair test list done!
[2021-07-29 19:00:05,035 INFO batched_inference_task.py line 98 3756] On batch 0
Traceback (most recent call last):
File "/home/arghya/mseg-semantic/mseg_semantic/tool/universal_demo_batched.py", line 132, in <module>
run_universal_demo_batched(args, use_gpu)
File "/home/arghya/mseg-semantic/mseg_semantic/tool/universal_demo_batched.py", line 87, in run_universal_demo_batched
itask.execute()
File "/home/arghya/mseg-semantic/mseg_semantic/tool/batched_inference_task.py", line 73, in execute
self.execute_on_dataloader_batched(test_loader)
File "/home/arghya/mseg-semantic/mseg_semantic/tool/batched_inference_task.py", line 101, in execute_on_dataloader_batched
gray_batch = self.execute_on_batch(input)
File "/home/arghya/mseg-semantic/mseg_semantic/tool/batched_inference_task.py", line 128, in execute_on_batch
logits = self.scale_process_cuda_batched(batch, self.args.native_img_h, self.args.native_img_w)
File "/home/arghya/mseg-semantic/mseg_semantic/tool/batched_inference_task.py", line 163, in scale_process_cuda_batched
self.std
File "/home/arghya/mseg-semantic/mseg_semantic/tool/batched_inference_task.py", line 54, in pad_to_crop_sz_batched
padded_batch[:, :, start_h:end_h, start_w:end_w] = batch
RuntimeError: The expanded size of the tensor (593) must match the existing size (1054) at non-singleton dimension 3. Target sizes: [1, 3, 593, 593]. Tensor sizes: [3, 593, 1054]
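For reference, the failing line in pad_to_crop_sz_batched copies the resized image into a pre-allocated crop-sized buffer. A minimal sketch of the mismatch, with shapes taken from the traceback above (simplified to 4-D tensors; the real code also computes centering offsets):
import torch

# Pre-allocated buffer of crop size (test_h x test_w = 593 x 593).
padded_batch = torch.zeros(1, 3, 593, 593)
# Resized input frame: height scaled to 593, width to 1054.
batch = torch.zeros(1, 3, 593, 1054)
# Copying a 1054-wide image into a 593-wide buffer raises:
# RuntimeError: The expanded size of the tensor (593) must match
# the existing size (1054) at non-singleton dimension 3.
padded_batch[:, :, 0:593, 0:593] = batch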
Looks like there is a tensor size mismatch. There are two questions:
1. What is causing this size mismatch?
2. The universal_demo_batched.py file does not take video input. Is there a way to give video input to it? Initially I was feeding a 1280x720 @ 30 fps video (.mp4 format), but since it does not support video, I extracted the 1280x720 frames of the video, put them in a directory, and gave the image directory path as the input path.
I'm late on this, but I figured out the issue after debugging it myself. In the default .yaml (in my case config/test/480/default_config_batched_ss.yaml) two values need to be set:
native_img_h
native_img_w
By default they are -1, and apparently supplying them via the command line wasn't enough. Anyway, in the unlikely case this reply helps someone: make sure native_img_h and native_img_w are set properly so you don't get squished portrait-mode outputs like I did. Whoops.
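For example, with the 1280x720 frames above, the fix amounts to two lines in the batched config (a sketch; every other key in default_config_batched_ss.yaml stays as shipped):
native_img_h: 720
native_img_w: 1280
With the defaults of -1, the resized 593x1054 batch no longer fits the 593x593 crop buffer, which appears to be exactly the RuntimeError in the traceback above.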
Nice work, @ArghyaChatterjee. I encountered the same problem as you. I ran the MSeg network on an NVIDIA 3060, and the average time to process a frame was about 0.5 s (single-scale). Following the author's reply, I used universal_demo_batched.py instead of universal_demo.py and set the native height and width of the picture. It runs very fast!
Hello, I am currently working on an image-enhancement paper and algorithm and trying to implement it. In the process, we need to use MSeg segmentation for real and rendered images/datasets. I have about 50-60k images, but for now I want to test on the first 2,500. The images are from Playing for Data, so 1914x1052 RGB images (https://download.visinf.tu-darmstadt.de/data/from_games/).
The dependencies mseg-api and mseg-semantic were already installed. I tried the Google Colab first and then copied the commands so I could run the script on my Linux machine as well. The command is: python -u mseg_semantic/tool/universal_demo.py --config="default_config_360.yaml" model_name mseg-3m model_path mseg-3m.pth input_file /home/luda1013/PfD/image/try_images
The weights I used were downloaded from the Google Colab, so mseg-3m-1080.pth.
But for me, it took 12 minutes for just one image... Do you know why, or where I should look?
My setup: Ubuntu 20.04.6 LTS, Core i9-10980XE @ 3 GHz, GPU (nvidia-smi): NVIDIA RTX A5000 (48 GB), CUDA 11.7, PyTorch version: 2.0.0, CUDA at /usr/local/cuda*
Thanks for your help... I cannot move forward if one image takes 12 minutes :( I already tried the free tier on Colab and only got through about 140 images before my compute units hit 0 and I could no longer use CUDA/GPU :(
Hi @luda1013, I think you should use the batched script; you can find it at mseg-semantic/mseg_semantic/tool/universal_demo_batched.py. It seems the algorithm uses OpenCV instead of CUDA when executing universal_demo.py.
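For example, reusing the model and input path from the command above (a sketch; the config path follows the repo layout mentioned earlier in this thread, and native_img_h/native_img_w inside it should be set to match the 1914x1052 inputs):
python -u mseg_semantic/tool/universal_demo_batched.py --config=mseg_semantic/config/test/480/default_config_batched_ss.yaml model_name mseg-3m model_path mseg-3m.pth input_file /home/luda1013/PfD/image/try_images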
So I got the yaml file from the location given in the file description and used the same configuration: same model and config.
I have encountered this problem:
So it needs to resize the width/height? Each image is 1914x1052 pixels (width x height), so I edited the yaml file and changed:
native_img_h: 1052
native_img_w: 1914
according to the size of the images I have. I was testing with just 3 images, but what I got is only the grayscale image?
It's taking 35-40 s to process the segmentation of a single frame. My test setup configuration:
i. Ubuntu 18.04 LTS, Core i7, 24 GB RAM
ii. Graphics: NVIDIA 1070M (laptop version of the 1070 Ti)
iii. CUDA 10.2
iv. PyTorch version: 1.6.0+cu101
v. CUDA_HOME = /usr/local/cuda-10.2
This is the output from my terminal:
I think I am missing something. According to this repo, the detection speed should be around 16 fps on a Quadro P5000, and I think for an NVIDIA GTX 1070 it should be something similar, not (1/40) fps.
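For scale: 16 fps is about 62 ms per frame, while 35-40 s per frame works out to roughly 0.025-0.03 fps, i.e. some 550-650x slower than expected, which is far more than any plausible hardware gap between a Quadro P5000 and a GTX 1070.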
Can anybody help?