nv-nguyen / gigapose

[CVPR 2024] PyTorch implementation of GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
https://nv-nguyen.github.io/gigaPose/
MIT License

Getting 'Killed' after trying to run test.py #23

Closed (mmkolbe closed this issue 1 month ago)

mmkolbe commented 1 month ago

I ran python test.py test_dataset_name=hope run_id=202410081022 test_setting=detection. The process starts and prints the output below, ending with Killed:

(gigapose) marcel@marcel-Aspire-GX-783:~/gigapose$ python test.py test_dataset_name=hope run_id=202410081022 test_setting=detection
[2024-10-08 10:37:54,136][__main__][INFO] - Initializing logger, callbacks and trainer
[2024-10-08 10:37:54,140][__main__][INFO] - Tensorboard logger initialized at ./gigaPose_datasets/results/large_202410081022/gigapose
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[2024-10-08 10:37:54,240][__main__][INFO] - Trainer initialized!
Using cache found in /home/marcel/.cache/torch/hub/facebookresearch_dinov2_main
[2024-10-08 10:37:56,227][dinov2][INFO] - using MLP layer as FFN
[2024-10-08 10:38:03,470][src.models.network.ae_net][INFO] - Initialize AENet done!
[2024-10-08 10:38:03,634][src.models.network.ist_net][INFO] - Init for Regressor with done!
[2024-10-08 10:38:03,717][src.models.network.ist_net][INFO] - Init weights for ISTNet done!
[2024-10-08 10:38:03,717][src.models.network.ist_net][INFO] - Init for ISTNet done!
[2024-10-08 10:38:03,729][src.models.gigaPose][INFO] - Initialize GigaPose done!
[2024-10-08 10:38:03,729][__main__][INFO] - Model initialized!
[2024-10-08 10:38:03,857][src.dataloader.test][INFO] - Split: test for hope!
[2024-10-08 10:38:03,861][src.custom_megapose.web_scene_dataset][INFO] - WebSceneDataset: 4 shards
[2024-10-08 10:38:03,861][src.custom_megapose.web_scene_dataset][INFO] - IterableWebSceneDataset: 457 samples
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 12147.34it/s]
[2024-10-08 10:38:03,870][src.custom_megapose.template_dataset][INFO] - Loaded 28 template datas
dataset_name: (hope)
det_model:cnos-sam
det_model:cnos-sam
[2024-10-08 10:38:05,492][src.dataloader.keypoints][INFO] - Initialized normalized center patch done!
cfg.machine.num_workers=10
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 14416.95it/s]
[2024-10-08 10:38:05,510][src.custom_megapose.template_dataset][INFO] - Loaded 28 template datas
[2024-10-08 10:38:05,511][__main__][INFO] - Dataloaders initialized!
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

Restoring states from the checkpoint path at ./gigaPose_datasets/pretrained/gigaPose_v1.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loaded model weights from the checkpoint at ./gigaPose_datasets/pretrained/gigaPose_v1.ckpt
Testing: |                                                                                                                                                                           | 0/? [00:00<?, ?it/s]

Killed

mmkolbe commented 1 month ago

It's related to insufficient memory, as I only have a simple RTX card. I changed the parameters below and got it running. I ran into another, unrelated issue later during the test, but this first problem was solved. The parameters are in /home/marcel/gigapose/configs/machine/local.yaml

From:

# specific attributes to this machine
batch_size: 12
num_workers: 10

To:

# specific attributes to this machine
batch_size: 2
num_workers: 0
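# Assuming this value is passed to PyTorch's DataLoader, 0 means data is loaded in the main process (no worker subprocesses), not "automatic".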
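
A bare Killed line with no Python traceback usually means the process was stopped from outside, and on Linux that is often the kernel's out-of-memory (OOM) killer reacting to exhausted host RAM. The thread does not show the kernel log, so that cause is an assumption here. The sketch below is one way to check it: it scans kernel log text for OOM-kill entries. The function name find_oom_kills and the sample log line are hypothetical and not taken from this issue, and the exact kernel message wording varies between kernel versions, so the pattern only relies on the "Killed process <pid> (<name>)" part.

```python
import re
from typing import List, Tuple

# Kernel OOM-kill entries typically contain "Killed process <pid> (<name>)".
# The surrounding wording differs between kernel versions, so only this
# fragment is matched (an assumption about the log format).
_OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, process_name) pairs for OOM kills found in kernel log text."""
    return [(int(pid), name) for pid, name in _OOM_PATTERN.findall(kernel_log)]


# Illustrative sample only; this line is NOT from the issue above.
sample_log = (
    "[ 1234.567] Out of memory: Killed process 4321 (python) "
    "total-vm:1000kB, anon-rss:500kB\n"
    "[ 1235.000] usb 1-1: new high-speed USB device\n"
)

for pid, name in find_oom_kills(sample_log):
    print(f"OOM killer terminated pid={pid} ({name})")
```

To use it on a real machine, pass in the text printed by dmesg or journalctl -k (reading the kernel log may require root privileges). If the test.py process shows up in the result, lowering batch_size and num_workers as done above is a reasonable first step, since each data-loading worker is a separate process that uses additional host memory.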