kubeedge / ianvs

Distributed Synergy AI Benchmarking
https://ianvs.readthedocs.io
Apache License 2.0

question about semantic-segmentation environment #102

Closed IcyFeather233 closed 5 months ago

IcyFeather233 commented 5 months ago

I am following the semantic-segmentation README. When I run ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml, it shows:

Traceback (most recent call last):
  File "/home/icyfeather/miniconda3/envs/ianvs/bin/ianvs", line 33, in <module>
    sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
  File "/home/icyfeather/project/ianvs/core/cmd/benchmarking.py", line 41, in main
    raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: testcase(id=6632e63a-19f0-11ef-8dca-8576dbea9f3c) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: module(type=basemodel loads class(name=BaseModel) failed, error: load module(url=./examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/basemodel-simple.py) failed, error: libcudart.so.11.0: cannot open shared object file: No such file or directory..

After searching the Internet, I believe this is probably a version conflict. I would appreciate a detailed list of version requirements (CUDA version, torch version, etc.) to help me solve it.

And here is my env info:

(ianvs) icyfeather@gpu3:~/project/ianvs$ pip list
Package                  Version     Editable project location
------------------------ ----------- -----------------------------------------
absl-py                  2.1.0
addict                   2.4.0
asgiref                  3.8.1
astor                    0.8.1
cachetools               4.2.4
certifi                  2024.2.2
charset-normalizer       3.3.2
click                    8.1.7
colorlog                 4.7.2
contourpy                1.2.1
cycler                   0.12.1
fastapi                  0.68.2
filelock                 3.14.0
fonttools                4.51.0
fsspec                   2024.5.0
gast                     0.5.4
google-auth              1.35.0
google-auth-oauthlib     0.4.6
google-pasta             0.2.0
grpcio                   1.64.0
h11                      0.14.0
h5py                     3.11.0
ianvs                    0.1.0
idna                     3.7
importlib_metadata       7.1.0
importlib_resources      6.4.0
install                  1.3.5
Jinja2                   3.1.4
joblib                   1.2.0
Keras-Applications       1.0.8
Keras-Preprocessing      1.1.2
kiwisolver               1.4.5
Markdown                 3.6
markdown-it-py           3.0.0
MarkupSafe               2.1.5
matplotlib               3.9.0
mdurl                    0.1.2
minio                    7.0.4
mmcv                     2.0.0
mmdet                    3.1.0       /home/icyfeather/project/mmdetection
mmengine                 0.10.4
mpmath                   1.3.0
mypath                   0.1
networkx                 3.2.1
numpy                    1.23.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
oauthlib                 3.2.2
opencv-python            4.9.0.80
packaging                24.0
pandas                   2.2.2
pillow                   10.3.0
pip                      24.0
platformdirs             4.2.2
prettytable              2.5.0
protobuf                 3.20.3
pyasn1                   0.6.0
pyasn1_modules           0.4.0
pycocotools              2.0.7
pydantic                 1.10.15
Pygments                 2.18.0
pyparsing                3.1.2
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.1
requests                 2.32.2
requests-oauthlib        2.0.0
rich                     13.7.1
rsa                      4.9
scikit-learn             1.5.0
scipy                    1.13.1
sedna                    0.4.1
segment-anything         1.0         /home/icyfeather/project/segment-anything
setuptools               54.2.0
shapely                  2.0.4
six                      1.15.0
starlette                0.14.2
sympy                    1.12
tenacity                 8.0.1
tensorboard              2.3.0
tensorboard-plugin-wit   1.8.1
tensorflow-estimator     1.14.0
termcolor                2.4.0
terminaltables           3.1.10
threadpoolctl            3.5.0
tomli                    2.0.1
torch                    2.3.0
torchaudio               2.3.0
torchvision              0.18.0
tqdm                     4.66.4
triton                   2.3.0
typing_extensions        4.11.0
tzdata                   2024.1
urllib3                  2.2.1
uvicorn                  0.14.0
wcwidth                  0.2.13
websockets               9.1
Werkzeug                 3.0.3
wheel                    0.43.0
wrapt                    1.16.0
yapf                     0.40.2
zipp                     3.18.2
(ianvs) icyfeather@gpu3:~/project/ianvs$ nvidia-smi 
Sat May 25 01:11:54 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:02:00.0 Off |                  N/A |
|  0%   42C    P8              1W /  260W |      16MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1082      G   /usr/lib/xorg/Xorg                              9MiB |
|    0   N/A  N/A      1397      G   /usr/bin/gnome-shell                            3MiB |
+-----------------------------------------------------------------------------------------+

Should I downgrade my cuda version?
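
One way to read the two outputs above together: the traceback asks for libcudart.so.11.0 (a CUDA 11 runtime), while the pip list contains only nvidia-*-cu12 wheels. The "CUDA Version: 12.5" field in nvidia-smi is the highest CUDA version the driver supports, not the runtime installed in the conda environment. The sketch below only compares the two major versions by parsing strings copied from this post; the helper names are made up for illustration, and which compiled extension actually wants CUDA 11 is not confirmed by the traceback.

```python
import re


def cuda_major_from_soname(text):
    """Extract the CUDA major version from text containing e.g. 'libcudart.so.11.0'."""
    match = re.search(r"libcudart\.so\.(\d+)", text)
    return int(match.group(1)) if match else None


def cuda_majors_from_packages(package_names):
    """Collect CUDA majors from pip names such as 'nvidia-cuda-runtime-cu12'."""
    majors = set()
    for name in package_names:
        match = re.fullmatch(r"nvidia-.+-cu(\d+)", name)
        if match:
            majors.add(int(match.group(1)))
    return majors


# Strings taken from the error message and the pip list in this post.
wanted = cuda_major_from_soname("libcudart.so.11.0: cannot open shared object file")
shipped = cuda_majors_from_packages(
    ["nvidia-cublas-cu12", "nvidia-cuda-runtime-cu12", "torch", "mmcv"]
)
print(f"extension wants CUDA {wanted}, environment ships CUDA {sorted(shipped)}")
```

If the two numbers differ, as they do here, reinstalling torch and the compiled packages for the same CUDA major is usually a more direct fix than changing the driver.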

hsj576 commented 5 months ago

It looks like a cuda library version incompatibility issue.

IcyFeather233 commented 5 months ago

1. The mmcv version should satisfy mmcv>=2.0.0rc4, <2.1.0; otherwise it shows an error like this:

File "/home/icyfeather/project/ianvs/./examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/basemodel-simple.py", line 14, in <module>
    from RFNet.eval import Validator, load_my_state_dict
  File "/home/icyfeather/project/ianvs/./examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/RFNet/eval.py", line 13, in <module>
    from mmdet.visualization.image import imshow_det_bboxes
  File "/home/icyfeather/project/mmdetection/mmdet/__init__.py", line 16, in <module>
    assert (mmcv_version >= digit_version(mmcv_minimum_version)
AssertionError: MMCV==2.2.0 is used but incompatible. Please install mmcv>=2.0.0rc4, <2.1.0.

2. mmcv depends heavily on the installed PyTorch and CUDA versions. Install mmcv by following: https://mmcv.readthedocs.io/zh-cn/latest/get_started/installation.html#install-with-pip

For example, with CUDA 11.8 and torch 2.1.x, no available mmcv build satisfies the requirement mmcv<2.1.0.

In the doc, the installation step is python -m pip install https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/mmcv-2.0.0-cp39-cp39-manylinux1_x86_64.whl, which is too simplistic and may mislead anyone who is not using CUDA 11.8, torch 2.0.0, and Python 3.9.

Conclusion: find a CUDA/torch pair in https://mmcv.readthedocs.io/zh-cn/latest/get_started/installation.html#install-with-pip that supports installing mmcv>=2.0.0rc4, <2.1.0, and change your current torch or CUDA version accordingly.
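
The range in the assertion message can be checked before importing mmdet. The following is a minimal sketch, assuming version strings of the form X.Y.Z or X.Y.ZrcN only; the function names are hypothetical and this is not mmdet's own check, just a simplified comparison with the same bounds.

```python
import re


def version_key(version):
    """Turn 'X.Y.Z' or 'X.Y.ZrcN' into a tuple that sorts in release order."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = match.group(4)
    # A release candidate sorts before the final release of the same version.
    if rc is not None:
        return (major, minor, patch, 0, int(rc))
    return (major, minor, patch, 1, 0)


def mmcv_in_range(version, minimum="2.0.0rc4", maximum="2.1.0"):
    """True if minimum <= version < maximum (bounds from the assertion message)."""
    return version_key(minimum) <= version_key(version) < version_key(maximum)


for candidate in ("2.0.0rc3", "2.0.0", "2.0.1", "2.1.0", "2.2.0"):
    print(candidate, mmcv_in_range(candidate))
```

With these bounds, 2.0.0 and 2.0.1 pass while 2.2.0 (the version in the traceback) does not.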

hsj576 commented 5 months ago

Good job! (P.S. mmcv is installed only for visualizing semantic segmentation results. If you do not need visualization, you can comment out all the mmcv-related code.)
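
As an alternative to commenting the code out by hand, the visualization import could be made optional. This is a sketch only, not the repository's code: optional_attr and maybe_visualize are hypothetical helpers, and the module path is the one that appears in the traceback above. It catches Exception rather than just ImportError because, as the traceback shows, importing mmdet can fail with an AssertionError when the mmcv version is out of range.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module_name.attr_name, or None if the import fails for any reason."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except Exception as err:  # ImportError, AssertionError from mmdet, missing .so, ...
        print(f"visualization disabled: cannot load {module_name}.{attr_name} ({err})")
        return None


# Module path taken from the traceback; resolves to None when mmdet/mmcv are unusable.
imshow_det_bboxes = optional_attr("mmdet.visualization.image", "imshow_det_bboxes")


def maybe_visualize(*args, **kwargs):
    """Call the visualizer if it loaded; report whether anything was drawn."""
    if imshow_det_bboxes is None:
        return False
    imshow_det_bboxes(*args, **kwargs)
    return True
```

Evaluation code would then call maybe_visualize(...) and keep running when mmcv is missing or incompatible.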

IcyFeather233 commented 5 months ago

My env info (ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml runs successfully with no requirements problems):

Python 3.9
Cuda 11.8

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

pip install mmcv==2.0.1 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html
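
To compare another environment against this working combination, something like the sketch below could be used. KNOWN_GOOD simply restates the versions from the two pip commands above; the helper names are hypothetical. Local version suffixes such as +cu118 are stripped before comparing, since wheels from the PyTorch index report versions like 2.0.1+cu118.

```python
from importlib import metadata

# Versions reported as working in this comment (Python 3.9, CUDA 11.8).
KNOWN_GOOD = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "mmcv": "2.0.1",
}


def installed_versions(names):
    """Return {name: version or None} for packages in the current environment."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def diff_against_known_good(installed, expected=KNOWN_GOOD):
    """Return {name: (installed, expected)} for every package that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Drop a local suffix such as '+cu118' before comparing.
        base = have.split("+")[0] if have else None
        if base != want:
            mismatches[name] = (have, want)
    return mismatches


# Example with the versions from the first pip list in this issue.
print(diff_against_known_good(
    {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0", "mmcv": "2.0.0"}
))
```

For the live environment, diff_against_known_good(installed_versions(KNOWN_GOOD)) gives the same kind of report; an empty dict means all four packages match.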

However, I ran into another strange problem:

if I run

import torch

if torch.cuda.is_available():
    print("CUDA is available! You can use GPU acceleration.")
    # Get the number of available GPUs
    num_gpus = torch.cuda.device_count()
    print(f"Number of GPUs available: {num_gpus}")
else:
    print("CUDA is not available.")

in the project root directory ~/project/ianvs/, it shows:

(ianvs) icyfeather@gpu:~/project/ianvs$ python test_cuda.py 
CUDA is available! You can use GPU acceleration.
Number of GPUs available: 1

If I add this to ~/project/ianvs/examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/RFNet/eval.py:

class Validator(object):
    def __init__(self, args, data=None, unseen_detection=False):
        self.args = args
        self.time_train = []
        self.num_class = args.num_class

        # Define Dataloader
        kwargs = {'num_workers': args.workers, 'pin_memory': False}
        # _, self.val_loader, _, self.custom_loader, self.num_class = make_data_loader(args, **kwargs)
        _, _, self.test_loader, _ = make_data_loader(args, test_data=data, **kwargs)
        print('un_classes:'+str(self.num_class))

        # Define evaluator
        self.evaluator = Evaluator(self.num_class)

        if torch.cuda.is_available():
            print("CUDA is available! You can use GPU acceleration.")
            # Get the number of available GPUs
            num_gpus = torch.cuda.device_count()
            print(f"Number of GPUs available: {num_gpus}")
        else:
            print("CUDA is not available.")

and when I run ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml, it shows:

(ianvs) icyfeather@gpu:~/project/ianvs$ ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml
un_classes:30
CUDA is not available.
Upsample layer: in = 128, skip = 64, out = 128
Upsample layer: in = 128, skip = 128, out = 128
Upsample layer: in = 128, skip = 256, out = 128
128
Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/module/module.py", line 114, in get_module_instance
    func = ClassFactory.get_cls(
  File "/home/icyfeather/project/ianvs/./examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/basemodel-simple.py", line 36, in __init__
    self.validator = Validator(self.val_args)
  File "/home/icyfeather/project/ianvs/./examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/RFNet/eval.py", line 65, in __init__
    self.model = self.model.cuda(args.gpu_ids)
  File "/home/icyfeather/miniconda3/envs/ianvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/icyfeather/miniconda3/envs/ianvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/icyfeather/miniconda3/envs/ianvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/icyfeather/miniconda3/envs/ianvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/icyfeather/miniconda3/envs/ianvs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/icyfeather/miniconda3/envs/ianvs/lib/python3.9/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcase/testcase.py", line 72, in run
    paradigm = self.algorithm.paradigm(workspace=self.output_dir,
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/algorithm.py", line 105, in paradigm
    return LifelongLearning(workspace, **config)
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py", line 58, in __init__
    ParadigmBase.__init__(self, workspace, **kwargs)
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/paradigm/base.py", line 55, in __init__
    self.module_instances = self._get_module_instances()
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/paradigm/base.py", line 75, in _get_module_instances
    func = module.get_module_instance(module_type)
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/module/module.py", line 119, in get_module_instance
    raise RuntimeError(f"module(type={module_type} loads class(name={self.name}) "
RuntimeError: module(type=basemodel loads class(name=BaseModel) failed, error: No CUDA GPUs are available.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcasecontroller.py", line 54, in run_testcases
    res, time = (testcase.run(workspace), utils.get_local_time())
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcase/testcase.py", line 79, in run
    raise RuntimeError(
RuntimeError: (paradigm=lifelonglearning) pipeline runs failed, error: module(type=basemodel loads class(name=BaseModel) failed, error: No CUDA GPUs are available.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/cmd/benchmarking.py", line 37, in main
    job.run()
  File "/home/icyfeather/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 93, in run
    succeed_testcases, test_results = self.testcase_controller.run_testcases(self.workspace)
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcasecontroller.py", line 56, in run_testcases
    raise RuntimeError(f"testcase(id={testcase.id}) runs failed, error: {err}") from err
RuntimeError: testcase(id=7438c7a6-20f1-11ef-a88f-e7cf327eae9a) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: module(type=basemodel loads class(name=BaseModel) failed, error: No CUDA GPUs are available.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/miniconda3/envs/ianvs/bin/ianvs", line 33, in <module>
    sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
  File "/home/icyfeather/project/ianvs/core/cmd/benchmarking.py", line 41, in main
    raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: testcase(id=7438c7a6-20f1-11ef-a88f-e7cf327eae9a) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: module(type=basemodel loads class(name=BaseModel) failed, error: No CUDA GPUs are available..

So CUDA is available, but when I run ianvs -f xxx, it disappears. I wonder why.

hsj576 commented 5 months ago

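Maybe it's caused by the line os.environ['CUDA_VISIBLE_DEVICES'] = '1'. With that setting only the GPU with index 1 is visible, and your machine has a single GPU (index 0).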

IcyFeather233 commented 5 months ago

Thanks so much! I deleted os.environ['CUDA_VISIBLE_DEVICES'] = '1' and it works.

By the way, I don't know why os.environ['CUDA_VISIBLE_DEVICES'] = '1' is there. Is it necessary for some reason?
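
For reference, the effect of that variable can be simulated without a GPU. The sketch below is a simplified model, assuming integer indices only (the real variable also accepts GPU UUIDs); visible_gpus is a hypothetical helper, not a CUDA or PyTorch API. It shows why test_cuda.py, which never sets the variable, sees the GPU, while the benchmark process, which sets it to '1' on a single-GPU machine, sees none. Whether the line was left over from a multi-GPU machine is only a guess.

```python
def visible_gpus(cuda_visible_devices, physical_gpu_count):
    """Simulate which physical GPU indices a process can see.

    cuda_visible_devices: value of the environment variable, or None if unset.
    physical_gpu_count: number of GPUs installed in the machine.
    """
    if cuda_visible_devices is None:
        # Variable unset: every installed GPU is visible.
        return list(range(physical_gpu_count))
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_gpu_count:
            # An invalid entry hides itself and every entry after it.
            break
        visible.append(int(token))
    return visible


print(visible_gpus(None, 1))   # variable unset, as in test_cuda.py
print(visible_gpus("1", 1))    # the setting that was deleted, on this one-GPU machine
print(visible_gpus("1", 2))    # the same setting on a two-GPU machine
```

On a machine with one GPU, the value '0' (or leaving the variable unset) keeps the GPU visible.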

IcyFeather233 commented 5 months ago

Another problem: when I run ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml, I get:

...(many lines)
CPA:0.07153843458791581, mIoU:0.005019460125166828, fwIoU: 0.03151069938676472
Found 50 test RGB images
Found 50 test disparity images
:   0%|                                                                                                                        | 0/50 [00:00<?, ?it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:   8%|████████▉                                                                                                       | 4/50 [00:00<00:01, 39.04it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  16%|█████████████████▉                                                                                              | 8/50 [00:00<00:01, 39.21it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  24%|██████████████████████████▋                                                                                    | 12/50 [00:00<00:00, 39.34it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  32%|███████████████████████████████████▌                                                                           | 16/50 [00:00<00:00, 39.31it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  40%|████████████████████████████████████████████▍                                                                  | 20/50 [00:00<00:00, 39.38it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  48%|█████████████████████████████████████████████████████▎                                                         | 24/50 [00:00<00:00, 38.93it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  56%|██████████████████████████████████████████████████████████████▏                                                | 28/50 [00:00<00:00, 38.98it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  64%|███████████████████████████████████████████████████████████████████████                                        | 32/50 [00:00<00:00, 38.99it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  72%|███████████████████████████████████████████████████████████████████████████████▉                               | 36/50 [00:00<00:00, 39.05it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  80%|████████████████████████████████████████████████████████████████████████████████████████▊                      | 40/50 [00:01<00:00, 39.16it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  88%|█████████████████████████████████████████████████████████████████████████████████████████████████▋             | 44/50 [00:01<00:00, 39.20it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
:  96%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▌    | 48/50 [00:01<00:00, 39.18it/s](1, 480, 640) (1, 480, 640)
(1, 480, 640) (1, 480, 640)
: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 39.13it/s]
-----------Acc of each classes-----------
road         : 81.655560 %
sidewalk     : 0.000000 %
building     : 5.043014 %
wall         : 0.000000 %
fence        : 0.000000 %
pole         : 0.000000 %
traffic light: nan %
traffic sign : nan %
vegetation   : 0.000000 %
terrain      : 0.000000 %
sky          : 0.000000 %
person       : 0.000000 %
rider        : nan %
car          : 0.000000 %
truck        : nan %
bus          : nan %
train        : nan %
motorcycle   : nan %
bicycle      : nan %
stair        : 0.003421 %
curb         : 0.000000 %
ramp         : nan %
runway       : nan %
flowerbed    : 0.000000 %
door         : 0.000000 %
CCTV camera  : 0.000000 %
Manhole      : nan %
hydrant      : nan %
belt         : nan %
dustbin      : nan %
-----------IoU of each classes-----------
road         : 20.728859 %
sidewalk     : 0.000000 %
building     : 4.824876 %
wall         : 0.000000 %
fence        : 0.000000 %
pole         : 0.000000 %
traffic light: 0.000000 %
traffic sign : nan %
vegetation   : 0.000000 %
terrain      : 0.000000 %
sky          : 0.000000 %
person       : 0.000000 %
rider        : 0.000000 %
car          : 0.000000 %
truck        : 0.000000 %
bus          : 0.000000 %
train        : 0.000000 %
motorcycle   : 0.000000 %
bicycle      : 0.000000 %
stair        : 0.003399 %
curb         : 0.000000 %
ramp         : 0.000000 %
runway       : 0.000000 %
flowerbed    : 0.000000 %
door         : 0.000000 %
CCTV camera  : 0.000000 %
Manhole      : 0.000000 %
hydrant      : 0.000000 %
belt         : 0.000000 %
dustbin      : 0.000000 %
-----------FWIoU of each classes-----------
road         : 4.382773 %
sidewalk     : 0.000000 %
-----------freq of each classes-----------
road         : 21.143340 %
sidewalk     : 16.498009 %
building     : 34.396249 %
wall         : 0.519759 %
fence        : 0.032960 %
pole         : 0.924427 %
traffic light: 0.000000 %
traffic sign : 0.000000 %
vegetation   : 15.705082 %
terrain      : 0.970381 %
sky          : 4.972848 %
person       : 0.000989 %
rider        : 0.000000 %
car          : 1.175906 %
truck        : 0.000000 %
bus          : 0.000000 %
train        : 0.000000 %
motorcycle   : 0.000000 %
bicycle      : 0.000000 %
stair        : 1.532379 %
curb         : 1.699381 %
ramp         : 0.000000 %
runway       : 0.000000 %
flowerbed    : 0.143163 %
door         : 0.283993 %
CCTV camera  : 0.001134 %
Manhole      : 0.000000 %
hydrant      : 0.000000 %
belt         : 0.000000 %
dustbin      : 0.000000 %
CPA:0.05418874686914789, mIoU:0.008812804591714955, fwIoU: 0.060424015451444865
[2024-06-03 16:24:27,384] task_evaluation.py(69) [INFO] - front_semantic_segamentation_model scores: {'accuracy': 0.016060831714296262}
[2024-06-03 16:24:27,384] task_evaluation.py(69) [INFO] - garden_semantic_segamentation_model scores: {'accuracy': 0.005019460125166828}
[2024-06-03 16:24:27,385] lifelong_learning.py(449) [INFO] - Task evaluation finishes.
[2024-06-03 16:24:27,386] lifelong_learning.py(452) [INFO] - upload kb index from index.pkl to /home/icyfeather/project/ianvs/workspace/lifelong_learning_bench/robot-workspace-test/benchmarkingjob/rfnet_lifelong_learning/803ef584-2182-11ef-a88f-e7cf327eae9a/output/eval/0/index.pkl
[2024-06-03 16:24:27,386] lifelong_learning.py(208) [INFO] - train from round 0
[2024-06-03 16:24:27,386] lifelong_learning.py(209) [INFO] - test round 5
[2024-06-03 16:24:27,386] lifelong_learning.py(210) [INFO] - all scores: {'accuracy': 0.008812804591714955}
[2024-06-03 16:24:27,386] lifelong_learning.py(220) [INFO] - front_semantic_segamentation_model scores: {'accuracy': 0.016060831714296262}
[2024-06-03 16:24:27,386] lifelong_learning.py(220) [INFO] - garden_semantic_segamentation_model scores: {'accuracy': 0.005019460125166828}
[2024-06-03 16:24:27,386] lifelong_learning.py(234) [INFO] - all scores: [[{'accuracy': 0.011080875212785671}, {'accuracy': 0.014276169299325306}, {'accuracy': 0.0100682449118572}, {'accuracy': 0.009086123410847542}, {'accuracy': 0.008812804591714955}]]
[2024-06-03 16:24:27,386] lifelong_learning.py(234) [INFO] - front_semantic_segamentation_model scores: [[{'accuracy': 0.011184414527967647}, {'accuracy': 0.017297397865009875}, {'accuracy': 0.015014366393296938}, {'accuracy': 0.016351156425681856}, {'accuracy': 0.016060831714296262}]]
[2024-06-03 16:24:27,386] lifelong_learning.py(234) [INFO] - garden_semantic_segamentation_model scores: [[{'accuracy': 0.013040300380858405}, {'accuracy': 0.014589005626878117}, {'accuracy': 0.005392195699569733}, {'accuracy': 0.0033863937183303975}, {'accuracy': 0.005019460125166828}]]
[2024-06-03 16:24:27,386] lifelong_learning.py(234) [INFO] - task_avg scores: [[{'accuracy': 0.012112357454413025}, {'accuracy': 0.015943201745943998}, {'accuracy': 0.010203281046433334}, {'accuracy': 0.009868775072006127}, {'accuracy': 0.010540145919731545}]]
load model url:  /home/icyfeather/project/ianvs/workspace/lifelong_learning_bench/robot-workspace-test/benchmarkingjob/rfnet_lifelong_learning/803ef584-2182-11ef-a88f-e7cf327eae9a/output/train/0/seen_task/global.model
:   0%|                                                                                                                         | 0/1 [00:00<?, ?it/s][Save] save rfnet prediction:  /home/icyfeather/project/ianvs/workspace/lifelong_learning_bench/robot-workspace-test/benchmarkingjob/rfnet_lifelong_learning/803ef584-2182-11ef-a88f-e7cf327eae9a/output/inference/results/1/front/00000.png_origin.png
: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.81it/s]
Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcase/testcase.py", line 74, in run
    res, system_metric_info = paradigm.run()
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py", line 186, in run
    inference_results, unseen_task_train_samples = self._inference(
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py", line 334, in _inference
    res, is_unseen_task, _ = job.inference_2(data, **kwargs)
  File "/home/icyfeather/miniconda3/envs/ianvs/lib/python3.9/site-packages/sedna/core/lifelong_learning/lifelong_learning.py", line 597, in inference_2
    seen_samples, unseen_samples = unseen_sample_recognition(
TypeError: 'NoneType' object is not callable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcasecontroller.py", line 54, in run_testcases
    res, time = (testcase.run(workspace), utils.get_local_time())
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcase/testcase.py", line 79, in run
    raise RuntimeError(
RuntimeError: (paradigm=lifelonglearning) pipeline runs failed, error: 'NoneType' object is not callable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/cmd/benchmarking.py", line 37, in main
    job.run()
  File "/home/icyfeather/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 93, in run
    succeed_testcases, test_results = self.testcase_controller.run_testcases(self.workspace)
  File "/home/icyfeather/project/ianvs/core/testcasecontroller/testcasecontroller.py", line 56, in run_testcases
    raise RuntimeError(f"testcase(id={testcase.id}) runs failed, error: {err}") from err
RuntimeError: testcase(id=803ef584-2182-11ef-a88f-e7cf327eae9a) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: 'NoneType' object is not callable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/miniconda3/envs/ianvs/bin/ianvs", line 33, in <module>
    sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
  File "/home/icyfeather/project/ianvs/core/cmd/benchmarking.py", line 41, in main
    raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: testcase(id=803ef584-2182-11ef-a88f-e7cf327eae9a) runs failed, error: (paradigm=lifelonglearning) pipeline runs failed, error: 'NoneType' object is not callable.

Other env info: sedna==0.4.1
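The per-class tables and the summary line (CPA, mIoU, fwIoU) in the log are consistent with the usual confusion-matrix definitions. Averaging the IoU column over its 29 non-nan rows, (20.728859 + 4.824876 + 0.003399) / 29 ≈ 0.8813 %, reproduces the logged mIoU of 0.0088128. The sketch below is an assumption about the formulas, not the ianvs/RFNet evaluator itself, and uses a made-up 4-class confusion matrix. It shows how nan and 0 arise: a class with no ground-truth pixels gets Acc nan, and its IoU is 0 if it was predicted somewhere or nan if it never appears at all.

```python
import math


def per_class_metrics(cm):
    """cm[i][j] = number of pixels with ground-truth class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    acc, iou, freq = [], [], []
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                          # ground-truth pixels of class c
        pred = sum(cm[r][c] for r in range(n))   # pixels predicted as class c
        union = gt + pred - tp
        acc.append(tp / gt if gt else math.nan)
        iou.append(tp / union if union else math.nan)
        freq.append(gt / total)
    return acc, iou, freq


def nanmean(values):
    """Mean over the entries that are not nan."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


# Hypothetical confusion matrix, for illustration only.
cm = [
    [8, 2, 0, 0],  # class 0: 10 ground-truth pixels, 8 correct
    [3, 1, 1, 0],  # class 1: 5 ground-truth pixels, 1 correct, 1 predicted as class 2
    [0, 0, 0, 0],  # class 2: no ground truth but predicted once -> Acc nan, IoU 0
    [0, 0, 0, 0],  # class 3: never appears -> Acc nan, IoU nan
]

acc, iou, freq = per_class_metrics(cm)
cpa = nanmean(acc)     # class pixel accuracy, averaged over non-nan classes
miou = nanmean(iou)    # mean IoU, averaged over non-nan classes
fwiou = sum(f * i for f, i in zip(freq, iou) if f > 0)  # frequency-weighted IoU

print(acc, iou, freq)
print(cpa, miou, fwiou)
```

Read this way, a long run of 0.000000 % IoU rows next to non-zero freq values means those classes exist in the ground truth but are never predicted correctly.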

hsj576 commented 5 months ago

Thanks so much! I deleted "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" and it works.

btw, I don't know why "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" is there. Is it necessary for some reason?

No reason, I just forgot to delete it.
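For context, CUDA_VISIBLE_DEVICES = '1' hides every GPU except the one with index 1, so on a machine whose only GPU has index 0 no device is visible at all. A minimal sketch of a less fragile pattern, assuming a script wants a default device but should respect whatever the user already exported (the helper name pick_visible_gpu is hypothetical and not part of ianvs):

```python
import os


def pick_visible_gpu(default="0"):
    """Keep CUDA_VISIBLE_DEVICES if it is already set; otherwise use `default`."""
    # setdefault only writes the variable when it is missing,
    # so a value exported in the shell always wins.
    return os.environ.setdefault("CUDA_VISIBLE_DEVICES", default)


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", pick_visible_gpu())
```

The variable has to be set before the framework initializes CUDA, so a call like this belongs at the very top of the script.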

hsj576 commented 5 months ago

Another problem, when I run ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml:

...(many lines)
TypeError: 'NoneType' object is not callable

Other env info: sedna==0.4.1

Change the mode to "no-inference".

Screenshot 2024-06-03 173400

IcyFeather233 commented 5 months ago

Because rank.py calls pd.np.arange, e.g. all_df.index = pd.np.arange(1, len(all_df) + 1), at https://github.com/kubeedge/ianvs/blob/main/core/storymanager/rank/rank.py#L178 and https://github.com/kubeedge/ianvs/blob/main/core/storymanager/rank/rank.py#L208, and pd.np was removed in pandas 2.0.0 (it had been deprecated since 1.0), I ran pip install pandas==1.5.3

and then reran ianvs -f examples/robot/lifelong_learning_bench/semantic-segmentation/benchmarkingjob-simple.yaml

This time the training succeeded, but the visualization failed:

garden_semantic_segamentation_model BWT_score: 0.05753041692524988
garden_semantic_segamentation_model FWT_score: 0.11415881365625927
compute function: key=task_avg, matrix=[[{'accuracy': 0.0031747766166595}, {'accuracy': 0.005443421620052942}, {'accuracy': 0.0027611203296307976}, {'accuracy': 0.005215614561737811}, {'accuracy': 0.00413587684881005}], [{'accuracy': 0.057318073389181615}, {'accuracy': 0.06558768795293932}, {'accuracy': 0.04695469448853577}, {'accuracy': 0.08797596039038008}, {'accuracy': 0.08631227347076771}], [{'accuracy': 0.11823181064491628}, {'accuracy': 0.16367581346348137}, {'accuracy': 0.12000465227436473}, {'accuracy': 0.11352067253051591}, {'accuracy': 0.11416052765142247}], [{'accuracy': 0.1414002714021555}, {'accuracy': 0.22293507242494295}, {'accuracy': 0.1827968071969279}, {'accuracy': 0.14674554475958146}, {'accuracy': 0.1464861637270747}], [{'accuracy': 0.146328014414279}, {'accuracy': 0.2257937760084689}, {'accuracy': 0.22123291275411555}, {'accuracy': 0.26335538724421365}, {'accuracy': 0.24032252447865676}], [{'accuracy': 0.18097321138867067}, {'accuracy': 0.23763178846845756}, {'accuracy': 0.22446577773244217}, {'accuracy': 0.2991808964700411}, {'accuracy': 0.27705676675558427}]], type(matrix)=<class 'list'>
task_avg BWT_score: 0.04794310523353218
task_avg FWT_score: 0.11102087522106215
/home/icyfeather/project/ianvs/core/storymanager/rank/rank.py:171: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  all_df = all_df.append(old_df)
/home/icyfeather/project/ianvs/core/storymanager/rank/rank.py:179: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead.
  all_df.index = pd.np.arange(1, len(all_df) + 1)
/home/icyfeather/project/ianvs/core/storymanager/rank/rank.py:209: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead.
  selected_df.index = pd.np.arange(1, len(selected_df) + 1)
Traceback (most recent call last):
  File "/home/icyfeather/project/ianvs/core/cmd/benchmarking.py", line 37, in main
    job.run()
  File "/home/icyfeather/project/ianvs/core/cmd/obj/benchmarkingjob.py", line 96, in run
    self.rank.save(succeed_testcases, test_results, output_dir=self.workspace)
  File "/home/icyfeather/project/ianvs/core/storymanager/rank/rank.py", line 263, in save
    self._draw_pictures(test_cases, test_results)
  File "/home/icyfeather/project/ianvs/core/storymanager/rank/rank.py", line 219, in _draw_pictures
    for key in matrix.keys():
AttributeError: 'NoneType' object has no attribute 'keys'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/icyfeather/miniconda3/envs/ianvs/bin/ianvs", line 33, in <module>
    sys.exit(load_entry_point('ianvs==0.1.0', 'console_scripts', 'ianvs')())
  File "/home/icyfeather/project/ianvs/core/cmd/benchmarking.py", line 41, in main
    raise RuntimeError(f"benchmarkingjob runs failed, error: {err}.") from err
RuntimeError: benchmarkingjob runs failed, error: 'NoneType' object has no attribute 'keys'.

hsj576 commented 5 months ago

This time the training succeeded, but the visualization failed:

...(many lines)
AttributeError: 'NoneType' object has no attribute 'keys'

The visualization part may have some bugs. You could use the "selected_and_all" mode.

Screenshot 2024-06-04 143350
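Separately from the visualization mode, the FutureWarning lines in the log point at two deprecated pandas idioms in rank.py: pd.np and DataFrame.append. A minimal sketch of replacements that work on both pandas 1.5.x and 2.x, assuming the surrounding logic only needs to stack two frames and renumber the index from 1 (the frames and values here are made up for illustration; this is not a patch to ianvs):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the rank tables handled in rank.py.
all_df = pd.DataFrame({"algorithm": ["b"], "accuracy": [0.10]})
old_df = pd.DataFrame({"algorithm": ["a"], "accuracy": [0.16]})

# Instead of: all_df = all_df.append(old_df)
all_df = pd.concat([all_df, old_df])

# Instead of: all_df.index = pd.np.arange(1, len(all_df) + 1)
all_df.index = np.arange(1, len(all_df) + 1)

print(all_df)
```

With changes along these lines, pinning pandas to 1.5.3 would no longer be needed for these two calls, though other parts of the code may still depend on the older version.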

IcyFeather233 commented 5 months ago

Yes, it works!

However, after printing the rank result, the program just hangs. I wonder why.

+------+--------------------+---------------------+--------------------+---------------------+---------------------+----------+-----------+-----------------+-----------------+-------------------------+------------------+-------------------------+-------------------------+------+-----+
| rank |     algorithm      |       accuracy      |    task_avg_acc    |         BWT         |         FWT         | paradigm | basemodel | task_definition | task_allocation | basemodel-learning_rate | basemodel-epochs | task_definition-origins | task_allocation-origins | time | url |
+------+--------------------+---------------------+--------------------+---------------------+---------------------+----------+-----------+-----------------+-----------------+-------------------------+------------------+-------------------------+-------------------------+------+-----+
|  1   |                    | 0.16341213386768544 | 0.1823223652551931 | 0.04785572989305128 | 0.09202275672186552 |          |           |                 |                 |                         |                  |                         |                         |      |     |
|  2   |        0.0         |  0.1630332304817737 | 0.0444330578799653 |  0.1057784033217158 |                     |          |           |                 |                 |                         |                  |                         |                         |      |     |
|  3   | 0.0436096102184875 |  0.1029848098592248 | 0.1632288012257678 |         0.0         |                     |          |           |                 |                 |                         |                  |                         |                         |      |     |
+------+--------------------+---------------------+--------------------+---------------------+---------------------+----------+-----------+-----------------+-----------------+-------------------------+------------------+-------------------------+-------------------------+------+-----+
[2024-06-04 18:13:50,011] benchmarking.py(39) [INFO] - benchmarkingjob runs successfully.

(stuck here)

hsj576 commented 5 months ago

However, after printing the rank result, the program just hangs. I wonder why.

This is normal; just press Ctrl+C to exit.

IcyFeather233 commented 5 months ago

I have successfully gone through the semantic-segmentation lifelong learning example.

Sharing my environment here:

OS and CUDA:

CUDA 11.8
Ubuntu 20.04
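To compare another machine against this environment, a small sketch that reports the installed versions of the packages most relevant to the errors in this thread. The package selection is an assumption, and report_versions is a hypothetical helper; importlib.metadata is part of the standard library from Python 3.8 on.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    wanted = ["torch", "pandas", "numpy", "sedna", "ianvs", "mmcv"]
    for name, ver in report_versions(wanted).items():
        print(f"{name:10s} {ver}")
```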

pip list:

Package                   Version      Editable project location
------------------------- ------------ -----------------------------------------
absl-py                   2.1.0
addict                    2.4.0
asgiref                   3.8.1
asttokens                 2.4.1
attrs                     23.2.0
backcall                  0.2.0
beautifulsoup4            4.12.3
bleach                    6.1.0
certifi                   2024.6.2
charset-normalizer        3.3.2
click                     8.1.7
cmake                     3.25.0
colorlog                  4.7.2
contourpy                 1.2.1
cycler                    0.12.1
decorator                 5.1.1
defusedxml                0.7.1
docopt                    0.6.2
executing                 2.0.1
fastapi                   0.68.2
fastjsonschema            2.19.1
filelock                  3.13.1
fonttools                 4.53.0
fsspec                    2024.5.0
grpcio                    1.64.0
h11                       0.14.0
huggingface-hub           0.23.2
ianvs                     0.1.0
idna                      3.7
importlib_metadata        7.1.0
importlib_resources       6.4.0
ipython                   8.12.3
jedi                      0.19.1
Jinja2                    3.1.3
joblib                    1.2.0
jsonschema                4.22.0
jsonschema-specifications 2023.12.1
jupyter_client            8.6.2
jupyter_core              5.7.2
jupyterlab_pygments       0.3.0
kiwisolver                1.4.5
lit                       15.0.7
Markdown                  3.6
markdown-it-py            3.0.0
MarkupSafe                2.1.5
matplotlib                3.9.0
matplotlib-inline         0.1.7
mdurl                     0.1.2
minio                     7.0.4
mistune                   3.0.2
mmcv                      2.0.1
mmdet                     3.1.0        /home/icyfeather/project/mmdetection
mmengine                  0.10.4
mpmath                    1.3.0
nbclient                  0.10.0
nbconvert                 7.16.4
nbformat                  5.10.4
networkx                  3.2.1
numpy                     1.26.4
opencv-python             4.9.0.80
packaging                 24.0
pandas                    1.5.3
pandocfilters             1.5.1
parso                     0.8.4
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    10.3.0
pip                       24.0
pipreqs                   0.5.0
platformdirs              4.2.2
prettytable               2.5.0
prompt_toolkit            3.0.45
protobuf                  5.27.0
ptyprocess                0.7.0
pure-eval                 0.2.2
pycocotools               2.0.7
pydantic                  1.10.15
Pygments                  2.18.0
pyparsing                 3.1.2
python-dateutil           2.9.0.post0
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     26.0.3
referencing               0.35.1
regex                     2024.5.15
requests                  2.32.3
rich                      13.7.1
rpds-py                   0.18.1
safetensors               0.4.3
scikit-learn              1.5.0
scipy                     1.13.1
sedna                     0.4.1
segment-anything          1.0          /home/icyfeather/project/segment-anything
setuptools                54.2.0
shapely                   2.0.4
six                       1.15.0
soupsieve                 2.5
stack-data                0.6.3
starlette                 0.14.2
sympy                     1.12
tenacity                  8.0.1
tensorboard               2.16.2
tensorboard-data-server   0.7.2
termcolor                 2.4.0
terminaltables            3.1.10
threadpoolctl             3.5.0
tinycss2                  1.3.0
tokenizers                0.19.1
tomli                     2.0.1
torch                     2.0.1+cu118
torchaudio                2.0.2+cu118
torchvision               0.15.2+cu118
tornado                   6.4
tqdm                      4.66.4
traitlets                 5.14.3
transformers              4.41.2
triton                    2.0.0
typing_extensions         4.12.1
tzdata                    2024.1
urllib3                   2.2.1
uvicorn                   0.14.0
wcwidth                   0.2.13
webencodings              0.5.1
websockets                9.1
Werkzeug                  3.0.3
wheel                     0.43.0
yapf                      0.40.2
yarg                      0.1.9
zipp                      3.19.1
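
For anyone comparing their own environment against this list: the part that matters for the original `libcudart.so.11.0` error is most likely that the torch, torchaudio and torchvision wheels all carry the same `+cu118` tag, matching the CUDA 11.8 install. Below is a minimal sketch for checking that from version strings. The helper names `cuda_tag_to_version` and `same_cuda_build` are made up for illustration and are not part of ianvs or PyTorch, and the parsing assumes the usual `+cuXYZ` local version suffix.

```python
import re
from typing import Optional


def cuda_tag_to_version(package_version: str) -> Optional[str]:
    """Return the CUDA version encoded in a wheel's '+cuXYZ' suffix.

    Hypothetical helper: assumes the suffix is 'cu' followed by the major
    version digits and a single minor version digit (e.g. cu118 = 11.8).
    Returns None when the version string has no such suffix.
    """
    match = re.search(r"\+cu(\d+)$", package_version)
    if match is None:
        return None
    digits = match.group(1)
    # Last digit is the minor version, everything before it is the major.
    return f"{digits[:-1]}.{digits[-1]}"


def same_cuda_build(*package_versions: str) -> bool:
    """True if every version string carries the same CUDA tag."""
    tags = {cuda_tag_to_version(v) for v in package_versions}
    return len(tags) == 1 and None not in tags


if __name__ == "__main__":
    # Version strings copied from the pip list above.
    versions = ["2.0.1+cu118", "2.0.2+cu118", "0.15.2+cu118"]
    for v in versions:
        print(v, "->", cuda_tag_to_version(v))
    print("consistent:", same_cuda_build(*versions))
```

With torch installed, `torch.__version__` gives the version string to feed in, and `torch.version.cuda` reports the CUDA version the build was compiled against.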