open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark
https://mmpretrain.readthedocs.io/en/latest/
Apache License 2.0

[Bug] unable to run the verification code after installation. #1533

Open alaa-shubbak opened 1 year ago

alaa-shubbak commented 1 year ago

Branch

main branch (mmpretrain version)

Describe the bug

I installed the mmpretrain repository as described in the docs.

I ran this command to verify my installation:

python demo/image_demo.py demo/demo.JPEG resnet18_8xb32_in1k --device cpu

Unfortunately, I got this error message:

ModuleNotFoundError: No module named 'mmpretrain'

Environment

After running this command, I got the following environment:

{'sys.platform': 'linux', 'Python': '3.9.10 (main, Mar 4 2022, 13:58:45) [GCC 8.4.0]', 'CUDA available': True, 'numpy_random_seed': 2147483648, 'GPU 0': 'Tesla V100-SXM2-32GB', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105', 'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)', 'PyTorch': '1.10.0+cu111', 'TorchVision': '0.11.0+cu111', 'OpenCV': '4.6.0', 'MMEngine': '0.7.2', 'MMCV': '2.0.0', 'MMPreTrain': '1.0.0rc7+e80418a'}

Other information

No, I only did what is described in the installation section.

I also downloaded the ImageNet dataset (train and metadata folders).

I am not sure what the issue is. Perhaps it is related to the mmcv version, but I am not certain.

Any help would be appreciated.

Ezra-Yu commented 1 year ago

Can you successfully run import mmpretrain in the Python console?
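A minimal sketch of that check, using only the standard library. It separates two failure modes: the module not being importable at all, and the module being importable while no installed-package metadata is found for it. The helper name `check_package` is made up for illustration.

```python
import importlib.metadata
import importlib.util


def check_package(name):
    """Return (importable, version) for a top-level package name.

    `version` is None when no installed distribution metadata is found.
    """
    # find_spec returns None if the import system cannot locate the module.
    importable = importlib.util.find_spec(name) is not None
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return importable, version
```

Calling `check_package("mmpretrain")` inside the environment used for the demo shows whether the package is visible to that interpreter and whether its metadata was registered by the install step.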

alaa-shubbak commented 1 year ago

Can you successfully run import mmpretrain in the Python console?

I ran it, also printing the mmpretrain version, and got this:

image

Now, when running the verification code below:

from mmpretrain import get_model, inference_model

img_path = 'demo/bird.JPEG'
model = get_model('resnet50_8xb32_in1k', pretrained=True, device="cuda:0")  # device can be 'cuda:0'
result = inference_model(model, img_path)

I got this error:

importlib_metadata.PackageNotFoundError: No package metadata was found for mmpretrain

alaa-shubbak commented 1 year ago

When I try to create a new environment, I get this error:

IOError: [Errno 2] No such file or directory: '/tmp/pip-build-oW_net/tabulate/setup.py'

This happens after running the installation command:

pip install -U openmim && mim install -e .

alaa-shubbak commented 1 year ago

Thanks. I reinstalled the environment, and the verification command now works:

python demo/image_demo.py demo/demo.JPEG resnet18_8xb32_in1k --device cpu

I have another question. I trained a model on my custom dataset by running the command:

python tools/train.py configs/densecl/densecl_resnet50_8xb32-coslr-200e_ACID.py --work-dir train_dir/densecl_resnet50_ACID/

which is based on the DenseCL model.

I noticed the following issues:

1- No log file is saved during training, so I cannot tell which epoch is optimal for my training.

2- Only three epoch .pth files are kept at a time. For example, after epochs 100, 110, and 120 have been saved, saving epoch 130 deletes the epoch 100 file.

3- For each saved epoch there are only loss values, without accuracy.

How can I solve these issues and get a log file so that I can plot the training loss and accuracy after training? How can I save the accuracy for each saved epoch?

Looking forward to hearing from you.

alaa-shubbak commented 1 year ago

Thank you. After reading the mmpretrain and mmengine documentation more closely, I think I understand how to deal with question 2,

but I am still confused about the other questions.

Ezra-Yu commented 1 year ago

1- there is no log file saved during training, so i can not know which is the optimal epoch for my training.

Use a smaller LoggerHook.interval: https://github.com/open-mmlab/mmpretrain/blob/e80418a424aaefb81c95df458216bb3e9af246c4/configs/_base_/default_runtime.py#L10
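As a sketch, the override in a config that inherits from the base runtime config could look like the fragment below. `LoggerHook` and `CheckpointHook` with their `interval` and `max_keep_ckpts` arguments are MMEngine hooks; the concrete values are only examples, and `max_keep_ckpts=3` is an assumption about why only three checkpoint files stay on disk.

```python
# Config fragment (MMEngine-style Python config); values are illustrative.
default_hooks = dict(
    # Record training logs every 10 iterations.
    logger=dict(type='LoggerHook', interval=10),
    # Save a checkpoint every 10 epochs and keep only the latest 3 on disk
    # (assumed setting; raise max_keep_ckpts to keep more files).
    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3),
)
```

Raising `max_keep_ckpts` would keep older epoch files around instead of deleting them as new ones are saved.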

3- i noticed that for each saved epoch there was only values of loss without accuracy.

Do you mean in the training phase? Set cal_acc=True; refer to https://github.com/open-mmlab/mmpretrain/blob/e80418a424aaefb81c95df458216bb3e9af246c4/configs/_base_/models/resnest50.py#L22

For some self-supervised algorithms, there is no accuracy during training.

alaa-shubbak commented 1 year ago

Thank you very much for your response.

use a smaller LoggerHook.interval

I used interval=10, since my dataset is small, but I could not find the log.json file in the work_dir after training. I also have no information on how to plot the training accuracy and losses.

Could you please help me with the function or command to use to plot the performance? Something similar to this one in mmdetection:

python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
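For context, a minimal sketch of extracting a loss curve from a JSON-lines training log with the standard library. It assumes each line of the log is one JSON object with numeric `step` and `loss` keys; both key names are assumptions and should be checked against the actual file. `read_scalars` is a hypothetical helper, not part of mmpretrain or mmengine.

```python
import json


def read_scalars(path, key="loss", step_key="step"):
    """Collect (step, value) pairs for `key` from a JSON-lines log file."""
    points = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Skip records that lack the requested key or the step counter.
            if key in record and step_key in record:
                points.append((record[step_key], record[key]))
    return points
```

Point `path` at the JSON log produced by the run (its location is not confirmed in this thread). The returned pairs can then be handed to any plotting library.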

Do you mean in training phase? let cal_acc=True refer to

For some selfsup algorithms, There is no ACC during the training.

Yes, exactly. I want to plot the training performance, both accuracy and loss. But when I add this setting to the head of my models (both SimCLR and DenseCL), it gives me this error:

TypeError: class `SimCLR` in mmpretrain/models/selfsup/simclr.py: class `ContrastiveHead` in mmpretrain/models/heads/contrastive_head.py: __init__() got an unexpected keyword argument 'cal_acc'

and

TypeError: class `DenseCL` in mmpretrain/models/selfsup/densecl.py: class `ContrastiveHead` in mmpretrain/models/heads/contrastive_head.py: __init__() got an unexpected keyword argument 'cal_acc'
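The message says the head's `__init__` does not accept `cal_acc`, so that key cannot simply be added to this head's config. A minimal sketch of checking which keyword arguments a class accepts before putting them in a config, using `inspect`. `DummyHead` is a hypothetical stand-in, not the real `ContrastiveHead`, and its `loss` and `temperature` parameters are assumptions for illustration only.

```python
import inspect


class DummyHead:
    """Hypothetical stand-in for a head class; not the real ContrastiveHead."""

    def __init__(self, loss, temperature=0.1):
        self.loss = loss
        self.temperature = temperature


def accepts_kwarg(cls, name):
    """Return True if cls.__init__ accepts `name` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

Running the same check against the real head class in the installed package would show whether `cal_acc` is supported there.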