Closed zhxngjxnhxx closed 2 years ago
In the fabric dataset there are 15+1 categories (15 fabric defect classes plus 1 background class), so the last FC layer of each cascade head has a 16-dimensional classification weight.
In the MS-COCO dataset there are 80+1 categories, so that dimension is 81.
During training, my code automatically ignores parameters whose dimensions do not match when it loads the concatenated pre-trained weights into the model. So if the versions of your Python packages (PyTorch, mmcv, etc.) are the same as mine, this problem will not occur. The checkpoint .pth file is a standard PyTorch weight file; if you want to keep your existing packages, you can change the mismatched parameters (bbox_head.0.fc_cls.weight, bbox_head.1.fc_cls.weight, bbox_head.2.fc_cls.weight, or any others mentioned in the error log) to the target dimension and save a new weight file to load instead.
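A minimal sketch of that workaround: instead of resizing the tensors, simply drop the class-dependent keys from the checkpoint so a non-strict load re-initializes them for the new class count. The key list is taken from the error messages in this thread (fc_cls.bias is included as an assumption, since its size also depends on the number of classes), and the file paths are placeholders.

```python
import torch

# Class-dependent keys from the error log in this thread; the bias keys
# are assumed to mismatch as well, since their size is the class count.
MISMATCHED_KEYS = [
    "bbox_head.0.fc_cls.weight", "bbox_head.0.fc_cls.bias",
    "bbox_head.1.fc_cls.weight", "bbox_head.1.fc_cls.bias",
    "bbox_head.2.fc_cls.weight", "bbox_head.2.fc_cls.bias",
]

def strip_mismatched(checkpoint, keys=MISMATCHED_KEYS):
    """Drop class-dependent parameters from a checkpoint dict so a
    non-strict load re-initializes them for the new class count."""
    state = checkpoint.get("state_dict", checkpoint)
    for k in keys:
        state.pop(k, None)  # ignore keys that are already absent
    return checkpoint

def convert(src_path, dst_path):
    """Load a standard PyTorch .pth file, strip the mismatched keys,
    and save a new weight file to load instead (paths are placeholders)."""
    ckpt = torch.load(src_path, map_location="cpu")
    torch.save(strip_mismatched(ckpt), dst_path)
```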
@zhengye1995 Thanks first. But here is my situation: at first I followed the steps in your README exactly and did not modify any files, but sometimes even setup.py could not run normally.
I noticed that your command conda install pytorch=1.1.0 torchvision=0.3.0 cudatoolkit=10.0 -c pytorch conflicts with pip install cython && pip --no-cache-dir install -r requirements.txt:
in the requirements file the version numbers are prefixed with >=, so pip sometimes installs the latest version of a package, and when the latest pytorch or mmcv gets installed the versions no longer match, which eventually leads to runtime errors. Have you ever run a version-compatibility test? In my previous attempts I pinned only the mmcv version in the requirements, and to avoid repeated installation I deleted torch>=1.1 and torchvision. This is the setting of my dist_train.sh, where the commented-out part is done locally to reduce repeated operations when uploading to the server. So if I don't pin the package versions I get a version-mismatch error, and if I do pin them I get the error above.
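For readers hitting the same problem: the fix that worked later in this thread amounts to replacing the >= lower bounds with exact pins. A requirements.txt along these lines is one way to do it (only the mmcv, pytorch, and torchvision versions are confirmed in this thread; everything else in your requirements file should stay as-is):

```
mmcv==0.2.14
# torch>=1.1 and torchvision removed from requirements.txt;
# install them via conda instead, as the README says:
#   conda install pytorch=1.1.0 torchvision=0.3.0 cudatoolkit=10.0 -c pytorch
```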
I only have 1 GPU, is this setting correct?
Yes, this setting is correct.
These are the versions of my Python packages: pytorch==1.1.0, mmcv==0.2.14, mmdet==1.0rc0 (after build).
I solved this problem by specifying mmcv==0.2.14 in requirements.txt.
Congratulations!
When I run train.sh the following error occurs, in particular this line:
RuntimeError: While copying the parameter named bbox_head.0.fc_cls.weight, whose dimensions in the model are torch.Size([16, 1024]) and whose dimensions in the checkpoint are torch.Size([81, 1024]).
The error says the dimensions do not match, but the weights were downloaded from the URL in train.sh. How can I solve this?
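As a quick sanity check, the mismatch reported above can be confirmed by printing the shapes of the fc_cls parameters stored in the downloaded file. A small sketch (the path is a placeholder for whatever train.sh downloaded):

```python
import torch

def fc_cls_shapes(state_dict):
    """Collect the shapes of all fc_cls parameters, to compare the
    downloaded checkpoint (81 classes) with the model (16 classes)."""
    return {k: tuple(v.shape) for k, v in state_dict.items() if "fc_cls" in k}

def fc_cls_shapes_from_file(path):
    """Handle both raw state dicts and checkpoints wrapped in 'state_dict'."""
    ckpt = torch.load(path, map_location="cpu")
    return fc_cls_shapes(ckpt.get("state_dict", ckpt))

# Usage (path is a placeholder):
#   print(fc_cls_shapes_from_file("downloaded_weights.pth"))
```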