sovit-123 / fasterrcnn-pytorch-training-pipeline

PyTorch Faster R-CNN Object Detection on Custom Dataset
MIT License

model save error #122

Open KavitaHoude opened 10 months ago

KavitaHoude commented 10 months ago

I am getting the following error when training the model on a custom dataset. The error occurs at the last epoch and may be a model-saving error. Please suggest how to solve it.

SAVING BEST MODEL FOR EPOCH: 10

Traceback (most recent call last):
  File "/content/drive/MyDrive/Tree_Detect_Faster_RCNN/fasterrcnn-pytorch-training-pipeline/train.py", line 571, in <module>
    main(args)
  File "/content/drive/MyDrive/Tree_Detect_Faster_RCNN/fasterrcnn-pytorch-training-pipeline/train.py", line 566, in main
    wandb_save_model(OUT_DIR)
  File "/content/drive/MyDrive/Tree_Detect_Faster_RCNN/fasterrcnn-pytorch-training-pipeline/utils/logging.py", line 225, in wandb_save_model
    wandb.save(os.path.join(model_dir, 'best_model.pth'))
  File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_run.py", line 371, in wrapper_fn
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_run.py", line 361, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_run.py", line 1852, in save
    return self._save(glob_str, base_path, policy)
  File "/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_run.py", line 1906, in _save
    os.symlink(abs_path, wandb_path)
OSError: [Errno 95] Operation not supported: '/content/drive/MyDrive/Tree_Detect_Faster_RCNN/fasterrcnn-pytorch-training-pipeline/outputs/training/fasterrcnn_mobilenetv3_large_fpn_noaug_40e/best_model.pth' -> '/content/drive/MyDrive/Tree_Detect_Faster_RCNN/fasterrcnn-pytorch-training-pipeline/wandb/offline-run-20240110_114154-ui6uadqd/files/outputs/training/fasterrcnn_mobilenetv3_large_fpn_noaug_40e/best_model.pth'

sovit-123 commented 10 months ago

Hello. Can you please provide the command that you are using?

KavitaHoude commented 10 months ago

Hello Sir, this is the command:

!python train.py --model fasterrcnn_mobilenetv3_large_fpn --data data_configs/custom_data.yaml --epochs 10 --name fasterrcnn_mobilenetv3_large_fpn_noaug_40e --seed 42


sovit-123 commented 10 months ago

Okay. If you are training on Colab and trying to save to Google Drive, please use the --project-dir argument instead of the --name argument for saving the project.

KavitaHoude commented 10 months ago

Using --project-dir instead of --name did not work. I get the same error again.

sovit-123 commented 10 months ago

Can you please let me know where the code files are? Is the repository cloned to Colab, or is it somewhere on Google Drive? It may not work if it is on Google Drive.
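
The traceback in this thread ends with os.symlink raising OSError: [Errno 95] Operation not supported, which suggests that the mounted Google Drive folder does not accept symbolic links. A minimal sketch for checking a directory before training could look like the following. The helper name supports_symlinks is hypothetical; it is not part of this repository or of wandb.

```python
import os
import tempfile


def supports_symlinks(directory: str) -> bool:
    """Return True if a symbolic link can be created inside `directory`.

    Hypothetical helper for illustration only. It writes a small probe
    file into a temporary sub-directory and tries to link to it; the
    temporary sub-directory is removed again when the check finishes.
    """
    with tempfile.TemporaryDirectory(dir=directory) as probe_dir:
        target = os.path.join(probe_dir, "target.txt")
        link = os.path.join(probe_dir, "link.txt")
        with open(target, "w") as handle:
            handle.write("probe")
        try:
            # The same call that fails in the traceback (wandb_run.py, _save).
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            return False
        return True
```

Calling supports_symlinks on the folder that holds the cloned repository gives a quick answer: if it returns False there, a failure at the os.symlink step is to be expected regardless of the --name or --project-dir value.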

KavitaHoude commented 10 months ago

It is cloned to Google Drive using the command !git clone https://github.com/sovit-123/fasterrcnn-pytorch-training-pipeline.git

sovit-123 commented 10 months ago

Most probably it will not run from Google Drive. Please try cloning the repository directly to the Colab runtime's local storage (for example, under /content) and run it from there.
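
If the run is moved to the Colab runtime's local storage as suggested, the results disappear when the runtime is recycled, so one option is to copy the output folder to Google Drive after training finishes. The sketch below assumes this workflow. The function name copy_outputs and the example paths are hypothetical and should be adapted to the actual layout.

```python
import shutil
from pathlib import Path


def copy_outputs(src_dir: str, dst_dir: str) -> Path:
    """Copy a finished training output folder to another location.

    Hypothetical helper for illustration only. symlinks=False makes
    copytree copy the contents that any link points to instead of
    recreating the link, so the destination does not need symlink support.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Output directory not found: {src}")
    dst = Path(dst_dir)
    # dirs_exist_ok=True (Python 3.8+) allows re-running the copy.
    shutil.copytree(src, dst, symlinks=False, dirs_exist_ok=True)
    return dst
```

An assumed usage after training would be copy_outputs("/content/fasterrcnn-pytorch-training-pipeline/outputs", "/content/drive/MyDrive/Tree_Detect_Faster_RCNN/outputs"), where both paths are examples rather than values confirmed in this thread.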