thuyngch / Human-Segmentation-PyTorch

Human segmentation models, training/inference code, and trained weights, implemented in PyTorch
557 stars, 114 forks

Training from scratch fails #21

Open b-hakim opened 4 years ago

b-hakim commented 4 years ago

I am getting this error when training from scratch:

Traceback (most recent call last):
  File "train.py", line 101, in <module>
    main(config, args.resume)
  File "train.py", line 55, in main
    trainer.train()
  File "/path/Human-Segmentation-PyTorch/base/base_trainer.py", line 95, in train
    result = self._train_epoch(epoch)
  File "/path/Human-Segmentation-PyTorch/trainer/trainer.py", line 81, in _train_epoch
    loss = self.loss(output, target)
  File "/path/Human-Segmentation-PyTorch/evaluation/losses.py", line 18, in dice_loss
    targets = torch.zeros_like(logits).scatter_(dim=1, index=targets.type(torch.int64), src=torch.tensor(1.0))
RuntimeError: Index tensor must have the same number of dimensions as src tensor

I suspect the problem is in my dataset labels. What is the correct label format? My masks are the same size as the original images and have a single channel (I have 2 classes), so each mask is a 1-channel image containing 0 and 1 (or 0 and 255).
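Because line 18 of losses.py uses the target values as channel indices in scatter_, they presumably have to be class indices in {0, ..., C-1}; a raw 0/255 mask would point at channel 255, which does not exist when there are 2 classes. The sketch below shows one way to turn a 1-channel mask into such an index map. It is only an illustration: mask_to_targets is a hypothetical helper, not part of the repository, and I have not checked whether the repo's dataloader already rescales 0/255 masks.

```python
import numpy as np
import torch


def mask_to_targets(mask: np.ndarray) -> torch.Tensor:
    """Hypothetical helper: convert a binary mask of shape (H, W) or (H, W, 1),
    stored as 0/1 or 0/255, into an (H, W) int64 class-index map."""
    if mask.ndim == 3:
        mask = mask[..., 0]  # drop the singleton channel axis
    # Any nonzero pixel (1 or 255) becomes class 1; zero stays class 0 (background).
    # Assumes a lossless format such as PNG; JPEG artifacts would add stray nonzero pixels.
    return torch.from_numpy((mask > 0).astype(np.int64))
```

After batching, such maps have shape (N, H, W); the loss then needs a channel axis added before calling scatter_, as in the next sketch.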

FrankChengGD commented 3 years ago

I changed that line to

targets = torch.zeros_like(logits).scatter_(dim=1, index=targets.type(torch.int64), src=torch.ones_like(logits))

so that index and src have the same number of dimensions. It seems to work for me.
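The original call fails because torch.tensor(1.0) is a 0-dimensional tensor, so scatter_ treats it as a src tensor and requires it to have as many dimensions as index. Passing torch.ones_like(logits), as above, satisfies that. Another option in current PyTorch is the scalar-value overload, scatter_(dim, index, value), which needs no src tensor at all. The sketch below assumes targets arrive as (N, H, W) or (N, 1, H, W) class-index maps; one_hot_targets and dice_loss are hypothetical names for illustration, not the repository's code.

```python
import torch
import torch.nn.functional as F


def one_hot_targets(targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Turn an (N, H, W) or (N, 1, H, W) class-index map into an (N, C, H, W) one-hot tensor."""
    if targets.dim() == 3:
        # scatter_ requires index.dim() == self.dim(), so add the channel axis
        targets = targets.unsqueeze(1)
    index = targets.long()
    n, _, h, w = index.shape
    out = torch.zeros(n, num_classes, h, w, device=index.device)
    # Scalar-value overload: writes 1.0 wherever index points, so there is
    # no src tensor whose number of dimensions has to match index
    return out.scatter_(1, index, 1.0)


def dice_loss(logits: torch.Tensor, targets: torch.Tensor, smooth: float = 1.0) -> torch.Tensor:
    """Soft Dice loss averaged over batch and classes; logits are (N, C, H, W)."""
    probs = F.softmax(logits, dim=1)
    onehot = one_hot_targets(targets, logits.shape[1]).to(probs.dtype)
    inter = (probs * onehot).sum(dim=(2, 3))
    denom = probs.sum(dim=(2, 3)) + onehot.sum(dim=(2, 3))
    dice = (2 * inter + smooth) / (denom + smooth)
    return 1 - dice.mean()
```

The one-hot result is the same as the ones_like(logits) version and as F.one_hot(targets, C).permute(0, 3, 1, 2); the scalar form just avoids allocating a full-size ones tensor.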

HettieLi commented 2 years ago

I had the same problem as you. Try downgrading with 'pip install torch==1.2.0 torchvision==0.4.0' to solve it.

XXMxxm220 commented 1 year ago

type(torch.int64)

Hi, I tried changing it to your code, but I get this error:

inter = (outputs & targets).type(torch.float32).sum(dim=(2,3))
RuntimeError: "bitwise_and_cuda" not implemented for 'Float'
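The & and | operators are bitwise operations, which PyTorch defines for bool and integer tensors but not for float ones, hence the "not implemented for 'Float'" error. A plausible cause, assuming outputs and targets in that metric are one-hot tensors built the same way, is that switching src to torch.ones_like(logits) produces float one-hot masks. The sketch below casts both masks to bool before combining them; mean_iou is a hypothetical name and this is not the repository's metric code.

```python
import torch


def mean_iou(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Mean IoU over batch and classes; logits (N, C, H, W), targets (N, H, W) or (N, 1, H, W)."""
    if targets.dim() == 3:
        targets = targets.unsqueeze(1)
    pred = logits.argmax(dim=1, keepdim=True)  # (N, 1, H, W) int64 class indices
    # Build float one-hot masks, then cast to bool: & and | work on bool/integer
    # tensors but not on float tensors
    pred_oh = torch.zeros_like(logits).scatter_(1, pred, 1.0).bool()
    tgt_oh = torch.zeros_like(logits).scatter_(1, targets.long(), 1.0).bool()
    inter = (pred_oh & tgt_oh).float().sum(dim=(2, 3))
    union = (pred_oh | tgt_oh).float().sum(dim=(2, 3))
    return (inter / (union + eps)).mean()
```

Casting with .type(torch.int8) or .long() instead of .bool() would also make & and | valid; bool simply states the intent most directly.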