techmn / elgcnet

ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection
Apache License 2.0

An index dimension mismatch problem was encountered #2

Closed. helloyua closed this issue 5 months ago.

helloyua commented 5 months ago

Thank you for your outstanding work! However, I have run into a problem I would like to ask you about: when the confusion matrix is updated during the first validation step, an error is raised saying the index dimensions do not match.

The following is the error message:

root@autodl-container-138c41aa1e-268199cb:~/elgcnet# python main_cd.py
True [0] cuda:0
================ (Wed May 8 21:12:16 2024) ================
gpu_ids: [0]
project_name: elgcnet_levir
checkpoint_root: ./checkpoints
vis_root: ./vis
num_workers: 8
dataset: CDDataset
data_name: LEVIR
batch_size: 32
split: train
split_val: val
img_size: 256
n_class: 2
dec_embed_dim: 256
pretrain: None
net_G: ELGCNet
loss: ce
optimizer: adamw
lr: 0.00031
max_epochs: 300
lr_policy: linear
lr_decay_iters: [100]
checkpoint_dir: ./checkpoints/elgcnet_levir
vis_dir: ./vis/elgcnet_levir

training from scratch...

lr: 0.0003100

0%|          | 0/14 [00:00<?, ?it/s]
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/functional.py:404: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
(the same UserWarning is repeated several times)
7%|█▎        | 1/14 [00:12<02:38, 12.23s/it]
Is_training: True. [0,299][1,14], imps: 138.45, est: 8.62h, G_loss: 0.65767, running_mf1: 0.47853
100%|██████████| 14/14 [00:37<00:00, 2.66s/it]
Is_training: True. Epoch 0 / 299, epoch_mF1= 0.50163
acc: 0.85369 miou: 0.44794 mf1: 0.50163 iou_0: 0.85272 iou_1: 0.04317 F1_0: 0.92051 F1_1: 0.08276 precision_0: 0.95535 precision_1: 0.05825 recall_0: 0.88812 recall_1: 0.14289

Begin evaluation...
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/functional.py:404: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
Traceback (most recent call last):
  File "main_cd.py", line 72, in <module>
    train(args)
  File "main_cd.py", line 11, in train
    model.train_models()
  File "/root/elgcnet/models/trainer.py", line 332, in train_models
    self._collect_running_batch_states()
  File "/root/elgcnet/models/trainer.py", line 203, in _collect_running_batch_states
    running_acc = self._update_metric()
  File "/root/elgcnet/models/trainer.py", line 198, in _update_metric
    current_score = self.running_metric.update_cm(pr=G_pred.cpu().numpy(), gt=target.cpu().numpy())
  File "/root/elgcnet/misc/metric_tool.py", line 55, in update_cm
    val = get_confuse_matrix(num_classes=self.n_class, label_gts=gt, label_preds=pr)
  File "/root/elgcnet/misc/metric_tool.py", line 155, in get_confuse_matrix
    confusion_matrix += fast_hist(lt.flatten(), lp.flatten())
  File "/root/elgcnet/misc/metric_tool.py", line 150, in fast_hist
    hist = np.bincount(num_classes * label_gt[mask].astype(int) + label_pred[mask],
IndexError: boolean index did not match indexed array along dimension 0; dimension is 65536 but corresponding boolean dimension is 1048576

There is also another warning message: "UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum." I sincerely hope to receive your reply.

techmn commented 5 months ago

Hi, since the training step completes successfully and its scores are computed without error, it seems there is no issue with the code. Instead, please check the dataset images, in particular that the ground truth labels are single channel. If your ground truth labels are 3-channel images, you can add [:,:,0] at the end of line 118 in CD_dataset.py. Hope it helps.
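For reference, a minimal sketch to check whether a label image is single channel and, if not, keep only the first channel (the file path below is just a placeholder, not a path from the repository):

import numpy as np
from PIL import Image

# load one of your ground truth label images (placeholder path)
label = np.array(Image.open('path/to/label.png'), dtype=np.uint8)
print(label.shape)            # (H, W) for single channel, (H, W, 3) for RGB labels
if label.ndim == 3:
    label = label[:, :, 0]    # keep only the first channel, as suggested above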

helloyua commented 5 months ago

Thanks for the reply and the clear explanation. However, it doesn't seem to be a problem with the dataset: I'm using the LEVIR dataset, whose labels are 1024 × 1024 single-channel grayscale images. After I added [:,:,0], I got the following error:

0%|          | 0/56 [00:00<?, ?it/s]True
0%|          | 0/56 [00:00<?, ?it/s]
......
......
......
  File "G:\elgcnet-main\elgcnet-main\datasets\CD_dataset.py", line 118, in __getitem__
    label = np.array(Image.open(L_path), dtype=np.uint8)[:,:,0]
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed

techmn commented 5 months ago

Hi,

We used the pre-processed LEVIR-CD with an image size of 256x256. You can either apply non-overlapped cropping to the 1024x1024 images to obtain 256x256 images, or change the image size in the main_cd.py file to 1024.
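For example, a minimal non-overlapped cropping sketch (paths, file naming, and directory layout below are illustrative, not part of the repository):

import os
from PIL import Image

def crop_to_tiles(src_path, dst_dir, tile=256):
    # split one 1024x1024 image into sixteen non-overlapping 256x256 tiles
    img = Image.open(src_path)
    name, ext = os.path.splitext(os.path.basename(src_path))
    os.makedirs(dst_dir, exist_ok=True)
    w, h = img.size
    idx = 0
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            img.crop((left, top, left + tile, top + tile)).save(
                os.path.join(dst_dir, f"{name}_{idx}{ext}"))
            idx += 1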

helloyua commented 5 months ago

Thank you very much for your help, it works!

helloyua commented 5 months ago

I really appreciate your help. However, at runtime it seems that a single V100 GPU can hardly handle the memory requirements of the network. Did you train with multiple GPUs in parallel? In addition, is the lightweight version mentioned in your paper also included in this project? How can I use and train it? Thanks again for your patience.

techmn commented 5 months ago

I would recommend using a 256x256 image size. You can apply non-overlapped cropping to each 1024x1024 image to obtain 16 sub-images of size 256x256. Otherwise, you can try a reduced batch size and increase the kernel size and stride of the pooling layers in the attention module. Additionally, you can use a decoder with fewer parameters from the code below:

# Assumed imports; LinearProj, Fusion_Block, ConvLayer and resize are the
# helper modules already used by the existing decoder in this repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightHeadDecoder(nn.Module):
    """
    Transformer Decoder
    """
    def __init__(self, in_channels = [32, 64, 128, 256], embedding_dim=64, output_nc=2, align_corners=True):
        super(LightHeadDecoder, self).__init__()

        #settings
        self.align_corners   = align_corners
        self.in_channels     = in_channels
        self.embedding_dim   = embedding_dim
        self.output_nc       = output_nc
        c1_in_channels, c2_in_channels, c3_in_channels, c4_in_channels = self.in_channels

        # Channel reduction of feature maps before merging
        self.linear_c4 = LinearProj(input_dim=c4_in_channels, embed_dim=self.embedding_dim)
        self.linear_c3 = LinearProj(input_dim=c3_in_channels, embed_dim=self.embedding_dim)
        self.linear_c2 = LinearProj(input_dim=c2_in_channels, embed_dim=self.embedding_dim)
        self.linear_c1 = LinearProj(input_dim=c1_in_channels, embed_dim=self.embedding_dim)

        # linear fusion layer to combine multi-scale features of all stages
        self.linear_fuse = nn.Sequential(
           nn.Conv2d(in_channels=self.embedding_dim*len(in_channels), out_channels=self.embedding_dim, kernel_size=1, padding=0, stride=1),
           nn.BatchNorm2d(self.embedding_dim)
        )

        self.diff_c1 = Fusion_Block(in_channels=self.embedding_dim)
        self.diff_c2 = Fusion_Block(in_channels=self.embedding_dim)
        self.diff_c3 = Fusion_Block(in_channels=self.embedding_dim)
        self.diff_c4 = Fusion_Block(in_channels=self.embedding_dim)

        # Final prediction head
        self.dense_2x   = nn.Sequential(nn.Conv2d(in_channels=self.embedding_dim, out_channels=self.embedding_dim, kernel_size=3, padding=1, stride=1),
                                        nn.ReLU(),
                                        nn.BatchNorm2d(self.embedding_dim),
                                        nn.Conv2d(in_channels=self.embedding_dim, out_channels=self.embedding_dim, kernel_size=3, padding=1, stride=1, groups=self.embedding_dim),
                                        )

        self.dense_1x   = nn.Sequential(nn.Conv2d(in_channels=self.embedding_dim, out_channels=self.embedding_dim, kernel_size=3, padding=1, stride=1, groups=self.embedding_dim),
                                        nn.ReLU(),
                                        nn.BatchNorm2d(self.embedding_dim),
                                        nn.Conv2d(in_channels=self.embedding_dim, out_channels=self.embedding_dim, kernel_size=1, padding=0, stride=1)
                                        )
        self.change_probability = ConvLayer(self.embedding_dim, self.output_nc, kernel_size=3, stride=1, padding=1)

        #Final activation
        self.active             = nn.Sigmoid()

    def forward(self, inputs1, inputs2):
        #img1 and img2 features
        c1_1, c2_1, c3_1, c4_1 = inputs1        # len=4, 1/4, 1/8, 1/16, 1/32
        c1_2, c2_2, c3_2, c4_2 = inputs2        # len=4, 1/4, 1/8, 1/16, 1/32

        ############## MLP decoder on C1-C4 ###########
        n, _, h, w = c4_1.shape

        outputs = []
        # Stage 4: x1/32 scale
        _c4_1 = self.linear_c4(c4_1)
        _c4_2 = self.linear_c4(c4_2)
        _c4   = self.diff_c4([_c4_1, _c4_2])
        _c4_up= resize(_c4, size=c1_2.size()[2:], mode='bilinear', align_corners=False)

        # Stage 3: x1/16 scale
        _c3_1 = self.linear_c3(c3_1)
        _c3_2 = self.linear_c3(c3_2)
        _c3   = self.diff_c3([_c3_1, _c3_2])
        _c3_up= resize(_c3, size=c1_2.size()[2:], mode='bilinear', align_corners=False)

        # Stage 2: x1/8 scale
        _c2_1 = self.linear_c2(c2_1)
        _c2_2 = self.linear_c2(c2_2)
        _c2   = self.diff_c2([_c2_1, _c2_2])
        _c2_up= resize(_c2, size=c1_2.size()[2:], mode='bilinear', align_corners=False)

        # Stage 1: x1/4 scale
        _c1_1 = self.linear_c1(c1_1)
        _c1_2 = self.linear_c1(c1_2)
        _c1   = self.diff_c1([_c1_1, _c1_2])

        #Linear Fusion of difference image from all scales
        _c = self.linear_fuse(torch.cat([_c4_up, _c3_up, _c2_up, _c1],dim=1))

        #Upsampling x2 (x1/2 scale)
        x = F.interpolate(_c, scale_factor=2, mode="bilinear")
        #Residual block
        x = x + self.dense_2x(x)

        #Upsampling x2 (x1 scale)
        x = F.interpolate(x, scale_factor=2, mode="bilinear")
        #Residual block
        x = x + self.dense_1x(x)

        #Final prediction
        cp = self.change_probability(x)
        outputs.append(cp)
        return outputs
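
A rough usage sketch (this assumes the class above is placed next to the existing decoder so that LinearProj, Fusion_Block, ConvLayer and resize resolve; the feature map shapes are illustrative for a 256x256 input, not taken from the repository):

import torch

decoder = LightHeadDecoder(in_channels=[32, 64, 128, 256], embedding_dim=64, output_nc=2)
# per-image multi-scale features at 1/4, 1/8, 1/16 and 1/32 of a 256x256 input
feats1 = [torch.randn(1, c, 256 // s, 256 // s) for c, s in zip([32, 64, 128, 256], [4, 8, 16, 32])]
feats2 = [torch.randn(1, c, 256 // s, 256 // s) for c, s in zip([32, 64, 128, 256], [4, 8, 16, 32])]
out = decoder(feats1, feats2)[0]   # change map logits, shape (1, 2, 256, 256)
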
helloyua commented 5 months ago

You've been a great help, I appreciate it!