justchenhao / STANet

official implementation of the spatial-temporal attention neural network (STANet) for remote sensing image change detection
BSD 2-Clause "Simplified" License
405 stars, 110 forks

BAM and PAM training on LEVIR-CD Dataset Memory Consumption Issue #61

Closed loviji closed 3 years ago

loviji commented 3 years ago

Hello, I read all the existing issues before asking for your advice on resolving this one.

First of all, thanks for sharing this work. I've learned many things from this project and am still learning.

I downloaded the pretrained PAM weights from the link in this project's README, then ran demo.py on my own sample satellite images (1.5 m per pixel). The result is good, but not excellent. You possibly trained your model on 0.5 m per pixel imagery, so I first started testing the training methods on the LEVIR-CD dataset, planning to move to my own dataset later. Training the Base method finished without problems, but BAM and PAM ask for a huge amount of memory (256 GB).

So when I run code: python3 ./train.py --save_epoch_freq 1 --angle 15 --dataroot ../DATASET/train --val_dataroot ../DATASET/val --name LEVIR-CDFA_BAM2 --lr 0.001 --model CDFA --SA_mode BAM --batch_size 8 --load_size 256 --crop_size 256 --preprocess rotate_and_crop

image

As you can see, I have 16 GB of memory (hosting provider: AWS, with an NVIDIA Tesla T4 GPU). From other issues I learned that I can pass the --ds 4 parameter (the self-attention module's down-sample rate). With it, training consumes 5 GB of memory but, as expected, the demo results are not accurate. Your pretrained model is better than mine trained with --ds 4.

So my question is: how can I optimize training without the down-sample rate and work around the memory consumption issue? As I read in your paper:
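For a rough feeling of why the input size and --ds matter so much, here is a back-of-the-envelope sketch. It only counts one dense attention matrix over all positions and ignores everything else (backbone activations, gradients, batch size). The feature stride of 4, the two frames attended jointly, and float32 storage are my assumptions, not values read from the STANet code.

```python
def attention_matrix_bytes(height, width, ds=1, feature_stride=4,
                           frames=2, bytes_per_value=4):
    """Estimate the size of one dense (positions x positions) attention matrix.

    feature_stride, frames and bytes_per_value are assumptions made for
    illustration; they are not taken from the repository.
    """
    h = height // (feature_stride * ds)
    w = width // (feature_stride * ds)
    positions = frames * h * w          # every position attends to every other
    return positions * positions * bytes_per_value


for size in (256, 1024):
    for ds in (1, 4):
        gib = attention_matrix_bytes(size, size, ds) / 2**30
        print(f"input {size}x{size}, ds={ds}: {gib:.4f} GiB per sample")
```

Under these assumptions the matrix grows with the fourth power of the image side, so feeding 256x256 tiles instead of 1024x1024 images shrinks it by a factor of 256, which is the same factor as going from ds=1 to ds=4 at a fixed input size.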

We tested our methods on a desktop PC equipped with an Intel i7-7700K CPU and an NVIDIA GTX 1080Ti graphic card. We used GPU to accelerate the training and testing process.

Do you have any advice on how to reach the same model quality as the one you shared via the link? Possibly PC specifications, method parameters, or input image parameters.

Thank you in advance

tan90du-sx commented 3 years ago

+1. After I lowered ds, the results were very poor, even worse than Base.

loviji commented 3 years ago

+1. After I lowered ds, the results were very poor, even worse than Base.

I've cropped each LEVIR-CD image into 16 parts of 256x256. The CUDA out of memory error disappeared. Now working on other errors like:

  File "./train.py", line 169, in <module>
    miou_current = val(opt, model)
  File "./train.py", line 86, in val
    score = model.test(val=True)           # run inference
  File "/home/user/STANET/STANet/models/CDFA_model.py", line 79, in test
    metrics.update(self.L.detach().cpu().numpy(), pred.detach().cpu().numpy())
  File "/home/user/STANET/STANet/util/metrics.py", line 123, in update
    self.confusion_matrix += self.__fast_hist(lt.flatten(), lp.flatten())
  File "/home/user/STANET/STANet/util/metrics.py", line 110, in __fast_hist
    hist = np.bincount(self.num_classes * label_gt[mask].astype(int) + label_pred[mask],
IndexError: boolean index did not match indexed array along dimension 0; dimension is 65536 but corresponding boolean dimension is 196608
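
A note on reading this message: 65536 = 256 x 256 and 196608 = 3 x 65536, so one of the two flattened arrays seems to carry three channels. The sketch below reproduces the same kind of error with NumPy only. It assumes the mask is built from the ground-truth label and then used to index the prediction, as in common fast_hist implementations; I have not checked that against util/metrics.py, and the variable names are illustrative.

```python
import numpy as np

num_classes = 2

# Single-channel prediction for a 256x256 tile, flattened: 65536 values.
label_pred = np.zeros(256 * 256, dtype=np.int64)

# Label accidentally stored as 3-channel RGB, flattened: 196608 values.
label_gt_rgb = np.zeros((256, 256, 3), dtype=np.uint8).flatten()

# Mask derived from the ground truth (assumed pattern, see note above).
mask = (label_gt_rgb >= 0) & (label_gt_rgb < num_classes)

message = None
try:
    label_pred[mask]  # boolean mask is 3x longer than the indexed array
except IndexError as exc:
    message = str(exc)
    print(message)
```

If this matches what happens in the real code, the label tiles were saved with three channels (24-bit) instead of one (8-bit).
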
loviji commented 3 years ago

+1. After I lowered ds, the results were very poor, even worse than Base.

@tan90du-sx do you have sample code to run the demo with the Base model?
tan90du-sx commented 3 years ago

Base runs straight through, with no down-sampling. I also split the dataset into 256x256 tiles, and it still does not run.
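
For reference, the 256x256 split discussed in this thread can be sketched with Pillow as below. The function name tile_image is hypothetical and this is not the script anyone here actually used. Note that Image.crop keeps the image mode, so a label that is already RGB stays RGB after tiling.

```python
from PIL import Image


def tile_image(image, tile=256):
    """Split an image into non-overlapping tile x tile crops, row by row."""
    width, height = image.size
    tiles = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            tiles.append(image.crop((left, top, left + tile, top + tile)))
    return tiles


# A 1024x1024 single-channel image gives 16 tiles of 256x256.
demo_tiles = tile_image(Image.new("L", (1024, 1024)))
print(len(demo_tiles), demo_tiles[0].size, demo_tiles[0].mode)
```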

loviji commented 3 years ago

This error is caused by the cropping process: the label maps must have a bit depth of 8.

johncse1 commented 3 years ago

Dear @loviji,
I am getting the same IndexError. I also cropped the dataset so that each image is 256x256. How can I make the label maps have a bit depth of 8? Please help me out.
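
One possible way to do this, sketched with Pillow: convert each cropped label to mode "L" (8-bit, single channel) and re-binarize it. The function name to_8bit_label and the threshold of 128 are my assumptions for illustration. It assumes the labels are black/white change masks, so please check the output on a few tiles before converting a whole dataset.

```python
from PIL import Image


def to_8bit_label(label, threshold=128):
    """Return an 8-bit single-channel (mode "L") copy of a label image.

    Pixels at or above the threshold become 255, the rest 0. The threshold
    is an assumption for black/white masks, not a value from STANet.
    """
    gray = label.convert("L")
    return gray.point(lambda v: 255 if v >= threshold else 0)


# Demo on a synthetic 24-bit RGB label: white square on a black background.
rgb_label = Image.new("RGB", (256, 256), (0, 0, 0))
rgb_label.paste((255, 255, 255), (0, 0, 128, 128))
fixed = to_8bit_label(rgb_label)
print(rgb_label.mode, "->", fixed.mode, fixed.size)
```

For files on disk this would be something like to_8bit_label(Image.open(path)).save(path), where path points at a cropped label PNG.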