wgcban / SemiCD

Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images
https://www.wgcban.com/research#h.ar24vwqlm021

data preparation #1

Closed rfww closed 2 years ago

rfww commented 2 years ago

Could you release the data preparation code? Thanks!!!

wgcban commented 2 years ago

Hi @rfww

I have released the data preparation codes here https://github.com/wgcban/SemiCD/tree/main/dataset_preparation. These scripts read the original LEVIR-CD and DSIFN-CD datasets and create non-overlapping 256×256 patches for training.
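In essence this is a plain non-overlapping tiling pass. A minimal sketch of the idea, assuming PIL; the function name and output layout are illustrative, not the repo's actual script:

```python
import os
from PIL import Image

PATCH = 256  # patch side length

def crop_nonoverlap(src_path, dst_dir):
    """Split one large image into non-overlapping PATCH x PATCH tiles.

    Partial tiles at the right/bottom edges are dropped, so a W x H
    image yields (W // PATCH) * (H // PATCH) patches.
    """
    img = Image.open(src_path)
    w, h = img.size
    os.makedirs(dst_dir, exist_ok=True)
    for i in range(w // PATCH):
        for j in range(h // PATCH):
            box = (i * PATCH, j * PATCH, (i + 1) * PATCH, (j + 1) * PATCH)
            img.crop(box).save(os.path.join(dst_dir, f"{i}_{j}.png"))
```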

However, you don't need to do this. I have uploaded all the processed datasets to Dropbox, so you can download the exact splits from there. As I mentioned in the readme file:

The processed LEVIR-CD dataset and supervised-unsupervised splits can be downloaded here.

The processed WHU-CD dataset and supervised-unsupervised splits can be downloaded here.

Please let me know if you face any trouble when downloading the datasets.

Best, Chaminda.

rfww commented 2 years ago

Thanks a lot. I just wanted to understand the pseudo-label generation. Forgive me for not reading your paper carefully. 😊

wgcban commented 2 years ago

Hi @rfww

No problem. Just for clarity: our semi-supervised CD method does not require pseudo labels for training. Instead, it relies on a consistency-based loss function to leverage the information in unlabeled data, as shown in Fig. 3. That is, rather than using pseudo labels to compute an unsupervised loss, we force the change prediction maps of unlabeled data to stay the same under random perturbations applied to the feature difference map, via an MSE loss. I hope this clarifies your question. Feel free to comment here if you have more doubts.
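Roughly, the unsupervised term looks like the sketch below. `encoder`, `decoder`, and `perturbations` are illustrative placeholders, not the repo's actual names (the paper uses auxiliary decoders, collapsed into a single decoder here for brevity):

```python
import torch
import torch.nn.functional as F

def unsup_consistency_loss(encoder, decoder, perturbations, img_a, img_b):
    """Consistency loss on an unlabeled image pair -- no pseudo labels.

    The prediction on the clean feature-difference map is the target;
    predictions on randomly perturbed versions of the same map are
    pulled toward it with an MSE loss.
    """
    feat_diff = torch.abs(encoder(img_a) - encoder(img_b))  # feature difference map
    with torch.no_grad():
        target = torch.sigmoid(decoder(feat_diff))          # clean change map, no gradient
    loss = 0.0
    for perturb in perturbations:  # e.g. random feature noise or dropout
        pred = torch.sigmoid(decoder(perturb(feat_diff)))
        loss = loss + F.mse_loss(pred, target)
    return loss / len(perturbations)
```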

Best, Chaminda.

Fig 3: method

rfww commented 2 years ago

OK, thanks for your explanation. 🙇

Yanll2021 commented 2 years ago

Your WHU-CD dataset seems to be missing more than 100 patches: the processed splits are 5947/743/744 for train/val/test, but they should be 6096/762/762. If it is convenient for you, could you check it again? Maybe I made a mistake in my calculation. Thank you!
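For reference, the two sets of totals differ by a fixed amount (counts as quoted above):

```python
# Split sizes quoted in this thread (train, val, test).
provided = (5947, 743, 744)   # the processed WHU-CD splits shared here
expected = (6096, 762, 762)   # the split sizes I would expect

print(sum(provided))                  # 7434
print(sum(expected))                  # 7620
print(sum(expected) - sum(provided))  # 186 patches unaccounted for
```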

wulei1595 commented 1 year ago

Your WHU-CD dataset seems to be missing more than 100 patches: the WHU-CD dataset used by the original BIT has 7620 images in total, but your WHU-CD dataset has only 7434.

wulei1595 commented 1 year ago

@Yanll2021 Have you solved the problem of the missing patches in this dataset?

rfww commented 1 year ago

Hi @wulei1595 I remember that I did not provide the WHU-BCD dataset to you, right? I cropped the image, whose resolution is 32507×15354, into 256×256 patches without overlap, which gives 7620 small patches. Details below:

Training set:
- changed image pairs: 1730
- unchanged image pairs: 5129 (the ground truth is all black)

Testing set:
- changed image pairs: 192
- unchanged image pairs: 569

The total number of these patches is 7620. The dataset details listed in BIT are just the numbers of cropped image patches in each subset. We need to remove the unchanged image pairs during change detection model training (especially for the cross-entropy loss function). We used this dataset in a data augmentation work, BGMix. If you have any concerns about using this dataset, please let me know. By the way, our cropped data can be downloaded here.
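One plausible reading of the count mismatch above, assuming the 32507×15354 resolution just quoted (an observation, not something the authors confirmed): tiling at 256×256 gives 7434 patches if partial edge tiles are dropped and 7620 if the image is padded so edge tiles are kept, which matches the two totals in this thread exactly.

```python
import math

W, H, P = 32507, 15354, 256

dropped = (W // P) * (H // P)                 # partial edge tiles dropped
padded = math.ceil(W / P) * math.ceil(H / P)  # edge tiles padded and kept

print(dropped)  # 7434 -> total of the SemiCD WHU-CD splits
print(padded)   # 7620 -> total reported for BIT's WHU-CD splits
```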