This repo contains the code for the STDL SAS project (SAS for "Segmentation automatique du sol", i.e. automatic soil segmentation) developed in collaboration with the Canton of Fribourg. The general description of the project is available on the STDL Website and the technical report on the Tech Website.
A minimum of 32 GB of RAM and a GPU with at least 16 GB of VRAM are needed to run the HEIG-VD model.
Install Docker to be able to run the Docker config files that create the environment with all the dependencies. Hints on using Docker are given in the Running Docker section.
The `scripts/` folder consists of:

- `heigvd/`: contains the code to run the DL model of the Institute of Territorial Engineering (INSIT) at the School of Engineering and Management (HEIG-VD), as well as the model itself.
- `utilities/`: contains the functions that are used in the pipelines.

More info on the content of these folders is given in the Folder Structure section.
Here are the procedures performed during the project, in order:
The performance of 3 sets of models can be evaluated: 6 models of the French National Institute of Geographic and Forest Information (IGN), 1 model of the Institute of Territorial Engineering (INSIT) at the School of Engineering and Management (HEIG-VD), and 1 model of the Federal Statistical Office (OFS). All the needed packages are accessible from the Docker container `general-gis`.
Use the Jupyter notebooks (file names starting with '0-') to run the pipelines in the correct order using Jupyter's command-line interface (cells starting with a '!'). Source and target folders need to be given in the first cell.
The training pipeline consists of 2 steps:
The notebook `1-0-train_prep.ipynb` prepares the data for training. It calls different functions from the `utilities/` folder and uses config files from the `config/train/` folder. The data is saved in the `datasets/` directory. All the needed packages are accessible from the Docker container `general-gis`.
Note that the split into training and validation data is done randomly in the script `random_split.py`. The random seed 6 was found to distribute the classes most evenly. However, some IDs were afterwards manually redistributed to create an even more balanced split.
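The seeded split can be sketched as follows. This is a minimal illustration only; `random_split.py` may differ in details such as the split ratio, the ID handling, and the subsequent manual redistribution:

```python
import random

def random_split(ids, seed=6, val_fraction=0.2):
    """Deterministically shuffle tile IDs and split them into train/val.

    Illustrative sketch: the 0.2 ratio is an assumption, and the manual
    redistribution mentioned above is not part of this function.
    """
    ids = sorted(ids)           # fixed starting order for reproducibility
    rng = random.Random(seed)   # seed 6 gave the most even class balance
    rng.shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]  # (train IDs, val IDs)

train_ids, val_ids = random_split(range(100))
```

Because the shuffle is seeded, rerunning the split always produces the same partition.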
The dataset structure is as follows:
```
dataset/
├── train/
│   ├── ipt/
│   │   ├── 0.tif
│   │   ├── 1.tif
│   │   └── ...
│   └── tgt/
│       ├── 0.tif
│       ├── 1.tif
│       └── ...
└── val/
    ├── ipt/
    │   ├── 0.tif
    │   ├── 1.tif
    │   └── ...
    └── tgt/
        ├── 0.tif
        ├── 1.tif
        └── ...
```
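A helper like the following (illustrative, not part of the repo; the reading of `ipt` as input tiles and `tgt` as target masks is an assumption) builds that skeleton with the standard library:

```python
import tempfile
from pathlib import Path

def make_dataset_skeleton(root):
    """Create the dataset/{train,val}/{ipt,tgt} directory skeleton."""
    root = Path(root)
    for split in ("train", "val"):
        for kind in ("ipt", "tgt"):  # ipt = input tiles, tgt = target masks (assumed)
            (root / split / kind).mkdir(parents=True, exist_ok=True)
    return root

# Demonstrate on a throwaway temporary directory
root = make_dataset_skeleton(tempfile.mkdtemp())
```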
Training is conducted in the Docker container `model-heigvd`.
Before training, additional dependencies have to be installed. The following steps have to be done when first running the container:

```bash
cd /ViT-Adapter/segmentation/ops
python3 setup.py build install
```
The training is done using the `train.py` script, which lies in the directory `/ViT-Adapter/segmentation/` within the Docker container. The only parameter that has to be specified is the path to the config file, which specifies the model, the paths to the datasets, the optimizer, the loss function, and the training parameters. The config file is located at `/proj-soils/scripts/heigvd/model/mask2former_beit_adapter_large_512_160k_*_ss.py`. During training, the process is logged in the `/proj-soils/data/heig-vd_logs_checkpoints` directory (in log, JSON, and TensorBoard format). Depending on the config file, the checkpoints are also saved in this directory: always the most recent one, and the best one (based on the validation mIoU).
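The role of the config file can be illustrated with a rough mmsegmentation-style fragment. All values below are hypothetical and do not reproduce the actual project config:

```python
# Illustrative mmsegmentation-style config fragment (all values hypothetical)
optimizer = dict(type="AdamW", lr=2e-5, weight_decay=0.05)
runner = dict(type="IterBasedRunner", max_iters=160000)

data = dict(
    samples_per_gpu=2,
    train=dict(data_root="/proj-soils/datasets/dataset",
               img_dir="train/ipt", ann_dir="train/tgt"),
    val=dict(data_root="/proj-soils/datasets/dataset",
             img_dir="val/ipt", ann_dir="val/tgt"),
)

# Keep only the latest checkpoint, and track the best one by validation mIoU
checkpoint_config = dict(interval=4000, max_keep_ckpts=1)
evaluation = dict(interval=4000, metric="mIoU", save_best="mIoU")
work_dir = "/proj-soils/data/heig-vd_logs_checkpoints"
```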
All necessary configurations are stored in `/proj-soils/config/config-infere_heig-vd.yaml`.
Clip to AOI: The script `clip_tiffs.py` clips a GeoTIFF to a specified GeoPackage.
Convert to RGB: The script `rgbi2rgb.py` converts a 4-band GeoTIFF to a 3-band GeoTIFF. This step isn't necessary if SWISSIMAGE10cm imagery is used as input. Note that the checkpoint `/proj-soils/data/data_exo-gpu-01/model_data/heigvd_data/M2F_ViTlarge_best_mIoU_iter_160000.pth` has been trained on FLAIR-1 imagery.
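Conceptually, the conversion just drops the fourth (near-infrared) band. A toy sketch of that idea (the actual script operates on GeoTIFF rasters, not tuples; names here are illustrative):

```python
def rgbi_to_rgb(pixels):
    """Drop the 4th (NIR) band from RGBI pixels.

    `pixels` is a list of (R, G, B, NIR) tuples -- a toy stand-in for the
    per-band raster arrays the real script reads from a GeoTIFF.
    """
    return [(r, g, b) for r, g, b, _nir in pixels]

rgb = rgbi_to_rgb([(10, 20, 30, 200), (40, 50, 60, 210)])
```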
(Optional) Rescale: If experiments regarding the input resolution are conducted, the script `rescale_tiffs.py` can be used to rescale the input images.
Inference of the HEIG-VD model is conducted in the `model-heigvd` container, using the script located at `/proj-soils/scripts/heigvd/code/infere_heigvd.py`. All the parameters are specified in the config file. The output is saved in the specified directory. Note that multiple input, output, side_length, and stride values can be stated. If more than one input directory is given, the script loops over the specified directories, using the output, stride, and side_length parameters at the same index as the input directory.
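The index pairing described above can be sketched as follows. This is illustrative only; the parameter names follow the description, not necessarily the actual config keys:

```python
# Hypothetical config values: the i-th entries of each list belong together
inputs = ["/data/tiles_10cm", "/data/tiles_20cm"]
outputs = ["/out/pred_10cm", "/out/pred_20cm"]
side_lengths = [512, 512]
strides = [256, 512]

def inference_jobs(inputs, outputs, side_lengths, strides):
    """Pair each input directory with the parameters at the same index."""
    if not (len(inputs) == len(outputs) == len(side_lengths) == len(strides)):
        raise ValueError("all parameter lists must have the same length")
    return list(zip(inputs, outputs, side_lengths, strides))

jobs = inference_jobs(inputs, outputs, side_lengths, strides)
```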
As the HEIG-VD model has been trained on the French FLAIR-1 classes, the output of the original checkpoint `/proj-soils/data/data_exo-gpu-01/model_data/heigvd_data/M2F_ViTlarge_best_mIoU_iter_160000.pth` has to be reclassified to our own classes. This is done using the script `reclassify.py`. The parameters are specified in the config file `config-infere_heigvd-orig.yaml`.
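Reclassification amounts to applying a class-ID lookup table to the prediction raster. A minimal sketch (the mapping values below are made up; the real mapping lives in the config file):

```python
# Hypothetical FLAIR-1 -> project class mapping (IDs are illustrative)
FLAIR_TO_PROJ = {0: 1, 1: 1, 2: 3, 3: 2}
NODATA = 0  # assumed fallback for class IDs without a mapping

def reclassify(raster):
    """Remap every class ID in a 2D prediction raster (list of rows)."""
    return [[FLAIR_TO_PROJ.get(value, NODATA) for value in row]
            for row in raster]

out = reclassify([[0, 1], [2, 3]])
```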
To build and run a specific Docker container with the needed packages, run `docker compose up -d <container>` within the root directory (`proj-soils/`). This will build the image and run the container.
After the container has started, open a shell inside it:

```bash
docker ps                            # find the container ID
docker exec -it <container_id> bash
```
Note that the `general-gis` container is by default set up to run a Jupyter server on port 8888. Thus, for this container, the command `docker compose up -d general-gis` is enough to expose the server to the local computer. Within the Docker container, run `jupyter server list` to find the corresponding link.
The folder structure of this project is as follows:
```
proj-soils
│
├── config
│   ├── eval
│   │       Configuration files concerning the evaluation pipeline
│   │
│   ├── infere
│   │       Configuration files concerning the inference pipeline
│   │
│   └── train
│           Configuration files concerning the training pipeline
│
└── scripts
    ├── Jupyter notebooks that run the pipelines in the correct order
    │   using Jupyter's command line interface (starting cells with a '!')
    │
    ├── heigvd
    │   ├── code
    │   │   ├── __init__.py: Is copied to the Docker container during
    │   │   │       the build process to include the dataset definition
    │   │   │       (proj_soils.py) for mmsegmentation
    │   │   │
    │   │   ├── proj_soils.py: Dataset definition for mmsegmentation
    │   │   │
    │   │   ├── infere_heigvd.py: Script to call the HEIG-VD model
    │   │   │
    │   │   ├── infere_heigvd.ipynb: Jupyter notebook to call the HEIG-VD
    │   │   │       model for debug purposes
    │   │   │
    │   │   └── train.py: Script to train the HEIG-VD model, is mounted
    │   │           in the Docker container at /ViT-Adapter/segmentation/
    │   │
    │   └── model
    │       ├── encoder_decoder_mask2former.py
    │       │       Source code for the HEIG-VD model, is mounted in the
    │       │       Docker container at /ViT-Adapter/segmentation/mmseg_custom/models/segmentors/
    │       │
    │       └── mask2former_beit_adapter_large_512_160k_proj-soils_12class_*.py
    │               Config files for the HEIG-VD model (for mmsegmentation)
    │
    ├── prepare_digitization
    │   ├── folderstructure4beneficiaries.py
    │   │       Script to prepare the folder structure for the digitization
    │   │
    │   └── mosaic_with_OTB.ipynb: Jupyter notebook to horizontally
    │           mosaick tiff files using Orfeo Toolbox
    │
    └── utilities
            Different scripts that are used in the pipelines
```