nagadomi / nunif

Misc; latest version of waifu2x; 2D video to stereo 3D video conversion
MIT License

[iw3]: about row_flow model #60

Closed · longyangqi closed this 1 year ago

longyangqi commented 1 year ago

Thanks for the tool iw3. I want to know more about the row_flow model:

  1. How is the model trained, and what is the training data?
  2. By how many pixels will the left/right images diverge from the input image (e.g., divergence=1.0)? Thanks!
nagadomi commented 1 year ago

sbs.row_flow is a model that emulates the output of apply_stereo_divergence_polylines (from stable-diffusion-webui-depthmap-script). The original apply_stereo_divergence_polylines algorithm is too slow for video processing, so I reimplemented it as a fast ML model. This model exists purely for processing speed.
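(Editor's note: the following is a toy sketch, not code from nunif or the depthmap script.) The operation being emulated is, at its core, a depth-driven horizontal pixel shift per row. A minimal NumPy forward-warp illustration, assuming a 0-1 depth map where larger values shift more; the real polylines algorithm and the row_flow model handle interpolation and occlusion far more carefully:

```python
import numpy as np

def naive_row_shift(image, depth, divergence=1.0, direction=1):
    """Toy forward warp: shift each pixel horizontally by depth * max_shift.

    image: (H, W, 3) uint8; depth: (H, W) floats in [0, 1] (larger = bigger shift).
    divergence: maximum shift as a percentage of the image width.
    """
    h, w, _ = image.shape
    max_shift = divergence / 100.0 * w
    out = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        shift = np.round(depth[y] * max_shift * direction).astype(int)
        tx = np.clip(xs + shift, 0, w - 1)
        out[y, tx] = image[y]  # naive: collisions overwrite, holes stay black
    return out
```

Running this once per eye (direction=+1 and -1) gives a crude SBS pair; the holes it leaves are exactly what the slower polylines interpolation avoids and what row_flow learns to emulate quickly.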

> how to train the model? And what's the training data

Training code is available at https://github.com/nagadomi/nunif/tree/master/iw3/training/sbs .

The dataset is regular RGB image files; datasets for image classification, super-resolution, etc. can be used without modification.

Below are example commands for training.

  1. Image Dataset directory structure

    ./dataset/
    ├── train/
    └── eval/

train/ is the image data directory for training; eval/ is the image data directory for evaluation (validation). At least one image file must be placed in each directory.

  2. Creating training data from image dataset

    python create_training_data.py sbs --dataset-dir ./dataset/ -o ./data/sbs_data

This command generates 256x256 depth and stereo images for the training command. ./data/sbs_data is the output directory.

  3. Training

    
    # scratch
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.0001
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.00005 --resume --reset-state
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.00003 --resume --reset-state

    # finetune
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model --checkpoint-file ./iw3/pretrained_models/row_flow_fp32.pth --learning-rate 0.00003


`models/sbs_model` is the output directory for the trained model.

For testing, the `iw3` command currently has no `--model` option, so either overwrite `iw3/pretrained_models/row_flow_fp32.pth` or edit [FLOW_MODEL_PATH](https://github.com/nagadomi/nunif/blob/c4740012ee8b3772acd9c32b0fe627db1d6b94f5/iw3/utils.py#L20).

>how many pixels will the left-right image divergence from the input image (eg. divergence=1.0)

First, iw3 uses 0-1 normalized depth; it does not use metric depth correctly.
`--divergence` is a percentage of the input image width: divergence=1 shifts pixel positions by up to 1% of the input image width (in px).
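As a quick arithmetic check (example numbers, not from the thread):

```python
def max_shift_px(width, divergence):
    """Maximum horizontal pixel shift for a given --divergence value,
    interpreted as a percentage of the input image width."""
    return width * divergence / 100.0

# e.g. a 1920-px-wide frame with divergence=1.0 allows shifts up to:
print(max_shift_px(1920, 1.0))  # 19.2
```

The actual shift of any given pixel then scales with its 0-1 normalized depth value, so only the nearest pixels reach that maximum.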
longyangqi commented 1 year ago

Thank you for your quick and detailed reply!