Closed: longyangqi closed this issue 1 year ago.
`sbs.row_flow` is a model that emulates the output of `apply_stereo_divergence_polylines` (from stable-diffusion-webui-depthmap-script). The original `apply_stereo_divergence_polylines` algorithm is very slow for video processing, so I reimplemented it as a fast ML model. The model exists purely for processing speed.
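Conceptually, stereo generation shifts each pixel horizontally in proportion to its depth. The sketch below illustrates only that basic idea in plain numpy; it is not the polylines algorithm and not the row_flow model (both are more sophisticated, e.g. they deal with occlusion and gap filling, which this naive loop ignores):

```python
import numpy as np

def naive_depth_shift(image, depth, max_shift_px):
    """Shift each pixel horizontally by depth * max_shift_px.

    Illustration only: overlapping pixels simply overwrite each other
    and uncovered pixels are left as zeros (no inpainting).
    """
    h, w = depth.shape
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            shift = int(round(depth[y, x] * max_shift_px))
            nx = min(w - 1, x + shift)
            out[y, nx] = image[y, x]
    return out

# Tiny example: a 1x4 "image" with flat depth 1.0 shifts everything right by 1.
img = np.arange(4).reshape(1, 4)
dep = np.ones((1, 4))
shifted = naive_depth_shift(img, dep, 1)
```

A real implementation would warp both a left and a right view (opposite shift directions) and fill the holes left behind by the shift.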
> How do I train the model, and what is the training data?
Training code is available at https://github.com/nagadomi/nunif/tree/master/iw3/training/sbs .
The dataset is just regular RGB image files; datasets for image classification, super-resolution, etc. can be used without modification.
Below are example commands for training.
Image dataset directory structure

```
./dataset/
├── train/
└── eval/
```

`train/` is the image directory for training; `eval/` is the image directory for evaluation (validation). At least one image file must be placed in each directory.
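The layout above can be created and sanity-checked with a few lines of Python. This is a hypothetical helper, not part of the training scripts, and the image extension list is an assumption:

```python
import os

def prepare_dataset(root="./dataset"):
    """Create the expected train/eval split directories."""
    for split in ("train", "eval"):
        os.makedirs(os.path.join(root, split), exist_ok=True)

def count_images(directory, exts=(".png", ".jpg", ".jpeg", ".webp")):
    """Count image files in a directory (extension list is an assumption)."""
    return sum(1 for name in os.listdir(directory)
               if name.lower().endswith(exts))

# Usage: after copying your images in, verify each split is non-empty.
# prepare_dataset("./dataset")
# assert count_images("./dataset/train") >= 1
# assert count_images("./dataset/eval") >= 1
```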
Creating training data from the image dataset

```
python create_training_data.py sbs --dataset-dir ./dataset/ -o ./data/sbs_data
```

This command generates 256x256 depth and stereo images for the training command. `./data/sbs_data` is the output directory.
Training

```
# from scratch
python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.0001
python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.00005 --resume --reset-state
python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.00003 --resume --reset-state

# fine-tuning from the pretrained model
python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model --checkpoint-file ./iw3/pretrained_models/row_flow_fp32.pth --learning-rate 0.00003
```
`models/sbs_model` is the output directory for the trained model.
For testing, the `iw3` command currently has no `--model` option, so either overwrite `iw3/pretrained_models/row_flow_fp32.pth` or edit [FLOW_MODEL_PATH](https://github.com/nagadomi/nunif/blob/c4740012ee8b3772acd9c32b0fe627db1d6b94f5/iw3/utils.py#L20).
> How many pixels will the left-right images diverge from the input image (e.g. divergence=1.0)?
First, iw3 uses 0-1 normalized depth; it does not use metric depth correctly.
`--divergence` is a percentage of the input image width: divergence=1 shifts pixel positions by up to 1% of the input image width (in px).
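In other words, the maximum shift follows directly from the image width and the divergence value. The sketch below uses hypothetical helper names (not the iw3 API) and also shows the 0-1 min-max normalization of depth described above:

```python
import numpy as np

def normalize_depth(depth):
    """Min-max normalize a depth map to [0, 1] (constant maps become 0)."""
    d = np.asarray(depth, dtype=np.float64)
    span = d.max() - d.min()
    return (d - d.min()) / span if span > 0 else np.zeros_like(d)

def max_shift_px(width, divergence):
    """Maximum horizontal shift: --divergence is a percentage of image width."""
    return width * divergence / 100.0

# Example: a 1920px-wide frame with divergence=1.0 allows shifts up to 19.2px.
shift = max_shift_px(1920, 1.0)
```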
Thank you for your quick and detailed reply!
Thanks for the tool iw3. I want to know more about the row_flow model.