nagadomi / nunif

Misc; latest version of waifu2x; 2D video to stereo 3D video conversion
MIT License

[iw3]: about row_flow model #60

Closed · longyangqi closed this 1 year ago

longyangqi commented 1 year ago

Thanks for the tool iw3. I want to know more about the row_flow model:

  1. How is the model trained, and what is the training data?
  2. By how many pixels will the left/right images diverge from the input image (e.g., divergence=1.0)? Thanks!
nagadomi commented 1 year ago

sbs.row_flow is a model that emulates the output of apply_stereo_divergence_polylines (from stable-diffusion-webui-depthmap-script). The original apply_stereo_divergence_polylines algorithm is too slow for video processing, so I reimplemented it as a fast ML model. This model exists purely for processing speed.
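(Editor's note: the following is a toy sketch, not code from nunif or the depthmap script.) The operation being emulated is, at its core, a depth-driven horizontal pixel shift per row. A minimal NumPy forward-warp illustration, assuming a 0-1 depth map where larger values shift more; the real polylines algorithm and the row_flow model handle interpolation and occlusion far more carefully:

```python
import numpy as np

def naive_row_shift(image, depth, divergence=1.0, direction=1):
    """Toy forward warp: shift each pixel horizontally by depth * max_shift.

    image: (H, W, 3) uint8; depth: (H, W) floats in [0, 1] (larger = bigger shift).
    divergence: maximum shift as a percentage of the image width.
    """
    h, w, _ = image.shape
    max_shift = divergence / 100.0 * w
    out = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        shift = np.round(depth[y] * max_shift * direction).astype(int)
        tx = np.clip(xs + shift, 0, w - 1)
        out[y, tx] = image[y]  # naive: collisions overwrite, holes stay black
    return out
```

Running this once per eye (direction=+1 and -1) gives a crude SBS pair; the holes it leaves are exactly what the slower polylines interpolation avoids and what row_flow learns to emulate quickly.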

> how to train the model? And what's the training data

Training code is available at https://github.com/nagadomi/nunif/tree/master/iw3/training/sbs .

The dataset is regular RGB image files; datasets for image classification, super-resolution, etc. can be used without modification.

Below are example commands for training.

  1. Image Dataset directory structure

    ./dataset/
    ├── train/
    └── eval/

train/ is the image data directory for training; eval/ is the image data directory for evaluation (validation). At least one image file must be placed in each directory.

  2. Creating training data from image dataset

    python create_training_data.py sbs --dataset-dir ./dataset/ -o ./data/sbs_data

This command generates 256x256 depth and stereo images for the training command. ./data/sbs_data is the output directory.

  3. Training

    
    # scratch
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.0001
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.00005 --resume --reset-state
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model/ --learning-rate 0.00003 --resume --reset-state

    # finetune
    python train.py sbs --arch sbs.row_flow --data-dir ./data/sbs_data --model-dir ./models/sbs_model --checkpoint-file ./iw3/pretrained_models/row_flow_fp32.pth --learning-rate 0.00003


`models/sbs_model` is the output directory for the trained model.

For testing, the `iw3` command currently has no `--model` option, so either overwrite `iw3/pretrained_models/row_flow_fp32.pth` or edit [FLOW_MODEL_PATH](https://github.com/nagadomi/nunif/blob/c4740012ee8b3772acd9c32b0fe627db1d6b94f5/iw3/utils.py#L20).

>how many pixels will the left-right image divergence from the input image (eg. divergence=1.0)

First, iw3 uses 0-1 normalized depth; it does not use metric depth correctly.
`--divergence` is a percentage of the input image width: divergence=1 shifts pixel positions by up to 1% of the input image width (in px).
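As a quick arithmetic check (example numbers, not from the thread):

```python
def max_shift_px(width, divergence):
    """Maximum horizontal pixel shift for a given --divergence value,
    interpreted as a percentage of the input image width."""
    return width * divergence / 100.0

# e.g. a 1920-px-wide frame with divergence=1.0 allows shifts up to:
print(max_shift_px(1920, 1.0))  # 19.2
```

The actual shift of any given pixel then scales with its 0-1 normalized depth value, so only the nearest pixels reach that maximum.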
longyangqi commented 1 year ago

Thank you for your quick and detailed reply!