### MaskTools 2 ###
(20201229: can be built under linux/gcc)
Masktools2 v2.2.30 (20220219)
mod by pinterf
Change log at the end of this document
Differences to Masktools 2.0b1
project moved to Visual Studio 2019, requires Visual Studio redistributables
add back "none" and "ignore" for values to "chroma" parameter (2.2.9-)
mt_merge at 8 bit clips: keep exact pixel values when mask is 0 or 255 (v2.2.7-)
Fix: mt_merge (and probably other multi-clip filters) may result in corrupted results under specific circumstances, due to using video frame pointers which were already released from memory
no special function names for high bit depth filters
filters are auto registering their mt mode as MT_NICE_FILTER for Avisynth+
Avisynth+ high bit depth support (incl. planar RGB; color spaces with alpha plane are supported from v2.2.7)
All filters now support 10, 12, 14, 16 bits and float
Note: From v2.2.15 the 32 bit float U and V chroma channels are 0 centered instead of 0.5, following the change in Avisynth+ in May 2018. This change affects lut functions and mt_diff. (The last Avisynth+ version that matches masktools 2.2.14 is Avs+ r2664, use r2724 or newer!)
Threshold and sc_value parameters are scaled automatically to the current bit depth (v2.2.5-) from a default 8-bit value.
Y,U,V,A (and parameters chroma/alpha) negative (memset) values are scaled automatically to the current bit depth (v2.2.7-, chroma/alpha v2.2.8-) from a default 8-bit value.
Default range of such parameters can be overridden to 8-16 bits or float.
Disable parameter scaling with paramscale="none"
New plane mode: 6 (copy from fourth clip) for "Y", "U", "V" and "A"
New "chroma" and "alpha" plane mode override: "copy fourth"
Use for mt_lutxyza which has four clips
YV411 (8 bit 4:1:1) support
mt_merge accepts 4:2:2 clips when luma=true (8-16 bit)
mt_merge accepts 4:1:1 clips when luma=true
mt_merge new parameter hint when luma=true and 4:2:0/4:2:2 'cplace' (2.2.15-). Possible values "mpeg1", "mpeg2" (default), "topleft" (2.2.27-)
mt_merge to discard U and V automatically when input is greyscale
some filters got AVX (float) and AVX2 (integer) support:
mt_merge: 8-16 bit: AVX2, float: AVX
mt_logic: 8-16 bit: AVX2, float: AVX
mt_edge: 8-16 bit: AVX2, 32 bit float: AVX
mt_expand, mt_inpand: 10-16 bit: AVX2
mt_polish to recognize new constants and scaling operator, and some other operators introduced in earlier versions. For a complete list, see v2.2.4 change log
new: mt_lutxyza. Accepts four clips. 4th variable name is 'a' (besides x, y and z)
new: mt_luts: weight expressions as an addon for the main expression(s) (martin53's idea)
Weights are float values. Weight luts are x,y (2D) luts, similarly to the base working mode, where x is the base pixel, y is the current pixel from the neighbourhood, defined in "pixels".
When the weighting expression is "1", the result is the same as the basic weightless mode.
For modes "average" and "std" the weights are summed up. Result is: sum(value_i*weight_i)/sum(weight_i).
When all weights are equal to 1.0 then the expression will result in the average: sum(value_i)/n.
Same logic works for min/max/median/etc., the "old" lut values are pre-multiplied with the weights before accumulation.
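The weighted accumulation described above can be sketched in a few lines of Python. This is only an illustration of the formula sum(value_i*weight_i)/sum(weight_i); the function name and the sample numbers are made up for the example and are not part of masktools.

```python
def weighted_average(values, weights):
    """Return sum(value_i * weight_i) / sum(weight_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical lut results for three neighbourhood pixels.
neighbourhood = [10.0, 20.0, 60.0]

# All weights 1.0: the plain average sum(value_i)/n.
print(weighted_average(neighbourhood, [1.0, 1.0, 1.0]))

# The first pixel counts twice as much as the others.
print(weighted_average(neighbourhood, [2.0, 1.0, 1.0]))
```

With all weights set to 1.0 the function reduces to the plain average, which mirrors the weightless mode.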
expression syntax supporting bit depth independent expressions
bit-depth aware scale operators
operator "scaleb" scales from 8 bit to current bit depth using bit-shifts.
scaleb alternative: @B (do not use, deprecated)
Use this for YUV. "235 scaleb" -> always results in max luma
operator "scalef" scales from 8 bit to current bit depth using full range stretch.
scalef alternative: @F (do not use, deprecated)
"255 scalef" results in maximum pixel value of current bit depth.
Calculation: x/255*65535 for an 8->16 bit sample (rgb)
Warning: please use scaleb or scalef instead of @B and @F, to match the syntax with avisynth's Expr filter
Since v2.2.20: "yscalef" and "yscaleb" keywords are similar to "scalef" and "scaleb" but scaling is forced to use rules for Y (non-UV) planes
hints for non-8 bit based constants: Added configuration keywords i8, i10, i12, i14, i16 and f32 in order to tell the expression evaluator the bit depth of the values that are to scale by scaleb and scalef operators.
By default scaleb, scalef, yscaleb and yscalef scales from 8 bit to the bit depth of the clip.
i8 .. i16 and f32 sets the default conversion base to 8..16 bits or float, respectively.
When used with scale_inputs=true (v2.2.15-), it specifies the internal working range (temporary scaling target)
These keywords can appear anywhere in the expression, but only the last occurrence will be effective for the whole expression.
Examples
8 bit video, no modifier: "x y - 256 scaleb *" evaluates as "x y - 256 *"
10 bit video, no modifier: "x y - 256 scaleb *" evaluates as "x y - 1024 *"
10 bit video: "i16 x y - 65536 scaleb *" evaluates as "x y - 1024 *"
8 bit video: "i10 x y - 512 scaleb *" evaluates as "x y - 128 *"
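The four examples above can be reproduced with a small Python sketch. It covers integer bit depths only; how masktools handles float targets is not modelled here. The function names and the `source_bits` argument (standing in for the i8..i16 hint) are illustrative assumptions, not masktools API.

```python
def scaleb(value, clip_bits, source_bits=8):
    """Bit-shift style scaling: multiply or divide by a power of two."""
    return value * 2 ** (clip_bits - source_bits)


def scalef(value, clip_bits, source_bits=8):
    """Full-range stretch, e.g. x/255*65535 for 8 -> 16 bits."""
    return value / (2 ** source_bits - 1) * (2 ** clip_bits - 1)


# The four examples from the text: (constant, clip bit depth, hinted source depth).
for constant, clip_bits, source_bits in [(256, 8, 8), (256, 10, 8),
                                         (65536, 10, 16), (512, 8, 10)]:
    print(constant, "->", scaleb(constant, clip_bits, source_bits))

# "255 scalef" gives the maximum pixel value of the clip's bit depth.
print(scalef(255, 16))
```

Note that scaleb maps 235 to the limited-range maximum at any depth, while scalef maps 255 to the full-range maximum.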
new pre-defined, bit depth aware constants
bitdepth: automatic silent parameter of the lut expression (clip bit depth)
sbitdepth: automatic silent parameter of the lut expression (bit depth of values to scale)
range_half --> autoscaled 128 or 0.5 for float luma/rgb, 0.0 for float chroma
yrange_half --> autoscaled 128 or 0.5 for float (new since v2.2.20)
range_min --> 0 for 8-16 bits and non-UV 32bit, or -0.5 for float UV chroma (new from 2.2.15)
yrange_min --> 0 for 8-16 bits and 32bit (new since v2.2.20)
range_max --> 255/1023/4095/16383/65535 or 1.0 for float luma or 0.5 for float chroma
yrange_max --> 255/1023/4095/16383/65535 or 1.0 for float luma and float chroma (new since v2.2.20)
range_size --> 256/1024...65536
ymin, ymax, cmin, cmax --> 16/235 and 16/240 autoscaled. For zero based float: (16-128)/255.0 and (240-128)/255.0
Example #1 (bit depth dependent, all constants are treated as-is):
expr8_luma = "x 16 - 219 / 255 *"
expr10_luma = "x 64 - 876 / 1023 *"
expr16_luma = "x 4096 - 56064 / 65535 *"
Example #2 (new, with auto-scale operators)
expr_luma = "x 16 scaleb - 219 scaleb / 255 scalef *"
expr_chroma = "x 16 scaleb - 224 scaleb / 255 scalef *"
Example #3 (new, with constants)
expr_luma = "x ymin - ymax ymin - / range_max *"
expr_chroma = "x cmin - cmax cmin - / range_max *"
or, to make it work for float as well, use range_min: expr_chroma = "x cmin - cmax cmin - / range_max range_min - *"
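As a cross-check, the following Python sketch computes the integer values of the predefined constants and verifies that the constant-based luma expression of Example #3 gives the same numbers as the hard-coded expressions of Example #1. It assumes that "autoscaled" means bit-shift scaling for ymin/ymax/cmin/cmax and range_half, which matches the values in Example #1; float clips are left out. All names are illustrative.

```python
def constants(bits):
    """Predefined constants for an integer clip of the given bit depth."""
    shift = bits - 8
    return {
        "range_half": 128 << shift,
        "range_max": (1 << bits) - 1,
        "range_size": 1 << bits,
        "ymin": 16 << shift, "ymax": 235 << shift,
        "cmin": 16 << shift, "cmax": 240 << shift,
    }


# Example #1 constants copied from the text: (offset, divisor, multiplier).
EXAMPLE1 = {8: (16, 219, 255), 10: (64, 876, 1023), 16: (4096, 56064, 65535)}


def luma_example1(x, bits):
    """Bit depth dependent form: "x 16 - 219 / 255 *" and its siblings."""
    offset, divisor, multiplier = EXAMPLE1[bits]
    return (x - offset) / divisor * multiplier


def luma_example3(x, bits):
    """Constant-based form: "x ymin - ymax ymin - / range_max *"."""
    c = constants(bits)
    return (x - c["ymin"]) / (c["ymax"] - c["ymin"]) * c["range_max"]


for bits in (8, 10, 16):
    top = constants(bits)["ymax"]
    print(bits, luma_example1(top, bits), luma_example3(top, bits))
```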
new option for Lut expressions:
Parameter "clamp_float"
32 bit float video is always a bit different, as usually no clamping is applied to valid ranges. This parameter, along with clamp_float_UV, changes this behaviour.
false: no clamp
true: standard clamp, which is 0..1 for luma or for RGB color space and -0.5..0.5 for YUV chroma
UV chroma clamping can be set to 0..1 by using clamp_float_UV (since v2.2.20)
Default: false
new option for Lut expressions:
Parameter "clamp_float_UV" (since v2.2.20)
false: standard clamp, which is 0..1 for luma or for RGB color space and -0.5..0.5 for YUV chroma UV
true: chroma UV clamp is the same as luma (0..1); used with expressions written with integer (positive only) U/V values in mind
Default: false
new option for Lut expressions:
parameter "scale_inputs" (default "none")
Autoscale any input (x,y,z,a) bit depths to 8-16 bit for internal expression use, the conversion method is either full range or limited YUV range. (Replaces clamp_f_i8, clamp_f_i10, clamp_f_i12, clamp_f_i14 or clamp_f_i16, clamp_f_f32 or clamp_f keywords)
Feature is available from v2.2.15
The primary reason of this feature is the "easy" usage of formerly written expressions optimized for 8 bits.
Usage: limited range is usually for normal YUV videos, full scale is for RGB or known-to-be-fullscale YUV
By default the internal conversion target is 8 bits, so old expressions written for 8 bit videos will probably work. This internal working bit-depth can be overwritten by the i8, i10, i12, i14, i16 specifiers.
When using autoscale mode, scaleb, scalef, yscaleb and yscalef keywords are meaningless for 8-16 bits, because there is nothing to scale. 32 bit (float) values will be scaled however when "float", "floatUV", "all", "allf" is specified.
How it works:
The predefined constants such as 'range_max', etc. will behave according to the internal working bit depth
Warning#1 This feature was created for easy porting of earlier 8-bit-video-only expressions. You have to understand how it works internally.
Let's see a 16bit input in "all" and "allf" mode (target is the default 8 bits)
Limited range 16->8 bits conversion has a factor of 1/256.0 (instead of shift right 8 in integer domain, float division is used, or else it would lose precision)
Full range 16->8 bits conversion has a factor of 255.0/65535
Limited range 16-8-16 bit mode ('int', 'all'), using bit shifts (really it's division and multiplication by 2^8=256.0):
result = calculate_lut_value(input / 256.0) * 256.0
Full scale 16-8-16 bit mode ('intf', 'allf'):
result = calculate_lut_value(input / 65535.0 * 255.0) / 255.0 * 65535.0
Use scale_inputs = "all" ("int", "float") for YUV videos with 'limited' range (e.g. in 8 bits: Y=16..235, UV=16..240).
Use scale_inputs = "allf" ("intf", "floatf") for RGB or YUV videos with 'full' range (e.g. in 8 bits: channels 0..255).
When input is 32bit float, the 0..1.0 (luma) and -0.5..0.5 (chroma) channel is scaled to 0..255 (8 bits), 0..1023 (i10 mode), 0..4095 (i12 mode), 0..16383(i14 mode), 0..65535(i16 mode) then back.
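The 16-8-16 bit round trip described above can be sketched in Python as follows. The two factors (1/256.0 for limited range, 255.0/65535 for full range) come from the text; generalising them to other bit depths through the `clip_bits` and `work_bits` arguments is an assumption of this sketch, and rounding or clamping of the result is left out. The function names and the sample expression are illustrative.

```python
def run_limited(lut, pixel, clip_bits=16, work_bits=8):
    """Limited range ('int', 'all'): scale by a power of two, evaluate, scale back."""
    factor = 2.0 ** (clip_bits - work_bits)
    return lut(pixel / factor) * factor


def run_full(lut, pixel, clip_bits=16, work_bits=8):
    """Full range ('intf', 'allf'): stretch by max values, evaluate, stretch back."""
    clip_max = 2 ** clip_bits - 1
    work_max = 2 ** work_bits - 1
    return lut(pixel / clip_max * work_max) / work_max * clip_max


def invert_8bit(x):
    """An expression written with 8 bit video in mind: "255 x -"."""
    return 255 - x


print(run_limited(invert_8bit, 0))  # 8 bit white (255) scaled back by 256
print(run_full(invert_8bit, 0))     # 8 bit white stretched to the 16 bit maximum
print(run_limited(invert_8bit, 32768))
```

The two modes give different results for the same expression and input, which is why the conversion method has to match the range of the video.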
Warning#2 One cannot specify different conversion methods for converting before and after the expression. Neither can you specify different methods for different input clips (e.g. x is full, y is limited is not supported).
(obsolete feature) New expression syntax for lut-type filters: auto scale modifiers for float clips:
!!! Note: these may be removed in later editions, please use "scale_inputs" and "clamp_float" parameters instead (since v2.2.15)
Keyword at the beginning of the expression:
Input values 'x', 'y', 'z' and 'a' are autoscaled by 255.0, 1023.0, ... 65535.0 before the expression evaluation, so the working range is similar to native 8, 10, ... 16 bits. The predefined constants 'range_max', etc. will behave for 8, 10,..16 bits accordingly.
The result is automatically scaled back to 0..1 and is clamped to that range. When using clamp_f_f32 (or clamp_f) the scale factor is 1.0 (so there is no scaling), but the final clamping will be done anyway. No integer rounding occurs.
# obsolete examples, from v2.2.15 use scale_inputs and clamp_float parameter instead of clamp_f_xx keywords
expr = "x y - range_half +" # good for 8..32 bits but float is not clamped
expr = "clamp_f x y - range_half +" # good for 8..32 bits and float clamped to 0..1 (or +/-0.5 when chroma)
expr = "x y - 128 + " # good for 8 bits
expr = "clamp_f_i8 x y - 128 +" # good for 8 bits and float, float will be clamped to 0..1 (or +/-0.5 when chroma)
expr = "clamp_f_i8 x y - range_half +" # good for 8..32 bits, float will be clamped to 0..1 (or +/-0.5 when chroma)
parameter "stacked" (default false) for filters with stacked format support Stacked support is not intentional, but since tp7 did it, I did not remove the feature. Filters currently without stacked support will never have it.
parameter "realtime" for lut-type filters, slower but at least works on those bit depths where LUT tables would occupy too much memory.
Also see: 'use_expr' which can pass realtime calculation to Avisynth+ Expr filter!
For bit depth limits where realtime = true is set as the default working mode, see table below.
realtime=true can be overridden, one can experiment and force realtime=false even for a 16 bit lutxy (8 GBytes lut table!, x64 only) or for 8 bit lutxyza (4 GBytes lut table)
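The table sizes quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes one byte per table entry for 8 bit clips and two bytes for 10-16 bit clips, which is consistent with the 8 GBytes and 4 GBytes figures in the text; the actual memory layout inside masktools may differ.

```python
GIB = 2 ** 30


def lut_bytes(bits, inputs):
    """Size of a full lookup table with `inputs` input clips of `bits` bit depth."""
    entries = (2 ** bits) ** inputs
    bytes_per_entry = 1 if bits == 8 else 2  # assumption, see lead-in
    return entries * bytes_per_entry


print(lut_bytes(16, 2) / GIB)  # 16 bit lutxy
print(lut_bytes(8, 4) / GIB)   # 8 bit lutxyza
print(lut_bytes(8, 2))         # 8 bit lutxy, in bytes
```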
parameter "use_expr" integer (default 0) for 'lut', 'lutxy', 'lutxyz', 'lutxyza' filters (from v2.2.15)
Use it when realtime calculation (interpreted pixel-by-pixel expression calculation) is slow and an appropriate Avisynth+ version (>r2712) is available.
By sending the expression to Avisynth+, lut filters can utilize a realtime JIT-compiled fast expression calculation.
Possible values:
0: uses lut and internal realtime calculation
1: Expr, when bit depth>=10 or lutxyza
2: When masktools would use realtime calc, passes the expressions and parameters to the "Expr" filter in Avisynth+
3: Expr, always passed (from 2.2.17)
For modes 1, 2 and 3: passes the expressions, "scale_inputs", "clamp_float" and "clamp_float_UV" parameters to the "Expr" filter in Avisynth+
Note: clamp_float_UV is a valid parameter only from Avisynth+ 3.5, and for compatibility reasons it is passed only when it is true, that is, when it differs from the default value.
Note #1: Avisynth+ internal precision is 32bit float, masktools2 is double (usually no difference can be seen)
Note #2: Some keywords (e.g. bit shift) are not available on Avisynth+
Note #3: Since "Expr" can work only on full sized clips, offX, offY, w and h parameters are ignored.
Note #4: Since v2.2.26 this parameter is silently ignored when "Expr" filter is missing from the actual Avisynth host.
parameter "paramscale" for filters working with threshold-like parameters (v2.2.5-)
Filters: mt_binarize, mt_edge, mt_inpand, mt_expand, mt_inflate, mt_deflate, mt_motion, mt_logic, mt_clamp
paramscale can be "i8" (default), "i10", "i12", "i14", "i16", "f32" or "none" or ""
"paramscale" tells the filter the bit depth range in which the parameters are given.
By default paramscale is "i8", so existing scripts with parameters in the 0..255 range work at any bit depth
mt_binarize(threshold=80*256, paramscale="i16") # threshold is assumed in 16 bit range
mt_binarize(threshold=80) # no param: threshold is assumed in 8 bit range
thY1 = 0.1
thC1 = 0.1
thY2 = 0.1
thC2 = 0.1
paramscale="f32"
mt_edge(mode="sobel",u=3,v=3,thY1=thY1,thY2=thY2,thC1=thC1,thC2=thC2,paramscale=paramscale) # f32: parameters assumed as float (0..1.0)
new: "swap" keyword in expressions (v2.2.5-) swaps the last two results during RPN evaluation. Not compatible with mt_infix()
expr="x 2 /"
expr="2 x swap /"
new: "swap1" to "swap9" keywords in expressions (v2.2.25-) Swaps the Nth stack result with stack top (top is N=0) during RPN evaluation. Appears as a function in mt_infix.
new: "dup" keyword in expressions (v2.2.5-) Duplicates the last result and puts it on top of the RPN evaluation stack.
expr="x 3 / x 3 / +"
expr="x 3 / dup +"
new: "dup0" to "dup9" keywords in expressions (v2.2.25-) Duplicate the Nth result and put it on top of the RPN evaluation stack. dup0 is the same as dup. Appears as a function in mt_infix
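To make the stack behaviour of swap, swapN, dup and dupN concrete, here is a minimal RPN evaluator in Python. It is a sketch for illustration only: it supports just one variable (x), the four basic operators and the stack keywords, and it follows the description above (N counts from the stack top, top is N=0). It is not the masktools parser.

```python
import operator
import re

OPERATORS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}


def eval_rpn(expr, x):
    """Evaluate a tiny subset of lut expressions for a single pixel value x."""
    stack = []
    for token in expr.split():
        if token == "x":
            stack.append(float(x))
        elif token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif re.fullmatch(r"swap[1-9]?", token):
            n = int(token[4:] or 1)          # plain "swap" behaves like swap1
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        elif re.fullmatch(r"dup[0-9]?", token):
            n = int(token[3:] or 0)          # plain "dup" behaves like dup0
            stack.append(stack[-1 - n])
        else:
            stack.append(float(token))       # numeric constant, else ValueError
    if len(stack) != 1:
        raise ValueError("unbalanced stack")
    return stack[0]


# The paired examples from the text give identical results.
print(eval_rpn("x 2 /", 10), eval_rpn("2 x swap /", 10))
print(eval_rpn("x 3 / x 3 / +", 9), eval_rpn("x 3 / dup +", 9))
```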
Feature matrix
filter         | 8 bit | 10-16 bit | float | stacked | realtime      | use_expr
mt_invert      |   X   |     X     |   X   |    -    |               |
mt_binarize    |   X   |     X     |   X   |    X    |               |
mt_inflate     |   X   |     X     |   X   |    X    |               |
mt_deflate     |   X   |     X     |   X   |    X    |               |
mt_inpand      |   X   |     X     |   X   |    X    |               |
mt_expand      |   X   |     X     |   X   |    X    |               |
mt_lut         |   X   |     X     |   X   |    X    | when float    | yes
mt_lutxy       |   X   |     X     |   X   |    -    | when bits>=14 | yes
mt_lutxyz      |   X   |     X     |   X   |    -    | when bits>=10 | yes
mt_lutxyza     |   X   |     X     |   X   |    -    | always        | yes
mt_luts        |   X   |     X     |   X   |    -    | when bits>=14 | no
mt_lutf        |   X   |     X     |   X   |    -    | when bits>=14 | no
mt_lutsx       |   X   |     X     |   X   |    -    | when bits>=10 | no
mt_lutspa      |   X   |     X     |   X   |    -    |               | no
mt_merge       |   X   |     X     |   X   |    X    |               |
mt_logic       |   X   |     X     |   X   |    X    |               |
mt_convolution |   X   |     X     |   X   |    -    |               |
mt_mappedblur  |   X   |     X     |   X   |    -    |               |
mt_gradient    |   X   |     X     |   X   |    -    |               |
mt_makediff    |   X   |     X     |   X   |    X    |               |
mt_average     |   X   |     X     |   X   |    X    |               |
mt_adddiff     |   X   |     X     |   X   |    X    |               |
mt_clamp       |   X   |     X     |   X   |    X    |               |
mt_motion      |   X   |     X     |   X   |    -    |               |
mt_edge        |   X   |     X     |   X   |    -    |               |
mt_hysteresis  |   X   |     X     |   X   |    -    |               |
mt_infix/mt_polish: available only on non-XP builds
Masktools2 info: http://avisynth.nl/index.php/MaskTools2
Forum: https://forum.doom9.org/showthread.php?t=174333
Article by tp7 http://tp7.github.io/articles/masktools/
Project: https://github.com/pinterf/masktools/tree/16bit/
Original version: tp7's MaskTools 2 repository. https://github.com/tp7/masktools/
Changelog
**v2.2.30 (20220218)
**v2.2.29 (20211116)
**v2.2.28 (20211005)
**v2.2.27 (20210909)
fix zero=false case for shape helper function (mt_rectangle, mt_circle, mt_diamond etc...)
lut expressions: report obvious script error (unbalanced stack, invalid keyword or variable, etc)
mt_lut: reuse LUTs across planes if they are the same like in e.g. mt_lutxyz.
1D LUT expressions: occupy only the necessary size for 10-14 bit LUT tables (was: buffer was always reserved for 16 bit data)
mt_merge: report an error if luma=false and the mask is greyscale but the clip is not greyscale
mt_merge: new value "topleft" for the chroma placement hint 'cplace', used when luma=true; "topleft" is available for 4:2:0 videos only
Refreshing memories for 'cplace' kinds:
String 'cplace': possible values "mpeg1", "mpeg2" (default) or "topleft" ("mpeg1" is center and "mpeg2" is left placement) Parameter is effective only for 420 and 422 formats (topleft is 420-only), otherwise ignored.
"mpeg1" is using fast 2x2 pixel (1x2 for 4:2:2) averaging when converting a 4:4:4 mask to a 4:2:0 or 4:2:2 format (old behaviour)
420 schema:
+------+------+
| 0.25 | 0.25 |
|------+------|
| 0.25 | 0.25 |
+------+------+
"mpeg2" is using 2x3 (1x3 for 4:2:2) pixel weighted averaging when converting a 4:4:4 mask to a 4:2:0 or 4:2:2 format
420 schema:
+-------+------+-------+
| 0.125 | 0.25 | 0.125 |
|-------+------+-------|
| 0.125 | 0.25 | 0.125 |
+-------+------+-------+
"topleft" is using 3x3 pixel weighted averaging when converting a 4:4:4 mask to a 4:2:0 format
420 schema:
+------+-----+------+
| 1/16 | 1/8 | 1/16 |
|------+-----+------|
| 1/8  | 1/4 | 1/8  |
|------+-----+------|
| 1/16 | 1/8 | 1/16 |
+------+-----+------+
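The three weighting schemes can be written down as small kernels and applied to a window of full-resolution mask pixels. The Python sketch below only shows the weighting itself: which mask pixels make up the window for a given chroma sample (the alignment) is not modelled, and the names are illustrative rather than taken from the masktools source.

```python
# 4:2:0 weights per 'cplace' value, rows listed top to bottom.
KERNELS = {
    "mpeg1": [[0.25, 0.25],
              [0.25, 0.25]],
    "mpeg2": [[0.125, 0.25, 0.125],
              [0.125, 0.25, 0.125]],
    "topleft": [[1 / 16, 1 / 8, 1 / 16],
                [1 / 8, 1 / 4, 1 / 8],
                [1 / 16, 1 / 8, 1 / 16]],
}


def weighted_sample(window, kernel):
    """Weighted sum of a mask window that has the same shape as the kernel."""
    return sum(pixel * weight
               for pixel_row, kernel_row in zip(window, kernel)
               for pixel, weight in zip(pixel_row, kernel_row))


# Every kernel sums to 1, so a flat mask keeps its value.
for name, kernel in KERNELS.items():
    print(name, sum(sum(row) for row in kernel))

# A single bright column under the centre of the mpeg2 kernel.
print(weighted_sample([[0, 255, 0], [0, 255, 0]], KERNELS["mpeg2"]))
```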
get the source built with LLVM (not clangCl) again
**no new version (20210209) does not appear in release, just another test fork update
Fetching the new 'cuda' branch from Nekopanda's masktools fork
git fetch git://github.com/nekopanda/masktools.git cuda:cuda
(Note: only for experimenting and adjusting to latest Avisynth+ Cuda-aware version. Cuda version implements only mt_lut.)
Source syntax update for GCC (20201229)
CMake build environment, builds on Linux, at least on my (pinterf) Ubuntu 19.10 WSL
(INTEL_INTRINSICS handling not implemented in the source, so it compiles only for Intel at the moment)
(Neither is boost library incorporated: mt_infix is unavailable, MT_HAVE_BOOST_SPIRIT is not defined - maybe later)
For build instructions see the end of this readme.
git clone https://github.com/pinterf/masktools.git
cd masktools
mkdir build
cd build
cmake ..
sudo make install
**v2.2.26 (20200904)
**v2.2.25 (20200813)
**v2.2.24 (20200619)
**v2.2.23 (20200513)
**v2.2.22 (20200422)
**v2.2.21 (20200410)
**v2.2.20 (20200303)
**v2.2.19 (20190710 - not released)
**v2.2.18 (20180905)
**v2.2.17 (20180710)
**v2.2.16 (20180702)
mt_merge new parameter hint for chroma placement when luma=true and 4:2:0/4:2:2
String 'cplace': possible values "mpeg1" or "mpeg2" (default)
Parameter is effective only for 420 and 422 formats, otherwise ignored.
"mpeg1" is using fast 2x2 pixel (1x2 for 4:2:2) averaging when converting a 4:4:4 mask to a 4:2:0 or 4:2:2 format (old behaviour)
420 schema:
+------+------+
| 0.25 | 0.25 |
|------+------|
| 0.25 | 0.25 |
+------+------+
"mpeg2" is using 2x3 (1x3 for 4:2:2) pixel weighted averaging when converting a 4:4:4 mask to a 4:2:0 or 4:2:2 format
420 schema:
+-------+------+-------+
| 0.125 | 0.25 | 0.125 |
|-------+------+-------|
| 0.125 | 0.25 | 0.125 |
+-------+------+-------+
32 bit float U and V chroma channels are now zero based (+/-0.5 for full scale). Was: 0..1, same as luma
Since internal format changed, use Avisynth+ r2724 or newer for this masktools2 2.2.16.
Affected predefined expression constants when plane is U or V:
cmin and cmax: limited range (16-128)/255 and (240-128)/255 instead of 16/255.0 and 240/255.0
range_max: 0.5 instead of 1.0
new: introduce range_min: -0.5 for float U/V chroma, 0 otherwise
range_half: 0.0 instead of 0.5
(range_size remained 1.0)
New expression syntax for Lut expressions: autoscale any input (x,y,z,a) bit depths to 8-16 bits for internal expression use. The primary reason of this feature is the "easy" usage of formerly written 8 bit optimized expressions.
New parameters for lut functions:
String "scale_inputs": "all","allf","int","intf","float","floatUV","floatf","none", default "none"
Boolean "clamp_float": default false, but treated as always true (and thus ignored) when scale_inputs involves a float autoscale.
Integer "use_expr": default 0, calls fast JIT-compiled "Expr" in Avisynth+ for mt_lut, lutxy, lutxyz, lutxyza
0: no Expr, use slow internal realtime calc if needed (as before)
1: call Expr for bits>8 or lutxyza
2: call Expr, when masktools would do its slow realtime calc (see 'realtime' column in the table above)
Extends and replaces experimental clamp_xxxx keywords.
**v2.2.15 (skipped, test versions)
**v2.2.14 (20180225)
**v2.2.13 (20180201)
**v2.2.12 (20180107)
**v2.2.11 (20180105)
**v2.2.10 (20170612)
**v2.2.9 (20170608)
**v2.2.8 (20170427)
**v2.2.7 (20170421)
**v2.2.6 (20170401)
**v2.2.5 (20170330)
**v2.2.4 (20170304)
new expression syntax: auto scale modifiers for float clips (test for real.finder): Keyword at the beginning of the expression:
Input values 'x', 'y', 'z' and 'a' are autoscaled by 255.0, 1023.0, ... 65535.0 before the expression evaluation, so the working range is similar to native 8, 10, ... 16 bits. The predefined constants 'range_max', etc. will behave for 8, 10,..16 bits accordingly.
The result is automatically scaled back to 0..1 and is clamped to that range. When using clamp_f_f32 (or clamp_f) the scale factor is 1.0 (so there is no scaling), but the final clamping will be done anyway. No integer rounding occurs.
v2.2.3 (20170227)
v2.2.2 (20170223) completed high bit depth support
v2.2.1 (20170218) initial high bit depth release
Build instructions
VS2019: use IDE
Windows GCC (mingw installed by msys2): from the 'build' folder under project root:
del ..\CMakeCache.txt
cmake .. -G "MinGW Makefiles" -DENABLE_INTEL_SIMD:bool=on
@rem test: cmake .. -G "MinGW Makefiles" -DENABLE_INTEL_SIMD:bool=off
cmake --build . --config Release
Linux
Clone repo
git clone https://github.com/pinterf/masktools
cd masktools
cmake -B build -S .
cmake --build build
Not working yet: possible option to test the C-only code path on x86 architectures:
cmake -B build -S . -DENABLE_INTEL_SIMD:bool=off
cmake --build build
Note: ENABLE_INTEL_SIMD is automatically off for non-x86 architectures and on for x86
Find binaries at
build/masktools/libmasktools2.so
Install binaries
cd build
sudo make install