RoyiAvital opened this issue 4 years ago
You are right, the API is actually documented fairly well, but end-user examples and a generated documentation website are still missing. This will come eventually.
Regarding boundary conditions: accesses outside the array evaluate to `nothing`, so you can check for that and specify the behaviour yourself. Here are some simple 1d examples:
using StaticKernels
a = collect(1:9)
k = Kernel{(1:1,)}(w -> something(w[1], 5))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 5]
k = Kernel{(-1:1,)}(w -> something(w[1], w[0]))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 9]
k = Kernel{(-1:1,)}(w -> something(w[1], w[-1]))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 8]
Currently there is no nice solution, but you can hack it by referencing the data array within your window function like this:
k = Kernel{(-1:1,)}(w -> something(w[1], a[1]))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 1]
or more generally by using `w.position` to get the current window position.
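For example, a hedged sketch of the circular case via the window position (assuming `w.position` yields the current `CartesianIndex` into `a`):
k = Kernel{(-1:1,)}(w -> something(w[1], a[mod1(w.position[1] + 1, length(a))]))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 1]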
In the future I'd like to have an interface for common boundary conditions like this:
k = Kernel{(1:1,)}(boundary=5, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 5]
k = Kernel{(-1:1,)}(boundary=:replicate, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 9]
k = Kernel{(-1:1,)}(boundary=:mirror, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 8]
k = Kernel{(-1:1,)}(boundary=:circular, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 1]
I don't think the readme is the place for that. You can look at the documentation of `Kernel` from within Julia, where the parameters are explained; the function `something` is defined in Julia Base.
A more detailed documentation page will follow.
Thank you for the suggestions.
I referred to the documentation in `src/types.jl`:
Kernel{X}(f)
Create a kernel with axes `X` wrapping a kernel function `f`.
The window function defines a reduction of values within the `X`-sized window.
For best performance you should annotate `f` with `@inline` and index accesses
within using `@inbounds`.
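For illustration, a kernel function written following that advice might look like this (a sketch, not taken from the documentation):
# annotated for performance as the docstring suggests
@inline f(w) = @inbounds w[-1] + 2 * w[0] + w[1]
k = Kernel{(-1:1,)}(f)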
But you are totally right: documentation for end users is not great at the moment; I will try to fix that.
Is there a way to have a `valid` mode? Namely, one where the kernel is applied only on valid indices, where it is well defined?
I'm not sure I understand what you are searching for. It will already throw an error if the kernel makes out-of-bounds accesses on arrays without an extension defined, since boundary conditions are implemented by calling the window function with properly cropped windows, and out-of-bounds accesses get turned into calls to `getindex_extension`.
If you instead are looking for some kind of static analysis of the kernel function at compile time, such that no out-of-bounds access will ever be performed, then I think that is out of scope for this package: window functions are treated as more or less black-box functions. The `@kernel` macro will derive correct kernel bounds automatically, but only for static/constant indexing.
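For example (a sketch; that `axes(k)` reports the derived kernel axes is an assumption on my part):
k = @kernel w -> w[-1] + w[1] # bounds (-1:1,) are derived from the constant indices
axes(k)                       # (-1:1,)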
Think of a simple 3x3 kernel (`k`) over a 100x100 image (`mI`).
The valid part of the kernel is the inner 98x98.
Let's say I want to apply this kernel over and over on the same image, but I only care about the inner part. I'd do something like:
map!(k, mI, mI);
Yet the problem is that the return size is 98x98, so it won't work.
One way would be creating a view of the inner part of `mI` and then applying the kernel to it. I was wondering if there is something built in.
Also, any chance of a guide for beginners on creating a user-defined extension/kernel? Specifically about handling the boundaries.
Just use an extension: `map(k, extend(a, StaticKernels.ExtensionReplicate()))` will return the same dimensions as `a`. As long as you are only interested in the inner part, any extension will do the job; performance should not be a big issue.
If for whatever reason you absolutely want to apply the kernel only on the relevant parts (which will be getting smaller each iteration), you need to do that yourself, but the provided `axes(k, a)` method for a kernel `k` can help you calculate the bounds.
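A hedged sketch of that approach, assuming `axes(k, a)` returns the per-dimension ranges on which the kernel is fully defined:
out = similar(mI)
inner = axes(k, mI)              # e.g. (2:99, 2:99) for a 3x3 kernel on a 100x100 image
map!(k, view(out, inner...), mI) # fill only the valid inner part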
For extensions there is not much documentation available, but it is usually enough to implement `StaticKernels.getindex_extension(a, i::Int, ext::MyExtension)`, where `a` is the array, `i` is an out-of-bounds index, and `ext` is your extension type, a subtype of `StaticKernels.Extension`. If you are not afraid of reading code, have a look at `src/extension.jl` to see how the predefined ones are implemented.
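For example, a constant-value extension might be sketched like this, following the signature given above (the name `ExtensionValue` is made up for illustration; see `src/extension.jl` for the exact signatures used by the predefined extensions):
struct ExtensionValue{T} <: StaticKernels.Extension
    value::T
end

# Every out-of-bounds access evaluates to the stored constant.
StaticKernels.getindex_extension(a, i, ext::ExtensionValue) = ext.value

map(k, extend(a, ExtensionValue(0))) # out-of-bounds reads become 0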
I will try to give a concrete example, from the discussion at https://discourse.julialang.org/t/92691:
using BenchmarkTools;
using LoopVectorization;
using StaticKernels;
function SolveLaplace!( mT, numItr )
    numRows, numCols = size(mT);
    for kk in 1:numItr
        for jj in 2:(numCols - 1)
            for ii in 2:(numRows - 1)
                mT[ii, jj] = 0.25 * (mT[ii - 1, jj] + mT[ii + 1, jj] + mT[ii, jj - 1] + mT[ii, jj + 1]);
            end
        end
    end
    return mT;
end

function SolveLaplaceTurbo!( mT, numItr )
    numRows, numCols = size(mT);
    for kk in 1:numItr
        @turbo for jj in 2:(numCols - 1), ii in 2:(numRows - 1)
            mT[ii, jj] = 0.25 * (mT[ii - 1, jj] + mT[ii + 1, jj] + mT[ii, jj - 1] + mT[ii, jj + 1]);
        end
    end
    return mT;
end

function SolveLaplaceKernel!( mT, numItr )
    numRows, numCols = size(mT);
    k = @kernel w -> 0.25 * (w[-1, 0] + w[1, 0] + w[0, -1] + w[0, 1])
    mV = @view mT[2:(numRows - 1), 2:(numCols - 1)]
    for kk in 1:numItr
        # map!(k, mT, extend(mT, StaticKernels.ExtensionReplicate()));
        map!(k, mV, mT);
    end
    return mT;
end
n = 100;
T = zeros(n, n);
T[1, 1:n] .= 500;
T[n, 1:n] .= 500;
T[1:n, 1] .= 200;
T[1:n, n] .= 500;
numItr = 1000;
@btime SolveLaplace!($T, $numItr);
@btime SolveLaplaceTurbo!($T, $numItr);
@btime SolveLaplaceKernel!($T, $numItr);
I wanted to show that StaticKernels.jl is perfect for that. Yet it seems the static kernel still doesn't give the vectorizer enough information to match LoopVectorization.jl.
You can see here what I meant about working on the `valid` shape but returning the whole array.
Is there any simple optimization to StaticKernels.jl to get the performance of LoopVectorization.jl in the case above?
Your output and input array for the kernel operation are the same, which means that you almost certainly compute the wrong thing here, since you overwrite relevant input data.
This (probably unintended!) data dependency means that vectorization of the code is hard for LLVM. Note that LoopVectorization assumes that loop iterations are independent, so using it will change your results in this case!
Nevertheless, you could use an intermediate buffer like this to improve things:
function SolveLaplaceKernel2!( mT, numItr )
    numRows, numCols = size(mT);
    mBuf = similar(mT);
    k = @kernel w -> 0.25 * (w[-1, 0] + w[1, 0] + w[0, -1] + w[0, 1])
    mTExt = extend(mT, StaticKernels.ExtensionReplicate())
    mBufExt = extend(mBuf, StaticKernels.ExtensionReplicate())
    for kk in 1:numItr÷2
        map!(k, mBuf, mTExt);
        map!(k, mT, mBufExt);
    end
    if isodd(numItr)
        map!(k, mBuf, mTExt)
        copy!(mT, mBuf)
    end
    return mT;
end
For me this gives a factor of ~8 improvement over your version, but please benchmark for yourself. You should be able to achieve similar times using LoopVectorization if you rewrite it as I did.
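For reference, a minimal sketch of that rewrite with `@turbo` (Jacobi-style double buffering; assumes an even `numItr` for brevity):
function SolveLaplaceTurbo2!( mT, numItr )
    numRows, numCols = size(mT);
    mBuf = copy(mT); # copy so the buffer keeps the boundary values
    for kk in 1:numItr÷2
        @turbo for jj in 2:(numCols - 1), ii in 2:(numRows - 1)
            mBuf[ii, jj] = 0.25 * (mT[ii - 1, jj] + mT[ii + 1, jj] + mT[ii, jj - 1] + mT[ii, jj + 1]);
        end
        @turbo for jj in 2:(numCols - 1), ii in 2:(numRows - 1)
            mT[ii, jj] = 0.25 * (mBuf[ii - 1, jj] + mBuf[ii + 1, jj] + mBuf[ii, jj - 1] + mBuf[ii, jj + 1]);
        end
    end
    return mT;
end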
PS: I don't think this issue tracker is the right place for these discussions.
In this context it is OK to work in place (Gauss-Seidel vs. Jacobi iterations).
I see, so the point is that in the case of StaticKernels.jl the compiler sees the dependency, while in the case of LoopVectorization.jl it is forced to ignore it.
Regarding the discussion, any chance you enable "Discussions" for this repository?
OK, in that case you can ignore my example, since I assumed it was unintended. If you have this data dependency, then there are almost no options for vectorization, since the computation for each component depends on the result of the previous computation, which is why you see performance similar to the naive loop. When you use `@turbo` from LoopVectorization you explicitly say "I don't care about this dependency", which may change your results. I would recommend using the traditional loop to make the iteration order explicit, since it is relevant in your case (the iteration order of `map` over kernels may change in the future).
I'll try enabling the discussion feature.
Hi,
In the documentation:
# 2d total variation
k = @kernel w -> abs(w[1,0] - w[0,0]) + abs(w[0,1] - w[0,0])
sum(k, extend(a, StaticKernels.ExtensionReplicate()))
What's the difference between using `sum()` and `map()` in the context of StaticKernels.jl?
Hello,
This looks like a great addition to Julia for those who are working on Image Processing.
I think it is missing documentation. But for now, could you show examples with all supported boundary conditions?
Usually we have: constant, replicate, mirror, and circular.
Thank you.