Feature/spatial_quantity

jakelove commented 3 years ago

This extension adds two quantities that represent the positions of each cell. This allows custom fields to depend upon the position of the cell in the Mesh, allowing for spatially dependent effective fields.

Quantity: PositionLocal This quantity represents position vectors of the center of each cell from the center of the mesh, normalised from -0.5 to +0.5. For example, with a 2x3x4 grid of cells we get the following mappings from mesh indices to PositionLocal:

Mesh Index	PositionLocal
[0, 0, 0]	(-0.5, -0.5,-0.5)
[1, 2, 3]	(+0.5, +0.5,+0.5)
[0, 1, 2]	(-0.5, -0,+0.166...)

Quantity: PositionReal This quantity represents the real position vector of the center of each cell from the center of the mesh, units in meters. For example, with a 2x3x4 grid of cells with cell sizes 1nmx2nmx3nm we get the following mappings from mesh indices to PositionWorld:

Mesh Index	PositionReal
[0, 0, 0]	(-0.5nm, -2nm,-4.5nm)
[1, 2, 3]	(+0.5nm, +2nm,+4.5nm)
[0, 1, 2]	(-0.5nm, 0nm,+1.5nm)

Implementation is using CUDA kernel that recomputes the indices when needed. I also tried an implementation where the indices are pre-computed onto a GPU allocated slice and copied when needed but this was slower than the current implementation.

An example of using this to create a spatially dependent anisotropy is given below (Image of result at the end):

////////////////////////////////
//            Meta            //
////////////////////////////////
enabledemag = false

////////////////////////////////
//            Units           //
////////////////////////////////
um := 1e-6
nm := 1e-9
mV := 1e-3
ns := 1e-9
mT := 1e-3
GHz := 1e9

////////////////////////////////
//          Geometry          //
////////////////////////////////
nx := 512
ny := 512
nz := 1

cx := 1*nm
cy := 1*nm
cz := 1*nm

lx := nx * cx
ly := ny * cy
lz := nz * cz

setgridsize(nx, ny, nz)
setcellsize(cx, cy, cz)

////////////////////////////////
//          Material          //
////////////////////////////////
ms          := 1e6
ku_strength := 1e5

msat  = ms
aex   = 1e-12
anisu = vector(0, 0, -1)
ku1   = ku_strength
alpha = 0.25

////////////////////////////////
//         Spatial Ku         //
////////////////////////////////
region_defect := 1

// create a ring shaped mask for our defect
ring_mask := Circle(lx * 0.8).Sub(Circle(lx * 0.5))

// remove the inbuilt anisotropy from the defect
defRegion(region_defect, ring_mask)
ku1.setRegion(region_defect, 0)

// trick to approximate a vortex
Ax := ConstVector( 0, -1,  0)
Ay := ConstVector( 1,  0,  0)
Az := ConstVector( 0,  0,  0)

vortex_approx := Normalized(MulMV(Ax, Ay, Az, PositionLocal))
snapshotas(vortex_approx, "vortex_approx.png")

// mask the vortex against our region, to create a custom anisu at the defect
// directed along the vortex
custom_anisu := Masked(vortex_approx, ring_mask)
snapshotas(custom_anisu, "custom_anisu.png")

anisField := Mul( Const(2*ku_strength/ms)  , Mul( Dot(custom_anisu, m), custom_anisu))
anisEdens := Mul( Const(-0.5*ms) , Dot( anisField, m))

AddFieldTerm(anisField)
AddEdensTerm(anisEdens)

////////////////////////////////
//         Simulation         //
////////////////////////////////
m = uniform(0, 0, 1)
m.setRegion(1, Vortex(1, -1))

run(50*ns)

godsic commented 3 years ago

@JakeLove Impressive work!

Just a few notes:

I would prefer indexing scheme to be consistent with Index2Coord, @JeroenMulkers @JLeliaert what do you think guys?
have you ran tests to make sure there are no regressions elsewhere?
please consider providing tests for this functionality;
would be nice to have performance metrics as well.

jakelove commented 3 years ago

@godsic Thank you for your comments.

All the existing tests have passed and the values in the tables provided have been tested by hand, however I agree automated tests should be added.

Could you elaborate on what you mean by "I would prefer indexing scheme to be consistent with Index2Coord". The implementation for PositionReal is essentially a mapping of Index2Coord across the whole mesh.

godsic commented 3 years ago

@JakeLove I mean that index2coord and PositionReal give the same values for the given index (i, j, k), including moving window functionality. The "same" is a bit too strong term here, since index2coord uses double precision, while CUDA implementation uses single one, so "same" within rounding error then.

To be honest, I am a bit surprised that indices pre-computed on CPU and then transferred to GPU when needed turned to be slower. Could you please elaborate a bit on how it was implemented?

jakelove commented 3 years ago

Ah yes, I hadn't considered the moving window functionality.

As for the precomputed method, here is my implementation (doesn't work with resizing the mesh currently but is given as a PoC):

var __poslocalcache *data.Slice
var __poslocaliscached bool = false

func PositionLocalCached(dst *data.Slice) {

    if !__poslocaliscached {

        fmt.Print("Allocating __poslocalcache on device\n")
        __poslocalcache = cuda.NewSlice(3, Mesh().Size())

        dims := Mesh().Size()

        sx := 1.0 / float32(decrement_safe(dims[X]))
        sy := 1.0 / float32(decrement_safe(dims[Y]))
        sz := 1.0 / float32(decrement_safe(dims[Z]))

        fmt.Print("Filling __poslocalcache\n")
        cuda.SpatialField(__poslocalcache, sx, sy, sz)

        __poslocaliscached = true
    }

    data.Copy(dst, __poslocalcache)
}

Bencmarking with the script here: https://gist.github.com/JakeLove/d9dcc45f1b8e2103b46f793ac554b0dd

I got times of 140s for the in-place implementation and 152s for the precomputed implementation (over 5 runs average on a GTX1070).

godsic commented 3 years ago

@JakeLove Thanks for sharing. I misunderstood the design, thinking that you calculate indices on CPU side (just with some go code) and then copy them to GPU buffer. Those need updates on mesh resizing and moving window. I think updates on resize are OK, but on moving window might quickly make CPU-side calculation a bottleneck.

Regarding PoC I am a bit surprised it gets slower as device-to-device memcpy is typically blazing fast. The only reason I can think of is data.Copy calling memcpy 3 times (once per data component), while in-place implementation makes a single CUDA kernel call. In other words you are observing CUDA overhead.

Finally, I would suggest to do sx := float32(1.0 / dims[X]) where applicable, as it would minimize rounding error.

mumax / 3

Feature/spatial_quantity #289