smartalecH / meep_adjoint_simple

Simple adjoint variable method for meep that works out of the box.
GNU General Public License v2.0
6 stars 4 forks source link

Correct adjoint scaling factors #1

Open smartalecH opened 4 years ago

smartalecH commented 4 years ago

This is a simplified version of Homer's adjoint variable method. Homer's current version has several great features, but the core functionality still seems slightly buggy. Specifically, the adjoint gradients don't match the finite difference approximates. I've stripped the extra features, and am focusing on objective functions that only have eigenmode coefficient features (q). A test script for the current version is examples/check_gradients.py, which simulates a simple waveguide crossing with the following geometry:

design_region

The design region is the center square (with both blue and green hashes). It's parameterized using a first-degree continuous Galerkin finite element basis (implemented in fenics). It uses just two triangles, which corresponds to just 4 nodes/design variables. This makes calculating the finite-difference gradient relatively easy (compared to having hundreds of partials to choose from).

There is an eigenmode source (red line) and various eigenmode monitors to optimize over. The script currently optimizes abs(P1_north)**2 where P1_north corresponds to the forward propagating mode coefficient from the north monitor.

Unfortunately, the adjoint gradient still doesn't match the finite difference approximate. I'm assuming the issue has to do with the source/fields scaling.

I've slightly modified Homer's implementation. The resulting procedure is as follows:

  1. Differentiate the objective function (still using sympy) with respect to each mode monitor of interest (i.e. find ∂J/∂q).
  2. Run the forward simulation using an eigenmode source.
  3. Calculate the eigenmode coefficients at the mode monitors and record fields in design region (E_x_forward, E_y_forward, E_z_forward).
  4. Plug mode coefficients into the differentiated objective function to evaluate ∂J/∂q with correct values.
  5. Place eigenmode sources at appropriate monitors. Scale amplitude of each source.
  6. Run adjoint simulation and record fields in design region (E_x_adjoint, E_y_adjoint, E_z_adjoint).
  7. Calculate ∂J/∂ϵ with Re{ΣE_i_forward E_i_adjoint} summing over x,y,z components. "" is elementwise product.
  8. Project ∂J/∂ϵ onto p to get ∂J/∂p (seems like there should also be a partial ∂ϵ/∂p somewhere in here...?)

I think the issue is with the scaling of the adjoint sources (or fields, depending on which you want to do). I'm scaling the adjoint source with the following:

scale factor = - ∂J/∂q / (2.0*ω). (from Homer's code)

where ω is meep's ω (i.e. no 2π).

Potential values that need correction/normalization:

Any ideas?

@oskooi @stevengj @HomerReid @danielwboyce

stevengj commented 4 years ago

On step (8), you need the Jacobian matrix ∂ϵ/∂p. Since ε is related to p (for 1st-order elements) by a linear (affine) operation ε = Pp + constant then the Jacobian is just the matrix P from the projection/interpolation step, and you will essentially be multiplying ∂J/∂ϵ by the transpose of this.

There should be no Gram matrix (basis-function overlaps) or pseudo-inverses involved, as far as I can tell.

smartalecH commented 4 years ago

I've managed to write a function that produces the fenics interpolation matrix (which is the Jacobian matrix ∂ϵ/∂p). There are still several discrepancies between the adjoint gradient and numerical approximation.

I've found several other issues/bugs with the current codebase. Namely: 1.) Adjoint source components are incorrect 2.) Adjoint source scaling is incorrect 3.) Finite difference gradients are slightly non-deterministic (I'm assuming this is due to sympy numerical precision errors? Or maybe the fenics interpolation?) 4.) Final field integral to calculate gradient was wrong.

I'm planning on "manually" generating the adjoint gradients without any fancy fenics bases, sympy derivatives, etc. Here's a couple of questions:

1.) Should I still do the holey waveguide problem? We previously mentioned this wasn't a density optimization problem... but it should still be able to be formulated as one, right? 2.) Given meep's illusion of continuity, what's the best way to go from design parameters to a material function? Bilinear interpolation seems easiest and most practical, but the jacobian matrix seems a little hairy...

stevengj commented 4 years ago

Bilinear interpolation is pretty easy and the Jacobian matrix is just the matrix of interpolation coefficients, so it shouldn't be hard to generate.

stevengj commented 4 years ago

See the material_grid type in MPB for example.

smartalecH commented 4 years ago

Before implementing the material_grid feature in meep, I built a simple bilinear interpolator in python (and tested it rigorously). It produces a sparse matrix with the correct interpolating weights that can be used to transform the ∂F/∂ϵ to ∂F/∂p.

I "manually" coded all of the objective function gradients and adjoint sources using a simple waveguide problem:

geometry

My objective function was to maximize the first eigenmode coefficient of the right monitor. This means (according to the literature) that the adjoint source is a scaled eigenmode source at that monitor propagating inward.

I optimized over 100 different parameters (10 in the x-direction, 10 in the y-direction) and compared the adjoint gradient to the finite-difference gradient:

comparison

It's clear that there is certainly a linear correlation between the finite difference approximate and the adjoint, but some things still look off. Obviously, the field/source scaling needs fixing. But I'm wondering if there is something else? If it were just a linear scaling issue, then the two gradient vectors should be parallel (i.e. the points would lie on a straight line with a different slope).

Some issues that I've noticed:

smartalecH commented 4 years ago

Here's the script I used to generate the above results:

import numpy as np
import meep as mp
import meep_adjoint as mpa
import matplotlib.pyplot as plt

mp.quiet(quietval=True)
np.random.seed(64)

Sx = 6
Sy = 5
cell_size = mp.Vector3(Sx,Sy)

pml_layers = [mp.PML(1.0)]

resolution = 10

time = 1200

#----------------------------------------------------------------------
# Eigenmode source
#----------------------------------------------------------------------
fcen = 1/1.55
fwidth = 0.1 * fcen
source_center  = [-1,0,0]
source_size    = mp.Vector3(0,2,0)
kpoint = mp.Vector3(1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
                     eig_band = 1,
                     direction=mp.NO_DIRECTION,
                     eig_kpoint=kpoint,
                     size = source_size,
                     center=source_center)]
forward_power = sources[0].eig_power(fcen)
#----------------------------------------------------------------------
#- geometric objects
#----------------------------------------------------------------------
geometry = [mp.Block(center=mp.Vector3(), material=mp.Medium(index=3.45), size=mp.Vector3(mp.inf, 0.5, 0) )]

Nx = 10
Ny = 10
design_size   = mp.Vector3(1, 1, 0)
design_region = mpa.Subregion(name='design', center=mp.Vector3(), size=design_size)
basis = mpa.BilinearInterpolationBasis(region=design_region,Nx=Nx,Ny=Ny)

beta_vector = 11*np.random.rand(Nx*Ny) + 1

design_function = basis.parameterized_function(beta_vector)
design_object = [mp.Block(center=design_region.center, size=design_region.size, epsilon_func = design_function.func())]

geometry += design_object

sim = mp.Simulation(cell_size=cell_size,
                    boundary_layers=pml_layers,
                    geometry=geometry,
                    sources=sources,
                    eps_averaging=False,
                    resolution=resolution)

#----------------------------------------------------------------------
#- add monitors
#----------------------------------------------------------------------

mon_vol = mp.Volume(center=mp.Vector3(1,0,0),size=mp.Vector3(y=2))
flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
design_flux = sim.add_flux(fcen,0,1,mp.FluxRegion(direction=mp.X,center=design_region.center,size=design_region.size))

#----------------------------------------------------------------------
#- run forward simulation
#----------------------------------------------------------------------

sim.run(until=time)

#----------------------------------------------------------------------
#- objective function
#----------------------------------------------------------------------

# pull eigenmode data
def cost_function(sim,f_vol):
    mode = 1
    EigenmodeData = sim.get_eigenmode(fcen, mp.X, f_vol.where, mode, mp.Vector3())
    x,y,z,w = sim.get_array_metadata(dft_cell=f_vol)
    x = np.array(x)
    y = np.array(y)

    Emy = np.zeros((x.size,y.size),dtype=np.complex128)
    Emz = np.zeros((x.size,y.size),dtype=np.complex128)
    Hmy = np.zeros((x.size,y.size),dtype=np.complex128)
    Hmz = np.zeros((x.size,y.size),dtype=np.complex128)

    Ey = np.zeros((x.size,y.size),dtype=np.complex128)
    Ez = np.zeros((x.size,y.size),dtype=np.complex128)
    Hy = np.zeros((x.size,y.size),dtype=np.complex128)
    Hz = np.zeros((x.size,y.size),dtype=np.complex128)

    mcomps = [Emy,Emz,Hmy,Hmz]
    sim_comps = [Ey,Ez,Hy,Hz]

    for ic,c in enumerate([mp.Ey,mp.Ez,mp.Hy,mp.Ez]):
        sim_comps[ic][:] = sim.get_dft_array(flux,c,0)
        for ix,px in enumerate(x):
            for iy,py in enumerate(y):
                mcomps[ic][ix,iy] = EigenmodeData.amplitude(mp.Vector3(px,py),c)

    ob = sim.get_eigenmode_coefficients(f_vol,[1])
    coeff = ob.alpha[0,0,0]

    # Hm* cross E
    C1 = np.sum(np.conj(Hmy) * Ez - np.conj(Hmz) * Ey,axis=None) * (1/resolution) ** 2
    # Em* cross H
    C2 = np.sum(np.conj(Emy) * Hz - np.conj(Emz) * Hy,axis=None) * (1/resolution) ** 2
    # Hm* cross Em
    Nm = -np.sum(np.conj(Hmy) * Emz - np.conj(Hmz) * Emy,axis=None) * (1/resolution) ** 2
    # H* cross E
    Pin = -np.sum(np.conj(Hy) * Ez - np.conj(Hz) * Ey,axis=None) * (1/resolution) ** 2

    #alpha = 0.5 * (C2 / np.conj(Nm) - C1 / Nm)
    alpha = C2 - C1

    f = np.abs(coeff)**2 #1/8*np.abs(alpha) ** 2 / Nm

    A = coeff#1/4*np.conj(alpha) / Nm

    print(A)

    return f, A

f0, alpha = cost_function(sim,flux)

# record design cell fields
d_Ex = sim.get_dft_array(design_flux,mp.Ex,0)
d_Ey = sim.get_dft_array(design_flux,mp.Ey,0)
d_Ez = sim.get_dft_array(design_flux,mp.Ez,0)

#----------------------------------------------------------------------
#- add adjoint sources
#----------------------------------------------------------------------
#scalar = -2 / (1j*fcen*np.abs(Nm)**2) * 1/resolution * np.conjugate(alpha)

sim.reset_meep()

# update sources
kpoint = mp.Vector3(-1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
                     eig_band = 1,
                     direction=mp.NO_DIRECTION,
                     eig_kpoint=kpoint,
                     size = mon_vol.size,
                     center=mon_vol.center)]
sim.change_sources(sources)
adjoint_power = sources[0].eig_power(fcen)
#----------------------------------------------------------------------
#- run adjoint simulation
#----------------------------------------------------------------------
design_flux = sim.add_flux(fcen,0,1,mp.FluxRegion(direction=mp.X,center=design_region.center,size=design_region.size))

sim.run(until=time)

#----------------------------------------------------------------------
#- compute gradient
#----------------------------------------------------------------------
#scale = 1j * np.conj(alpha) / (np.sqrt(adjoint_power) * np.sqrt(forward_power))

#scale = np.conj(alpha / forward_power) * fcen * 1j / np.sqrt(adjoint_power)
scale = alpha.conj() * fcen * 1j / np.sqrt(adjoint_power)
a_Ex = sim.get_dft_array(design_flux,mp.Ex,0) #* scale 
a_Ey = sim.get_dft_array(design_flux,mp.Ey,0) #* scale
a_Ez = sim.get_dft_array(design_flux,mp.Ez,0) #* scale

(x,y,z,w) = sim.get_array_metadata(dft_cell=design_flux)

grid = mpa.xyzw2grid((x,y,z,w))

# Compute dF/deps integral
grad = np.real(2 * (d_Ex * a_Ex + d_Ey * a_Ey + d_Ez * a_Ez) * scale * (1/resolution ** 2))

# Compute dF/rho from dF/deps
g_adjoint = basis.gradient(grad, grid)

'''plt.figure()
plt.imshow(grad)

plt.figure()
print(grad)
plt.imshow(g_adjoint.reshape((Nx,Ny),order='C'))

plt.show()
quit()'''

sim.reset_meep()
kpoint = mp.Vector3(1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
                     eig_band = 1,
                     direction=mp.NO_DIRECTION,
                     eig_kpoint=kpoint,
                     size = source_size,
                     center=source_center)]

sim.change_sources(sources)
#----------------------------------------------------------------------
#- compute finite difference approximate
#----------------------------------------------------------------------

db = 1e-4
n = Nx*Ny
g_discrete = 0*np.ones((n,))

idx = np.random.choice(n,99,replace=False)

for k in idx:

    b0_0 = np.ones((n,))
    b0_1 = np.ones((n,))

    b0_0[:] = beta_vector
    b0_0[k] -= db
    sim.reset_meep()
    design_function.set_coefficients(b0_0)
    flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
    sim.run(until=time)
    fm, _ = cost_function(sim,flux)

    b0_1[:] = beta_vector
    b0_1[k] += db
    sim.reset_meep()
    design_function.set_coefficients(b0_1)
    flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
    sim.run(until=time)
    fp, _ = cost_function(sim,flux)

    g_discrete[k] = (fp - fm) / (2*db)

print("Chosen indices: ",idx)
print("adjoint method: {}".format(g_adjoint[idx]))
print("discrete method: {}".format(g_discrete[idx]))
print("ratio: {}".format(g_adjoint[idx]/g_discrete[idx]))

min = np.min(g_discrete)
max = np.max(g_discrete)

plt.figure()
plt.plot([min, max],[min, max],label='y=x comparison')
plt.plot(g_discrete[idx],g_adjoint[idx],'o',label='Adjoint comparison')
plt.xlabel('Finite Difference Gradient')
plt.ylabel('Adjoint Gradient')
plt.legend()
plt.grid(True)

plt.savefig('comparison.png')

plt.show()
oskooi commented 4 years ago

One way to obtain the Yee field points is to use the field function via output_field_function. However, we will first need to add the equivalent of get_field_point for DFT fields (which does not correctly exist) to obtain the non-interpolated fields which are then used to compute the adjoints. This requires setting the use_centered_grid parameter of the fields::add_dft constructor to False.

stevengj commented 4 years ago

I would double-check whether

makes the adjoint vs. finite-difference gradient plot closer to a straight line (with any slope — the slope just means you are off by a normalization factor, which I'm not so worried about).

Any issue with the Yee grid etcetera should disappear with increasing resolution. Any issue (variation from a straight line) that doesn't disappear with increasing resolution is something more fundamental.

smartalecH commented 4 years ago

The recent update to add_dft_fields seems to help (i.e. the lines are straighter). I simulated a few different resolutions and the results improve, as we discussed. The scale factor is still off, but that's a matter of trial and error now.

comparison_10

comparison_20

comparison_40

Currently, I'm only use the Ez component of the Fourier transformed fields to calculate the gradient. The Ex and Ey components are orders of magnitude smaller. This makes me think that the gradient of the FDTD matrix with respect to the permittivity (A_p) is slightly more complicated than we discussed earlier. That being said, this seems like a good enough approximation for now (or it will be once the field/source scaling is fixed).

I'll post the code below:

smartalecH commented 4 years ago

import numpy as np
import meep as mp
import matplotlib.pyplot as plt
from scipy import sparse

mp.quiet(quietval=True)

# -------------------------------- #
# Bilinear Interpolation Basis class
# -------------------------------- #

class BilinearInterpolationBasis():
    ''' 
    Simple bilinear interpolation basis set.
    '''
    def __init__(self,region,Nx,Ny):
        ''' '''
        self.region = region
        self.Nx = Nx
        self.Ny = Ny
        self.dim = 2

        # Generate interpolation grid
        self.rho_x = np.linspace(region.center.x - region.size.x/2,region.center.x + region.size.x/2,Nx)
        self.rho_y = np.linspace(region.center.y - region.size.y/2,region.center.y + region.size.y/2,Ny)
    def gradient(self,eps,xtics,ytics):
        # get array of grid points that correspond to epsilon vector
        eps = eps.reshape(eps.size,order='C')
        A = gen_interpolation_matrix(self.rho_x,self.rho_y,xtics,ytics)
        return (eps.T * A).T

    def parameterized_function(self, beta_vector):
        """
        Construct and return a callable, updatable element of the function space.

        After the line
            func = basis.parameterized_function(beta_vector)
        we have:
        1. func is a callable scalar function of one spatial variable:
            func(p) = f_0 + \sum_n beta_n * b_n(p)
        2. func has a set_coefficients method for updating the expansion coefficients
            func.set_coefficients(new_beta_vector)
        """

        class _ParameterizedFunction(object):
            def __init__(self, basis, beta_vector):
                '''
                The user supplied beta vector should be a collapsed column vector of the
                design parameters.
                '''
                self.rho_x = basis.rho_x
                self.rho_y = basis.rho_y
                self.beta  = beta_vector

            def set_coefficients(self, beta_vector):
                self.beta = beta_vector
            def __call__(self, p):
                weights, interp_idx = get_bilinear_row(p.x,p.y,self.rho_x,self.rho_y)
                return np.dot( self.beta[interp_idx], weights )
            def func(self):
                def _f(p):
                    return self(p)
                return _f

        return _ParameterizedFunction(self, beta_vector)

# -------------------------------- #
# Helper functions
# -------------------------------- #

def get_bilinear_coefficients(x,x1,x2,y,y1,y2):
    '''
    Calculates the bilinear interpolation coefficients for a single point at (x,y).
    Assumes that the user already knows the four closest points and provides the corresponding
    (x1,x2) and(y1,y2) coordinates.
    '''
    b11 = ((x - x2)*(y - y2))/((x1 - x2)*(y1 - y2))
    b12 = -((x - x2)*(y - y1))/((x1 - x2)*(y1 - y2))
    b21 = -((x - x1)*(y - y2))/((x1 - x2)*(y1 - y2))
    b22 = ((x - x1)*(y - y1))/((x1 - x2)*(y1 - y2))
    return [b11,b12,b21,b22]

def get_bilinear_row(rx,ry,rho_x,rho_y):
    '''
    Calculates a vector of bilinear interpolation weights that can be used
    in an inner product with the neighboring function values, or placed
    inside of an interpolation matrix.
    '''

    Nx = rho_x.size
    Ny = rho_y.size

    # binary search in x direction to get x1 and x2
    xi2 = np.searchsorted(rho_x,rx,side='left')
    if xi2 <= 0: # extrapolation (be careful!)
        xi1 = 0
        xi2 = 1
    elif xi2 >= Nx-1: # extrapolation (be careful!)
        xi1 = Nx-2
        xi2 = Nx-1
    else:
        xi1 = xi2 - 1

    x1 = rho_x[xi1]
    x2 = rho_x[xi2]

    # binary search in y direction to get y1 and y2
    yi2 = np.searchsorted(rho_y,ry,side='left')
    if yi2 <= 0: # extrapolation (be careful!)
        yi1 = 0
        yi2 = 1
    elif yi2 >= Ny-1: # extrapolation (be careful!)
        yi1 = Ny-2
        yi2 = Ny-1
    else:
        yi1 = yi2 - 1

    y1 = rho_y[yi1]
    y2 = rho_y[yi2]

    # get weights
    weights = get_bilinear_coefficients(rx,x1,x2,ry,y1,y2)

    # get location of nearest neigbor interpolation points
    interp_idx = np.array([xi1*Nx+yi1,xi1*Nx+yi2,(xi2)*Nx+yi1,(xi2)*Nx+yi2],dtype=np.int64)

    return weights, interp_idx

def gen_interpolation_matrix(rho_x,rho_y,rho_x_interp,rho_y_interp):
    '''
    Generates a bilinear interpolation matrix.

    Arguments:
    rho_x ................ [N,] numpy array - original x array mapping to povided data
    rho_y ................ [N,] numpy array - original y array mapping to povided data
    rho_x_interp ......... [N,] numpy array - new x array mapping to desired interpolated data
    rho_y_interp ......... [N,] numpy array - new y array mapping to desired interpolated data

    Returns:
    A .................... [N,M] sparse matrix - interpolation matrix
    '''

    Nx = rho_x.size
    Ny = rho_y.size
    Nx_interp = np.array(rho_x_interp).size
    Ny_interp = np.array(rho_y_interp).size

    input_dimension = Nx * Ny
    output_dimension = Nx_interp * Ny_interp

    interp_weights = np.zeros(4*output_dimension) 
    row_ind = np.zeros(4*output_dimension,dtype=np.int64) 
    col_ind = np.zeros(4*output_dimension,dtype=np.int64)

    ri = 0
    for rx in rho_x_interp:
        for ry in rho_y_interp:
            # get weights
            weights, interp_idx = get_bilinear_row(rx,ry,rho_x,rho_y)

            # populate sparse matrix vectors
            interp_weights[4*ri:4*(ri+1)] = weights
            row_ind[4*ri:4*(ri+1)] = np.array([ri,ri,ri,ri],dtype=np.int64)
            col_ind[4*ri:4*(ri+1)] = interp_idx

            ri += 1

    # From matrix vectors, populate the sparse matrix
    A = sparse.coo_matrix((interp_weights, (row_ind, col_ind)),shape=(output_dimension, input_dimension))

    return A

#----------------------------------------------------------------------
# Main routine enters here
#----------------------------------------------------------------------

for resolution in [10]:

    np.random.seed(64)

    Sx = 6
    Sy = 5
    cell_size = mp.Vector3(Sx,Sy)

    pml_layers = [mp.PML(1.0)]

    time = 1200

    #----------------------------------------------------------------------
    # Eigenmode source
    #----------------------------------------------------------------------
    fcen = 1/1.55
    fwidth = 0.1 * fcen
    source_center  = [-1,0,0]
    source_size    = mp.Vector3(0,2,0)
    kpoint = mp.Vector3(1,0,0)
    sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
                        eig_band = 1,
                        direction=mp.NO_DIRECTION,
                        eig_kpoint=kpoint,
                        size = source_size,
                        center=source_center)]
    forward_power = sources[0].eig_power(fcen)

    #----------------------------------------------------------------------
    #- geometric objects
    #----------------------------------------------------------------------
    geometry = [mp.Block(center=mp.Vector3(), material=mp.Medium(index=3.45), size=mp.Vector3(mp.inf, 0.5, 0) )]

    Nx = 4
    Ny = 4

    design_size   = mp.Vector3(1, 1, 0)
    design_region = mp.Volume(center=mp.Vector3(), size=design_size)
    basis = BilinearInterpolationBasis(region=design_region,Nx=Nx,Ny=Ny)

    beta_vector = 11*np.random.rand(Nx*Ny) + 1

    design_function = basis.parameterized_function(beta_vector)
    design_object = [mp.Block(center=design_region.center, size=design_region.size, epsilon_func = design_function.func())]

    geometry += design_object

    sim = mp.Simulation(cell_size=cell_size,
                        boundary_layers=pml_layers,
                        geometry=geometry,
                        sources=sources,
                        eps_averaging=False,
                        resolution=resolution)

    #----------------------------------------------------------------------
    #- add monitors
    #----------------------------------------------------------------------

    mon_vol = mp.Volume(center=mp.Vector3(1,0,0),size=mp.Vector3(y=2))
    flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
    design_flux = sim.add_dft_fields([mp.Ex,mp.Ey,mp.Ez],fcen,fcen,1,where=design_region,yee_grid=True)

    #----------------------------------------------------------------------
    #- run forward simulation
    #----------------------------------------------------------------------

    sim.run(until=time)

    #----------------------------------------------------------------------
    #- objective function
    #----------------------------------------------------------------------

    # pull eigenmode data
    def cost_function(sim,f_vol):
        mode = 1
        EigenmodeData = sim.get_eigenmode(fcen, mp.X, f_vol.where, mode, mp.Vector3())
        x,y,z,w = sim.get_array_metadata(dft_cell=f_vol)
        x = np.array(x)
        y = np.array(y)

        Emy = np.zeros((x.size,y.size),dtype=np.complex128)
        Emz = np.zeros((x.size,y.size),dtype=np.complex128)
        Hmy = np.zeros((x.size,y.size),dtype=np.complex128)
        Hmz = np.zeros((x.size,y.size),dtype=np.complex128)

        Ey = np.zeros((x.size,y.size),dtype=np.complex128)
        Ez = np.zeros((x.size,y.size),dtype=np.complex128)
        Hy = np.zeros((x.size,y.size),dtype=np.complex128)
        Hz = np.zeros((x.size,y.size),dtype=np.complex128)

        mcomps = [Emy,Emz,Hmy,Hmz]
        sim_comps = [Ey,Ez,Hy,Hz]

        for ic,c in enumerate([mp.Ey,mp.Ez,mp.Hy,mp.Ez]):
            sim_comps[ic][:] = sim.get_dft_array(flux,c,0)
            for ix,px in enumerate(x):
                for iy,py in enumerate(y):
                    mcomps[ic][ix,iy] = EigenmodeData.amplitude(mp.Vector3(px,py),c)

        ob = sim.get_eigenmode_coefficients(f_vol,[1])
        coeff = ob.alpha[0,0,0]

        # Hm* cross E
        C1 = np.sum(np.conj(Hmy) * Ez - np.conj(Hmz) * Ey,axis=None) * (1/resolution) ** 2
        # Em* cross H
        C2 = np.sum(np.conj(Emy) * Hz - np.conj(Emz) * Hy,axis=None) * (1/resolution) ** 2
        # Hm* cross Em
        Nm = -np.sum(np.conj(Hmy) * Emz - np.conj(Hmz) * Emy,axis=None) * (1/resolution) ** 2
        # H* cross E
        Pin = -np.sum(np.conj(Hy) * Ez - np.conj(Hz) * Ey,axis=None) * (1/resolution) ** 2

        #alpha = 0.5 * (C2 / np.conj(Nm) - C1 / Nm)
        alpha = C2 - C1

        f = 1/8*np.abs(alpha) ** 2 / Nm
        A = 1/4*np.conj(alpha) / Nm

        f = np.abs(coeff)**2 #1/8*np.abs(alpha) ** 2 / Nm
        A = coeff#1/4*np.conj(alpha) / Nm

        print(A)

        return f, A

    f0, alpha = cost_function(sim,flux)

    # record design cell fields
    d_Ex = sim.get_dft_array(design_flux,mp.Ex,0)
    d_Ey = sim.get_dft_array(design_flux,mp.Ey,0)
    d_Ez = sim.get_dft_array(design_flux,mp.Ez,0)

    #----------------------------------------------------------------------
    #- add adjoint sources
    #----------------------------------------------------------------------

    sim.reset_meep()

    # update sources
    kpoint = mp.Vector3(-1,0,0)
    sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
                        eig_band = 1,
                        direction=mp.NO_DIRECTION,
                        eig_kpoint=kpoint,
                        size = mon_vol.size,
                        center=mon_vol.center)]
    sim.change_sources(sources)
    adjoint_power = sources[0].eig_power(fcen)
    #----------------------------------------------------------------------
    #- run adjoint simulation
    #----------------------------------------------------------------------
    design_flux = sim.add_dft_fields([mp.Ex,mp.Ey,mp.Ez],fcen,fcen,1,where=design_region,yee_grid=True)

    sim.run(until=time)

    #----------------------------------------------------------------------
    #- compute gradient
    #----------------------------------------------------------------------
    #scale = 1j * np.conj(alpha) / (np.sqrt(adjoint_power) * np.sqrt(forward_power))

    #scale = np.conj(alpha / forward_power) * fcen * 1j / np.sqrt(adjoint_power)
    scale = - alpha.conj() / (fcen) / 1j / np.sqrt(adjoint_power) / resolution ** 2
    a_Ex = sim.get_dft_array(design_flux,mp.Ex,0) #* scale 
    a_Ey = sim.get_dft_array(design_flux,mp.Ey,0) #* scale
    a_Ez = sim.get_dft_array(design_flux,mp.Ez,0) #* scale

    (x,y,z,w) = sim.get_array_metadata(dft_cell=design_flux)

    x = np.array(x)
    y = np.array(y)

    x = (x + 0.5/resolution)[:-1]
    y = (y + 0.5/resolution)[:-1]

    # Compute dF/deps integral
    grad = np.real( (d_Ez * a_Ez) * scale)

    # Compute dF/rho from dF/deps
    g_adjoint = basis.gradient(grad, x, y)

    sim.reset_meep()
    kpoint = mp.Vector3(1,0,0)
    sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
                        eig_band = 1,
                        direction=mp.NO_DIRECTION,
                        eig_kpoint=kpoint,
                        size = source_size,
                        center=source_center)]

    sim.change_sources(sources)
    #----------------------------------------------------------------------
    #- compute finite difference approximate
    #----------------------------------------------------------------------

    db = 1e-4
    n = Nx*Ny
    g_discrete = 0*np.ones((n,))

    idx = np.random.choice(n,16,replace=False)

    for k in idx:

        b0_0 = np.ones((n,))
        b0_1 = np.ones((n,))

        b0_0[:] = beta_vector
        b0_0[k] -= db
        sim.reset_meep()
        design_function.set_coefficients(b0_0)
        flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
        sim.run(until=time)
        fm, _ = cost_function(sim,flux)

        b0_1[:] = beta_vector
        b0_1[k] += db
        sim.reset_meep()
        design_function.set_coefficients(b0_1)
        flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
        sim.run(until=time)
        fp, _ = cost_function(sim,flux)

        g_discrete[k] = (fp - fm) / (2*db)

    print("Chosen indices: ",idx)
    print("adjoint method: {}".format(g_adjoint[idx]))
    print("discrete method: {}".format(g_discrete[idx]))
    print("ratio: {}".format(g_adjoint[idx]/g_discrete[idx]))

    min = np.min(g_discrete)
    max = np.max(g_discrete)

    plt.figure()
    plt.plot([min, max],[min, max],label='y=x comparison')
    plt.plot(g_discrete[idx],g_adjoint[idx],'o',label='Adjoint comparison')
    plt.xlabel('Finite Difference Gradient')
    plt.ylabel('Adjoint Gradient')
    plt.title('Resolution: {}'.format(resolution))
    plt.legend()
    plt.grid(True)

    plt.savefig('comparison_{}.png'.format(resolution))

plt.show()
stevengj commented 4 years ago

Good, it looks like it is just a straightforward normalization issue then.

stevengj commented 4 years ago

Note that for real functions f(z) of complex quantities, where you have to use the CR calculus, the quantities ∂f/∂z and ∂f/∂z̄ are complex conjugates of one another so you only need one. (Which one you use depends on how you use ∂f/∂z, e.g. how you implement the chain rule in your case — you just have to be self-consistent.)

Regarding level sets, the main disadvantage compared to a density-based approach is that computing derivatives is more complicated. This is especially true if the topology changes, but is even true for a given topology because of the complication that derivatives are messy at electromagnetic material interfaces in 3d. I don't think curvature constraints are especially easier to implement in a level-set approach either, because the trick is that you need to implement them in a differentiable way (and this eliminates many of the obvious approaches).

(The mode coefficient c itself is a holomorphic function of the solution fields, but in practice we need |c|² or similar real functions of c.)

smartalecH commented 4 years ago

After going through the math one more time, I think I've determined the correct scaling factors. Here are the results:

comparison_10

comparison_20

comparison_30

comparison_40

The slope of the regression line is about 1.05 (ideal being 1.0) for the resolution = 40. It does seem to be getting better with resolution.

stevengj commented 4 years ago

Would be nice to see if you keep doubling the resolution that it converges systematically to a slope of 1. (i.e. plot 1–slope vs resolution on a log–log scale).

To make things quicker, you could remove the PMLs in the y direction, and even make the cell boundaries adjacent to your design region in that direction. You could also try a 1d simulation.

smartalecH commented 4 years ago

Here's resolution = 80

comparison_80

And a summary of the results:

comp

The 160 and 320 are still running.

It seems like it is converging, but hard to tell from 4 data points.

I'm still searching for a hidden 2 somewhere to make this work...

In summary, the fields should be scaled as follows:

  1. Normalize out the mode envelope power using eig_power
  2. Multiple by the dV of the monitor (as described in Chris and Owen's paper) which is just dx in this probelem
  3. Multiply by j2πf0 (even though meep doesnt use 2π for materials, we still need it here, right?)
  4. Multiply by ∂f/∂E (pull the constants out since the resulting field functions correspond to a reverse mode source)

. Since f = |a|^2 = aā and a = 0.5 η (c1 - c2) (the sign here depends on your inner product convention) . ∂f/∂E = a ∂ā/∂E + ā ∂a/∂E but ā is independent of E so.... . ∂f/∂E = ā ∂a/∂E . ∂a/∂E = 0.5 η * dx

So final scale factor is c = 0.5 1/\sqrt(P) j 2 π f0 ā η dx * dx

And the resulting sensitivity ∂F/∂ϵ is 2 * Real(adjoint_field * original_field)

Questions: . The results above require an additional magic factor of 2, however, almost like I need to differentiate without CR calculus. Thoughts? . Is there a way to calculate the mode overlap scale factor η using the group velocity supplied by the get_eigenmode_coefficients routine? . I can exactly calculate η to machine precision but am unable to do so with the mode coefficients. Is this expected? I'm using an add_dft_fields object and would expect I'm pulling the same interpolated fields that the get_eigenmode_coefficients routine does?

stevengj commented 4 years ago

You want the derivative of f with respect to a real degree of freedom ξ. In general, my understanding is that you would have something like: df/dξ = ∂f/∂a (∂a/∂E ∂E/∂ξ + ∂a/∂Ē ∂Ē/∂ξ) + ∂f/∂ā (∂ā/∂E ∂E/∂ξ + ∂ā/∂Ē ∂Ē/∂ξ), but many of these terms vanish, so you are left with df/dξ = ∂f/∂a ∂a/∂E ∂E/∂ξ + ∂f/∂ā ∂ā/∂Ē ∂Ē/∂ξ = 2 Re[∂f/∂a ∂a/∂E ∂E/∂ξ] . There is your factor of 2.

In general, if you make everything real so that a = ā, CR calculus should simplify to ordinary calculus.

stevengj commented 4 years ago

Typing unicode in vscode: https://marketplace.visualstudio.com/items?itemName=oijaz.unicode-latex

smartalecH commented 4 years ago

Here's a tricky question:

Imagine our objective is a function of more than one eigenmode coefficients (i.e. a ratio of two coefficients). This would correspond to two separate adjoint sources. Each adjoint source corresponds to a scaled eigenmode source (in the opposite direction). The scaling factor of each adjoint source is proportional to the eigenmode coefficient, a frequency-dependent quantity.

This last point is important. With only one adjoint source, we could scale the sources by instead scaling the fields after the simulation (thanks to linearity). This allowed us to leverage broadband simulations. Once we introduce more than one source, however, there's no longer a common scalar. We could scale the adjoint sources upfront, but that limits us from doing broadband simulations.

Owen and I talked extensively about this issue in a more general sense -- i.e. how do you perform broadband simulations with Poynting flux monitors where the corresponding adjoint source is different for each frequency. He proposed several creative solutions. But I was hoping that since the adjoint source of an eigenmode monitor is just another eigenmode source, we could come up with something slightly easier to implement.

From a linear algebra perspective, the equivalent system is represented by Aᵀλ = g, where g is the adjoint source. We can express the adjoint source as a transformed linear combination of eigenmode sources with g = Bh, and B is a diagonal matrix with the corresponding scalar factors. Perhaps there's some simplifying identity for Aᵀλ = Bh...

I'll continue to look into it. But I appreciate any input. .

stevengj commented 4 years ago

Note that eigenmode sources are actually only narrowband anyway — they use a fixed mode profile given by the center frequency. See section 4.2.2 of our review article. In contrast, eigenmode coefficients recompute the mode profile at each frequency.

Since we are making a narrowband approximation anyway, one could argue that you might as well scale the adjoint sources upfront. Alternatively, you could precompute a time-series with the desired frequency dependence and use this as a custom src function (i.e. you handle dispersion in the mode coefficients, but not in the spatial mode profile).

smartalecH commented 4 years ago

Note that eigenmode sources are actually only narrowband anyway

Good point. How difficult would it be to allow the user to specify multiple frequency points over which we calculate the eigenmode profiles and interpolate in between? i.e. we launch a time and spatially varying profile? I'm sure the narrowband approximation is suitable for several applications. But I can think of a few (particularly within photonic integrated circuits) where bands are becoming rather large, to the point that modes differ quite a bit.

Alternatively, you could precompute a time-series with the desired frequency dependence and use this as a custom src function.

I think this is the approach many people follow. Owen mentioned he's tried fitting the coefficients to a Pade approximant in the frequency domain and then generated a time-domain apodization profile. He also said, however, that the resulting approximate is very sensitive to perturbations.

Given how the eigenmode source currently works, I'll opt to scale the sources by a single frequency. (unless anyone objects)

smartalecH commented 4 years ago

I've fixed the scaling factors (at least for eigenmode monitors) and tested various different scenarios with multiple monitors inside the objective function.

It all seems to work well and I will put together a meep PR once I figure out the last issue with filtering materials.

smartalecH commented 4 years ago

I'm having trouble getting the correct scaling for broadband simulations.

Using the same example as above, I've extended my objective function to simply sum over all of the requested frequency mode coefficients (alpha):

def J(alpha):
    np.sum(np.abs(alpha)**2)

The adjoint source is properly scaled by ∂J/∂alpha (thank you autograd).

When I calculate the gradients, I obviously get a ∂J/∂ξ for every grid point and every frequency point. Before I can calculate my sensitivity vector (∂J/∂ρ), I need to first make sure I only have ∂J/∂ξ for every grid point.

To do this, I go back to the definition of the adjoint: ∂J/∂ξ = λᵀ x (where Aξ is the gradient of the FDTD operator with respect to the permittivity). I assume that Aξ simply sums the frequency components at a particular grid point. If there were any susceptibilities, then it would correspond to a weighted sum (defined by the gradient of the susceptibility filter).

Is this correct? Or am I missing something?

stevengj commented 4 years ago

Don't you have to compute λᵀ * Aξ * x for each frequency and then sum? There is a different λ for each frequency, right? So this product is nonlinear and hence you can't interchange summation with *.

smartalecH commented 4 years ago

Yes, you're right: I have to compute ∂J/∂ξ for every frequency point via λᵀ x. Then I can sum.

The frequency mapping is basically an extension of the interpolation transformation B, where ξ(r,ω) = B ρ. As we discussed earlier, the chain rule jacobian needed to go from ∂J/∂ξ to ∂J/∂ρ is just the transformation B transposed. Which is why we would sum the frequency sensitivities for simple dispersionless examples.

Unfortunately, even after this fix, results are still significantly off. I'm going through the math again to see if I need to scale it by the frequencies I sum over (other than the expected j2πf scaling at each frequency point). The scale factor error seems to get dramatically worse as resolution increases -- this surprises me since I wouldn't expect the resolution to have any additional impact at this point. The code works great for single-frequency simulations, however.

I'm using the tests/simple.py script if anyone is interested.

Edit: Just for reference, the regression slope for the adjoint gradients vs the finite-difference gradients (I'll add more as they come):

An ideal simulation would obviously produce a regression slope of 1 (i.e. a perfect mapping between the adjoint gradient and the finite difference approximant).

smartalecH commented 4 years ago

The sum of the derivatives is the same as the derivative of the sum: I'll double-check the sensitivities at each frequency point.

smartalecH commented 4 years ago

Aha. My adjoint source was only being scaled by the magnitude of the adjoint source time envelope, rather than both the magnitude and the phase.

This wasn't a problem for the center frequency, which has a relative phase of zero. All of the other ∂J/∂ξ terms were improperly scaled. Obviously once summed, it produced a wrong result.

This issue actually introduces two important problems:

  1. Even for quasi-broadband (i.e. relatively narrowband) simulations, the adjoint source cannot simply be scaled by the center frequency's scaling factor: the source's time envelope (in addition to the other dispersive scaling terms) must be normalized out to produce accurate results.
  2. While we could "filter" the adjoint source in the time domain by the corresponding impulse response of the broadband scaling function, the need to normalize out the original source's time profile makes this challenging (if not impossible). The Gaussian profile is obviously needed to produce a smooth, broadband source.

It might be possible to scale out the time envelope after the simulation if all the adjoint sources have the same envelope (which seems like a reasonable constraint).