Open smartalecH opened 4 years ago
On step (8), you need the Jacobian matrix ∂ϵ/∂p. Since ε is related to p (for 1st-order elements) by a linear (affine) operation ε = Pp + constant then the Jacobian is just the matrix P from the projection/interpolation step, and you will essentially be multiplying ∂J/∂ϵ by the transpose of this.
There should be no Gram matrix (basis-function overlaps) or pseudo-inverses involved, as far as I can tell.
I've managed to write a function that produces the fenics interpolation matrix (which is the Jacobian matrix ∂ϵ/∂p). There are still several discrepancies between the adjoint gradient and numerical approximation.
I've found several other issues/bugs with the current codebase. Namely: 1.) Adjoint source components are incorrect 2.) Adjoint source scaling is incorrect 3.) Finite difference gradients are slightly non-deterministic (I'm assuming this is due to sympy numerical precision errors? Or maybe the fenics interpolation?) 4.) Final field integral to calculate gradient was wrong.
I'm planning on "manually" generating the adjoint gradients without any fancy fenics bases, sympy derivatives, etc. Here's a couple of questions:
1.) Should I still do the holey waveguide problem? We previously mentioned this wasn't a density optimization problem... but it should still be able to be formulated as one, right? 2.) Given meep's illusion of continuity, what's the best way to go from design parameters to a material function? Bilinear interpolation seems easiest and most practical, but the jacobian matrix seems a little hairy...
Bilinear interpolation is pretty easy and the Jacobian matrix is just the matrix of interpolation coefficients, so it shouldn't be hard to generate.
See the material_grid
type in MPB for example.
Before implementing the material_grid
feature in meep, I built a simple bilinear interpolator in python (and tested it rigorously). It produces a sparse matrix with the correct interpolating weights that can be used to transform the ∂F/∂ϵ to ∂F/∂p.
I "manually" coded all of the objective function gradients and adjoint sources using a simple waveguide problem:
My objective function was to maximize the first eigenmode coefficient of the right monitor. This means (according to the literature) that the adjoint source is a scaled eigenmode source at that monitor propagating inward.
I optimized over 100 different parameters (10 in the x-direction, 10 in the y-direction) and compared the adjoint gradient to the finite-difference gradient:
It's clear that there is certainly a linear correlation between the finite difference approximate and the adjoint, but some things still look off. Obviously, the field/source scaling needs fixing. But I'm wondering if there is something else? If it were just a linear scaling issue, then the two gradient vectors should be parallel (i.e. the points would lie on a straight line with a different slope).
Some issues that I've noticed:
Here's the script I used to generate the above results:
import numpy as np
import meep as mp
import meep_adjoint as mpa
import matplotlib.pyplot as plt
mp.quiet(quietval=True)
np.random.seed(64)
Sx = 6
Sy = 5
cell_size = mp.Vector3(Sx,Sy)
pml_layers = [mp.PML(1.0)]
resolution = 10
time = 1200
#----------------------------------------------------------------------
# Eigenmode source
#----------------------------------------------------------------------
fcen = 1/1.55
fwidth = 0.1 * fcen
source_center = [-1,0,0]
source_size = mp.Vector3(0,2,0)
kpoint = mp.Vector3(1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
eig_band = 1,
direction=mp.NO_DIRECTION,
eig_kpoint=kpoint,
size = source_size,
center=source_center)]
forward_power = sources[0].eig_power(fcen)
#----------------------------------------------------------------------
#- geometric objects
#----------------------------------------------------------------------
geometry = [mp.Block(center=mp.Vector3(), material=mp.Medium(index=3.45), size=mp.Vector3(mp.inf, 0.5, 0) )]
Nx = 10
Ny = 10
design_size = mp.Vector3(1, 1, 0)
design_region = mpa.Subregion(name='design', center=mp.Vector3(), size=design_size)
basis = mpa.BilinearInterpolationBasis(region=design_region,Nx=Nx,Ny=Ny)
beta_vector = 11*np.random.rand(Nx*Ny) + 1
design_function = basis.parameterized_function(beta_vector)
design_object = [mp.Block(center=design_region.center, size=design_region.size, epsilon_func = design_function.func())]
geometry += design_object
sim = mp.Simulation(cell_size=cell_size,
boundary_layers=pml_layers,
geometry=geometry,
sources=sources,
eps_averaging=False,
resolution=resolution)
#----------------------------------------------------------------------
#- add monitors
#----------------------------------------------------------------------
mon_vol = mp.Volume(center=mp.Vector3(1,0,0),size=mp.Vector3(y=2))
flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
design_flux = sim.add_flux(fcen,0,1,mp.FluxRegion(direction=mp.X,center=design_region.center,size=design_region.size))
#----------------------------------------------------------------------
#- run forward simulation
#----------------------------------------------------------------------
sim.run(until=time)
#----------------------------------------------------------------------
#- objective function
#----------------------------------------------------------------------
# pull eigenmode data
def cost_function(sim,f_vol):
mode = 1
EigenmodeData = sim.get_eigenmode(fcen, mp.X, f_vol.where, mode, mp.Vector3())
x,y,z,w = sim.get_array_metadata(dft_cell=f_vol)
x = np.array(x)
y = np.array(y)
Emy = np.zeros((x.size,y.size),dtype=np.complex128)
Emz = np.zeros((x.size,y.size),dtype=np.complex128)
Hmy = np.zeros((x.size,y.size),dtype=np.complex128)
Hmz = np.zeros((x.size,y.size),dtype=np.complex128)
Ey = np.zeros((x.size,y.size),dtype=np.complex128)
Ez = np.zeros((x.size,y.size),dtype=np.complex128)
Hy = np.zeros((x.size,y.size),dtype=np.complex128)
Hz = np.zeros((x.size,y.size),dtype=np.complex128)
mcomps = [Emy,Emz,Hmy,Hmz]
sim_comps = [Ey,Ez,Hy,Hz]
for ic,c in enumerate([mp.Ey,mp.Ez,mp.Hy,mp.Ez]):
sim_comps[ic][:] = sim.get_dft_array(flux,c,0)
for ix,px in enumerate(x):
for iy,py in enumerate(y):
mcomps[ic][ix,iy] = EigenmodeData.amplitude(mp.Vector3(px,py),c)
ob = sim.get_eigenmode_coefficients(f_vol,[1])
coeff = ob.alpha[0,0,0]
# Hm* cross E
C1 = np.sum(np.conj(Hmy) * Ez - np.conj(Hmz) * Ey,axis=None) * (1/resolution) ** 2
# Em* cross H
C2 = np.sum(np.conj(Emy) * Hz - np.conj(Emz) * Hy,axis=None) * (1/resolution) ** 2
# Hm* cross Em
Nm = -np.sum(np.conj(Hmy) * Emz - np.conj(Hmz) * Emy,axis=None) * (1/resolution) ** 2
# H* cross E
Pin = -np.sum(np.conj(Hy) * Ez - np.conj(Hz) * Ey,axis=None) * (1/resolution) ** 2
#alpha = 0.5 * (C2 / np.conj(Nm) - C1 / Nm)
alpha = C2 - C1
f = np.abs(coeff)**2 #1/8*np.abs(alpha) ** 2 / Nm
A = coeff#1/4*np.conj(alpha) / Nm
print(A)
return f, A
f0, alpha = cost_function(sim,flux)
# record design cell fields
d_Ex = sim.get_dft_array(design_flux,mp.Ex,0)
d_Ey = sim.get_dft_array(design_flux,mp.Ey,0)
d_Ez = sim.get_dft_array(design_flux,mp.Ez,0)
#----------------------------------------------------------------------
#- add adjoint sources
#----------------------------------------------------------------------
#scalar = -2 / (1j*fcen*np.abs(Nm)**2) * 1/resolution * np.conjugate(alpha)
sim.reset_meep()
# update sources
kpoint = mp.Vector3(-1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
eig_band = 1,
direction=mp.NO_DIRECTION,
eig_kpoint=kpoint,
size = mon_vol.size,
center=mon_vol.center)]
sim.change_sources(sources)
adjoint_power = sources[0].eig_power(fcen)
#----------------------------------------------------------------------
#- run adjoint simulation
#----------------------------------------------------------------------
design_flux = sim.add_flux(fcen,0,1,mp.FluxRegion(direction=mp.X,center=design_region.center,size=design_region.size))
sim.run(until=time)
#----------------------------------------------------------------------
#- compute gradient
#----------------------------------------------------------------------
#scale = 1j * np.conj(alpha) / (np.sqrt(adjoint_power) * np.sqrt(forward_power))
#scale = np.conj(alpha / forward_power) * fcen * 1j / np.sqrt(adjoint_power)
scale = alpha.conj() * fcen * 1j / np.sqrt(adjoint_power)
a_Ex = sim.get_dft_array(design_flux,mp.Ex,0) #* scale
a_Ey = sim.get_dft_array(design_flux,mp.Ey,0) #* scale
a_Ez = sim.get_dft_array(design_flux,mp.Ez,0) #* scale
(x,y,z,w) = sim.get_array_metadata(dft_cell=design_flux)
grid = mpa.xyzw2grid((x,y,z,w))
# Compute dF/deps integral
grad = np.real(2 * (d_Ex * a_Ex + d_Ey * a_Ey + d_Ez * a_Ez) * scale * (1/resolution ** 2))
# Compute dF/rho from dF/deps
g_adjoint = basis.gradient(grad, grid)
'''plt.figure()
plt.imshow(grad)
plt.figure()
print(grad)
plt.imshow(g_adjoint.reshape((Nx,Ny),order='C'))
plt.show()
quit()'''
sim.reset_meep()
kpoint = mp.Vector3(1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
eig_band = 1,
direction=mp.NO_DIRECTION,
eig_kpoint=kpoint,
size = source_size,
center=source_center)]
sim.change_sources(sources)
#----------------------------------------------------------------------
#- compute finite difference approximate
#----------------------------------------------------------------------
db = 1e-4
n = Nx*Ny
g_discrete = 0*np.ones((n,))
idx = np.random.choice(n,99,replace=False)
for k in idx:
b0_0 = np.ones((n,))
b0_1 = np.ones((n,))
b0_0[:] = beta_vector
b0_0[k] -= db
sim.reset_meep()
design_function.set_coefficients(b0_0)
flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
sim.run(until=time)
fm, _ = cost_function(sim,flux)
b0_1[:] = beta_vector
b0_1[k] += db
sim.reset_meep()
design_function.set_coefficients(b0_1)
flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
sim.run(until=time)
fp, _ = cost_function(sim,flux)
g_discrete[k] = (fp - fm) / (2*db)
print("Chosen indices: ",idx)
print("adjoint method: {}".format(g_adjoint[idx]))
print("discrete method: {}".format(g_discrete[idx]))
print("ratio: {}".format(g_adjoint[idx]/g_discrete[idx]))
min = np.min(g_discrete)
max = np.max(g_discrete)
plt.figure()
plt.plot([min, max],[min, max],label='y=x comparison')
plt.plot(g_discrete[idx],g_adjoint[idx],'o',label='Adjoint comparison')
plt.xlabel('Finite Difference Gradient')
plt.ylabel('Adjoint Gradient')
plt.legend()
plt.grid(True)
plt.savefig('comparison.png')
plt.show()
One way to obtain the Yee field points is to use the field function via output_field_function
. However, we will first need to add the equivalent of get_field_point
for DFT fields (which does not correctly exist) to obtain the non-interpolated fields which are then used to compute the adjoints. This requires setting the use_centered_grid
parameter of the fields::add_dft
constructor to False
.
I would double-check whether
makes the adjoint vs. finite-difference gradient plot closer to a straight line (with any slope — the slope just means you are off by a normalization factor, which I'm not so worried about).
Any issue with the Yee grid etcetera should disappear with increasing resolution. Any issue (variation from a straight line) that doesn't disappear with increasing resolution is something more fundamental.
The recent update to add_dft_fields
seems to help (i.e. the lines are straighter). I simulated a few different resolutions and the results improve, as we discussed. The scale factor is still off, but that's a matter of trial and error now.
Currently, I'm only use the Ez
component of the Fourier transformed fields to calculate the gradient. The Ex
and Ey
components are orders of magnitude smaller. This makes me think that the gradient of the FDTD matrix with respect to the permittivity (A_p) is slightly more complicated than we discussed earlier. That being said, this seems like a good enough approximation for now (or it will be once the field/source scaling is fixed).
I'll post the code below:
import numpy as np
import meep as mp
import matplotlib.pyplot as plt
from scipy import sparse
mp.quiet(quietval=True)
# -------------------------------- #
# Bilinear Interpolation Basis class
# -------------------------------- #
class BilinearInterpolationBasis():
'''
Simple bilinear interpolation basis set.
'''
def __init__(self,region,Nx,Ny):
''' '''
self.region = region
self.Nx = Nx
self.Ny = Ny
self.dim = 2
# Generate interpolation grid
self.rho_x = np.linspace(region.center.x - region.size.x/2,region.center.x + region.size.x/2,Nx)
self.rho_y = np.linspace(region.center.y - region.size.y/2,region.center.y + region.size.y/2,Ny)
def gradient(self,eps,xtics,ytics):
# get array of grid points that correspond to epsilon vector
eps = eps.reshape(eps.size,order='C')
A = gen_interpolation_matrix(self.rho_x,self.rho_y,xtics,ytics)
return (eps.T * A).T
def parameterized_function(self, beta_vector):
"""
Construct and return a callable, updatable element of the function space.
After the line
func = basis.parameterized_function(beta_vector)
we have:
1. func is a callable scalar function of one spatial variable:
func(p) = f_0 + \sum_n beta_n * b_n(p)
2. func has a set_coefficients method for updating the expansion coefficients
func.set_coefficients(new_beta_vector)
"""
class _ParameterizedFunction(object):
def __init__(self, basis, beta_vector):
'''
The user supplied beta vector should be a collapsed column vector of the
design parameters.
'''
self.rho_x = basis.rho_x
self.rho_y = basis.rho_y
self.beta = beta_vector
def set_coefficients(self, beta_vector):
self.beta = beta_vector
def __call__(self, p):
weights, interp_idx = get_bilinear_row(p.x,p.y,self.rho_x,self.rho_y)
return np.dot( self.beta[interp_idx], weights )
def func(self):
def _f(p):
return self(p)
return _f
return _ParameterizedFunction(self, beta_vector)
# -------------------------------- #
# Helper functions
# -------------------------------- #
def get_bilinear_coefficients(x,x1,x2,y,y1,y2):
'''
Calculates the bilinear interpolation coefficients for a single point at (x,y).
Assumes that the user already knows the four closest points and provides the corresponding
(x1,x2) and(y1,y2) coordinates.
'''
b11 = ((x - x2)*(y - y2))/((x1 - x2)*(y1 - y2))
b12 = -((x - x2)*(y - y1))/((x1 - x2)*(y1 - y2))
b21 = -((x - x1)*(y - y2))/((x1 - x2)*(y1 - y2))
b22 = ((x - x1)*(y - y1))/((x1 - x2)*(y1 - y2))
return [b11,b12,b21,b22]
def get_bilinear_row(rx,ry,rho_x,rho_y):
'''
Calculates a vector of bilinear interpolation weights that can be used
in an inner product with the neighboring function values, or placed
inside of an interpolation matrix.
'''
Nx = rho_x.size
Ny = rho_y.size
# binary search in x direction to get x1 and x2
xi2 = np.searchsorted(rho_x,rx,side='left')
if xi2 <= 0: # extrapolation (be careful!)
xi1 = 0
xi2 = 1
elif xi2 >= Nx-1: # extrapolation (be careful!)
xi1 = Nx-2
xi2 = Nx-1
else:
xi1 = xi2 - 1
x1 = rho_x[xi1]
x2 = rho_x[xi2]
# binary search in y direction to get y1 and y2
yi2 = np.searchsorted(rho_y,ry,side='left')
if yi2 <= 0: # extrapolation (be careful!)
yi1 = 0
yi2 = 1
elif yi2 >= Ny-1: # extrapolation (be careful!)
yi1 = Ny-2
yi2 = Ny-1
else:
yi1 = yi2 - 1
y1 = rho_y[yi1]
y2 = rho_y[yi2]
# get weights
weights = get_bilinear_coefficients(rx,x1,x2,ry,y1,y2)
# get location of nearest neigbor interpolation points
interp_idx = np.array([xi1*Nx+yi1,xi1*Nx+yi2,(xi2)*Nx+yi1,(xi2)*Nx+yi2],dtype=np.int64)
return weights, interp_idx
def gen_interpolation_matrix(rho_x,rho_y,rho_x_interp,rho_y_interp):
'''
Generates a bilinear interpolation matrix.
Arguments:
rho_x ................ [N,] numpy array - original x array mapping to povided data
rho_y ................ [N,] numpy array - original y array mapping to povided data
rho_x_interp ......... [N,] numpy array - new x array mapping to desired interpolated data
rho_y_interp ......... [N,] numpy array - new y array mapping to desired interpolated data
Returns:
A .................... [N,M] sparse matrix - interpolation matrix
'''
Nx = rho_x.size
Ny = rho_y.size
Nx_interp = np.array(rho_x_interp).size
Ny_interp = np.array(rho_y_interp).size
input_dimension = Nx * Ny
output_dimension = Nx_interp * Ny_interp
interp_weights = np.zeros(4*output_dimension)
row_ind = np.zeros(4*output_dimension,dtype=np.int64)
col_ind = np.zeros(4*output_dimension,dtype=np.int64)
ri = 0
for rx in rho_x_interp:
for ry in rho_y_interp:
# get weights
weights, interp_idx = get_bilinear_row(rx,ry,rho_x,rho_y)
# populate sparse matrix vectors
interp_weights[4*ri:4*(ri+1)] = weights
row_ind[4*ri:4*(ri+1)] = np.array([ri,ri,ri,ri],dtype=np.int64)
col_ind[4*ri:4*(ri+1)] = interp_idx
ri += 1
# From matrix vectors, populate the sparse matrix
A = sparse.coo_matrix((interp_weights, (row_ind, col_ind)),shape=(output_dimension, input_dimension))
return A
#----------------------------------------------------------------------
# Main routine enters here
#----------------------------------------------------------------------
for resolution in [10]:
np.random.seed(64)
Sx = 6
Sy = 5
cell_size = mp.Vector3(Sx,Sy)
pml_layers = [mp.PML(1.0)]
time = 1200
#----------------------------------------------------------------------
# Eigenmode source
#----------------------------------------------------------------------
fcen = 1/1.55
fwidth = 0.1 * fcen
source_center = [-1,0,0]
source_size = mp.Vector3(0,2,0)
kpoint = mp.Vector3(1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
eig_band = 1,
direction=mp.NO_DIRECTION,
eig_kpoint=kpoint,
size = source_size,
center=source_center)]
forward_power = sources[0].eig_power(fcen)
#----------------------------------------------------------------------
#- geometric objects
#----------------------------------------------------------------------
geometry = [mp.Block(center=mp.Vector3(), material=mp.Medium(index=3.45), size=mp.Vector3(mp.inf, 0.5, 0) )]
Nx = 4
Ny = 4
design_size = mp.Vector3(1, 1, 0)
design_region = mp.Volume(center=mp.Vector3(), size=design_size)
basis = BilinearInterpolationBasis(region=design_region,Nx=Nx,Ny=Ny)
beta_vector = 11*np.random.rand(Nx*Ny) + 1
design_function = basis.parameterized_function(beta_vector)
design_object = [mp.Block(center=design_region.center, size=design_region.size, epsilon_func = design_function.func())]
geometry += design_object
sim = mp.Simulation(cell_size=cell_size,
boundary_layers=pml_layers,
geometry=geometry,
sources=sources,
eps_averaging=False,
resolution=resolution)
#----------------------------------------------------------------------
#- add monitors
#----------------------------------------------------------------------
mon_vol = mp.Volume(center=mp.Vector3(1,0,0),size=mp.Vector3(y=2))
flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
design_flux = sim.add_dft_fields([mp.Ex,mp.Ey,mp.Ez],fcen,fcen,1,where=design_region,yee_grid=True)
#----------------------------------------------------------------------
#- run forward simulation
#----------------------------------------------------------------------
sim.run(until=time)
#----------------------------------------------------------------------
#- objective function
#----------------------------------------------------------------------
# pull eigenmode data
def cost_function(sim,f_vol):
mode = 1
EigenmodeData = sim.get_eigenmode(fcen, mp.X, f_vol.where, mode, mp.Vector3())
x,y,z,w = sim.get_array_metadata(dft_cell=f_vol)
x = np.array(x)
y = np.array(y)
Emy = np.zeros((x.size,y.size),dtype=np.complex128)
Emz = np.zeros((x.size,y.size),dtype=np.complex128)
Hmy = np.zeros((x.size,y.size),dtype=np.complex128)
Hmz = np.zeros((x.size,y.size),dtype=np.complex128)
Ey = np.zeros((x.size,y.size),dtype=np.complex128)
Ez = np.zeros((x.size,y.size),dtype=np.complex128)
Hy = np.zeros((x.size,y.size),dtype=np.complex128)
Hz = np.zeros((x.size,y.size),dtype=np.complex128)
mcomps = [Emy,Emz,Hmy,Hmz]
sim_comps = [Ey,Ez,Hy,Hz]
for ic,c in enumerate([mp.Ey,mp.Ez,mp.Hy,mp.Ez]):
sim_comps[ic][:] = sim.get_dft_array(flux,c,0)
for ix,px in enumerate(x):
for iy,py in enumerate(y):
mcomps[ic][ix,iy] = EigenmodeData.amplitude(mp.Vector3(px,py),c)
ob = sim.get_eigenmode_coefficients(f_vol,[1])
coeff = ob.alpha[0,0,0]
# Hm* cross E
C1 = np.sum(np.conj(Hmy) * Ez - np.conj(Hmz) * Ey,axis=None) * (1/resolution) ** 2
# Em* cross H
C2 = np.sum(np.conj(Emy) * Hz - np.conj(Emz) * Hy,axis=None) * (1/resolution) ** 2
# Hm* cross Em
Nm = -np.sum(np.conj(Hmy) * Emz - np.conj(Hmz) * Emy,axis=None) * (1/resolution) ** 2
# H* cross E
Pin = -np.sum(np.conj(Hy) * Ez - np.conj(Hz) * Ey,axis=None) * (1/resolution) ** 2
#alpha = 0.5 * (C2 / np.conj(Nm) - C1 / Nm)
alpha = C2 - C1
f = 1/8*np.abs(alpha) ** 2 / Nm
A = 1/4*np.conj(alpha) / Nm
f = np.abs(coeff)**2 #1/8*np.abs(alpha) ** 2 / Nm
A = coeff#1/4*np.conj(alpha) / Nm
print(A)
return f, A
f0, alpha = cost_function(sim,flux)
# record design cell fields
d_Ex = sim.get_dft_array(design_flux,mp.Ex,0)
d_Ey = sim.get_dft_array(design_flux,mp.Ey,0)
d_Ez = sim.get_dft_array(design_flux,mp.Ez,0)
#----------------------------------------------------------------------
#- add adjoint sources
#----------------------------------------------------------------------
sim.reset_meep()
# update sources
kpoint = mp.Vector3(-1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
eig_band = 1,
direction=mp.NO_DIRECTION,
eig_kpoint=kpoint,
size = mon_vol.size,
center=mon_vol.center)]
sim.change_sources(sources)
adjoint_power = sources[0].eig_power(fcen)
#----------------------------------------------------------------------
#- run adjoint simulation
#----------------------------------------------------------------------
design_flux = sim.add_dft_fields([mp.Ex,mp.Ey,mp.Ez],fcen,fcen,1,where=design_region,yee_grid=True)
sim.run(until=time)
#----------------------------------------------------------------------
#- compute gradient
#----------------------------------------------------------------------
#scale = 1j * np.conj(alpha) / (np.sqrt(adjoint_power) * np.sqrt(forward_power))
#scale = np.conj(alpha / forward_power) * fcen * 1j / np.sqrt(adjoint_power)
scale = - alpha.conj() / (fcen) / 1j / np.sqrt(adjoint_power) / resolution ** 2
a_Ex = sim.get_dft_array(design_flux,mp.Ex,0) #* scale
a_Ey = sim.get_dft_array(design_flux,mp.Ey,0) #* scale
a_Ez = sim.get_dft_array(design_flux,mp.Ez,0) #* scale
(x,y,z,w) = sim.get_array_metadata(dft_cell=design_flux)
x = np.array(x)
y = np.array(y)
x = (x + 0.5/resolution)[:-1]
y = (y + 0.5/resolution)[:-1]
# Compute dF/deps integral
grad = np.real( (d_Ez * a_Ez) * scale)
# Compute dF/rho from dF/deps
g_adjoint = basis.gradient(grad, x, y)
sim.reset_meep()
kpoint = mp.Vector3(1,0,0)
sources = [mp.EigenModeSource(mp.GaussianSource(frequency=fcen,fwidth=fwidth),
eig_band = 1,
direction=mp.NO_DIRECTION,
eig_kpoint=kpoint,
size = source_size,
center=source_center)]
sim.change_sources(sources)
#----------------------------------------------------------------------
#- compute finite difference approximate
#----------------------------------------------------------------------
db = 1e-4
n = Nx*Ny
g_discrete = 0*np.ones((n,))
idx = np.random.choice(n,16,replace=False)
for k in idx:
b0_0 = np.ones((n,))
b0_1 = np.ones((n,))
b0_0[:] = beta_vector
b0_0[k] -= db
sim.reset_meep()
design_function.set_coefficients(b0_0)
flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
sim.run(until=time)
fm, _ = cost_function(sim,flux)
b0_1[:] = beta_vector
b0_1[k] += db
sim.reset_meep()
design_function.set_coefficients(b0_1)
flux = sim.add_flux(fcen,0,1,mp.FluxRegion(center=mon_vol.center,size=mon_vol.size))
sim.run(until=time)
fp, _ = cost_function(sim,flux)
g_discrete[k] = (fp - fm) / (2*db)
print("Chosen indices: ",idx)
print("adjoint method: {}".format(g_adjoint[idx]))
print("discrete method: {}".format(g_discrete[idx]))
print("ratio: {}".format(g_adjoint[idx]/g_discrete[idx]))
min = np.min(g_discrete)
max = np.max(g_discrete)
plt.figure()
plt.plot([min, max],[min, max],label='y=x comparison')
plt.plot(g_discrete[idx],g_adjoint[idx],'o',label='Adjoint comparison')
plt.xlabel('Finite Difference Gradient')
plt.ylabel('Adjoint Gradient')
plt.title('Resolution: {}'.format(resolution))
plt.legend()
plt.grid(True)
plt.savefig('comparison_{}.png'.format(resolution))
plt.show()
Good, it looks like it is just a straightforward normalization issue then.
Note that for real functions f(z) of complex quantities, where you have to use the CR calculus, the quantities ∂f/∂z and ∂f/∂z̄ are complex conjugates of one another so you only need one. (Which one you use depends on how you use ∂f/∂z, e.g. how you implement the chain rule in your case — you just have to be self-consistent.)
Regarding level sets, the main disadvantage compared to a density-based approach is that computing derivatives is more complicated. This is especially true if the topology changes, but is even true for a given topology because of the complication that derivatives are messy at electromagnetic material interfaces in 3d. I don't think curvature constraints are especially easier to implement in a level-set approach either, because the trick is that you need to implement them in a differentiable way (and this eliminates many of the obvious approaches).
(The mode coefficient c itself is a holomorphic function of the solution fields, but in practice we need |c|² or similar real functions of c.)
After going through the math one more time, I think I've determined the correct scaling factors. Here are the results:
The slope of the regression line is about 1.05 (ideal being 1.0) for the resolution = 40. It does seem to be getting better with resolution.
Would be nice to see if you keep doubling the resolution that it converges systematically to a slope of 1. (i.e. plot 1–slope vs resolution on a log–log scale).
To make things quicker, you could remove the PMLs in the y direction, and even make the cell boundaries adjacent to your design region in that direction. You could also try a 1d simulation.
Here's resolution = 80
And a summary of the results:
The 160 and 320 are still running.
It seems like it is converging, but hard to tell from 4 data points.
I'm still searching for a hidden 2 somewhere to make this work...
In summary, the fields should be scaled as follows:
eig_power
. Since f = |a|^2 = aā and a = 0.5 η (c1 - c2) (the sign here depends on your inner product convention) . ∂f/∂E = a ∂ā/∂E + ā ∂a/∂E but ā is independent of E so.... . ∂f/∂E = ā ∂a/∂E . ∂a/∂E = 0.5 η * dx
So final scale factor is c = 0.5 1/\sqrt(P) j 2 π f0 ā η dx * dx
And the resulting sensitivity ∂F/∂ϵ is 2 * Real(adjoint_field * original_field)
Questions:
. The results above require an additional magic factor of 2, however, almost like I need to differentiate without CR calculus. Thoughts?
. Is there a way to calculate the mode overlap scale factor η using the group velocity supplied by the get_eigenmode_coefficients
routine?
. I can exactly calculate η to machine precision but am unable to do so with the mode coefficients. Is this expected? I'm using an add_dft_fields
object and would expect I'm pulling the same interpolated fields that the get_eigenmode_coefficients
routine does?
You want the derivative of f with respect to a real degree of freedom ξ. In general, my understanding is that you would have something like: df/dξ = ∂f/∂a (∂a/∂E ∂E/∂ξ + ∂a/∂Ē ∂Ē/∂ξ) + ∂f/∂ā (∂ā/∂E ∂E/∂ξ + ∂ā/∂Ē ∂Ē/∂ξ), but many of these terms vanish, so you are left with df/dξ = ∂f/∂a ∂a/∂E ∂E/∂ξ + ∂f/∂ā ∂ā/∂Ē ∂Ē/∂ξ = 2 Re[∂f/∂a ∂a/∂E ∂E/∂ξ] . There is your factor of 2.
In general, if you make everything real so that a = ā, CR calculus should simplify to ordinary calculus.
Typing unicode in vscode: https://marketplace.visualstudio.com/items?itemName=oijaz.unicode-latex
Here's a tricky question:
Imagine our objective is a function of more than one eigenmode coefficients (i.e. a ratio of two coefficients). This would correspond to two separate adjoint sources. Each adjoint source corresponds to a scaled eigenmode source (in the opposite direction). The scaling factor of each adjoint source is proportional to the eigenmode coefficient, a frequency-dependent quantity.
This last point is important. With only one adjoint source, we could scale the sources by instead scaling the fields after the simulation (thanks to linearity). This allowed us to leverage broadband simulations. Once we introduce more than one source, however, there's no longer a common scalar. We could scale the adjoint sources upfront, but that limits us from doing broadband simulations.
Owen and I talked extensively about this issue in a more general sense -- i.e. how do you perform broadband simulations with Poynting flux monitors where the corresponding adjoint source is different for each frequency. He proposed several creative solutions. But I was hoping that since the adjoint source of an eigenmode monitor is just another eigenmode source, we could come up with something slightly easier to implement.
From a linear algebra perspective, the equivalent system is represented by Aᵀλ = g, where g is the adjoint source. We can express the adjoint source as a transformed linear combination of eigenmode sources with g = Bh, and B is a diagonal matrix with the corresponding scalar factors. Perhaps there's some simplifying identity for Aᵀλ = Bh...
I'll continue to look into it. But I appreciate any input. .
Note that eigenmode sources are actually only narrowband anyway — they use a fixed mode profile given by the center frequency. See section 4.2.2 of our review article. In contrast, eigenmode coefficients recompute the mode profile at each frequency.
Since we are making a narrowband approximation anyway, one could argue that you might as well scale the adjoint sources upfront. Alternatively, you could precompute a time-series with the desired frequency dependence and use this as a custom src function (i.e. you handle dispersion in the mode coefficients, but not in the spatial mode profile).
Note that eigenmode sources are actually only narrowband anyway
Good point. How difficult would it be to allow the user to specify multiple frequency points over which we calculate the eigenmode profiles and interpolate in between? i.e. we launch a time and spatially varying profile? I'm sure the narrowband approximation is suitable for several applications. But I can think of a few (particularly within photonic integrated circuits) where bands are becoming rather large, to the point that modes differ quite a bit.
Alternatively, you could precompute a time-series with the desired frequency dependence and use this as a custom src function.
I think this is the approach many people follow. Owen mentioned he's tried fitting the coefficients to a Pade approximant in the frequency domain and then generated a time-domain apodization profile. He also said, however, that the resulting approximate is very sensitive to perturbations.
Given how the eigenmode source currently works, I'll opt to scale the sources by a single frequency. (unless anyone objects)
I've fixed the scaling factors (at least for eigenmode monitors) and tested various different scenarios with multiple monitors inside the objective function.
It all seems to work well and I will put together a meep PR once I figure out the last issue with filtering materials.
I'm having trouble getting the correct scaling for broadband simulations.
Using the same example as above, I've extended my objective function to simply sum over all of the requested frequency mode coefficients (alpha):
def J(alpha):
np.sum(np.abs(alpha)**2)
The adjoint source is properly scaled by ∂J/∂alpha (thank you autograd).
When I calculate the gradients, I obviously get a ∂J/∂ξ for every grid point and every frequency point. Before I can calculate my sensitivity vector (∂J/∂ρ), I need to first make sure I only have ∂J/∂ξ for every grid point.
To do this, I go back to the definition of the adjoint: ∂J/∂ξ = λᵀ Aξ x (where Aξ is the gradient of the FDTD operator with respect to the permittivity). I assume that Aξ simply sums the frequency components at a particular grid point. If there were any susceptibilities, then it would correspond to a weighted sum (defined by the gradient of the susceptibility filter).
Is this correct? Or am I missing something?
Don't you have to compute λᵀ * Aξ * x
for each frequency and then sum? There is a different λ
for each frequency, right? So this product is nonlinear and hence you can't interchange summation with *
.
Yes, you're right: I have to compute ∂J/∂ξ for every frequency point via λᵀ Aξ x. Then I can sum.
The frequency mapping is basically an extension of the interpolation transformation B, where ξ(r,ω) = B ρ. As we discussed earlier, the chain rule jacobian needed to go from ∂J/∂ξ to ∂J/∂ρ is just the transformation B transposed. Which is why we would sum the frequency sensitivities for simple dispersionless examples.
Unfortunately, even after this fix, results are still significantly off. I'm going through the math again to see if I need to scale it by the frequencies I sum over (other than the expected j2πf scaling at each frequency point). The scale factor error seems to get dramatically worse as resolution increases -- this surprises me since I wouldn't expect the resolution to have any additional impact at this point. The code works great for single-frequency simulations, however.
I'm using the tests/simple.py
script if anyone is interested.
Edit: Just for reference, the regression slope for the adjoint gradients vs the finite-difference gradients (I'll add more as they come):
An ideal simulation would obviously produce a regression slope of 1 (i.e. a perfect mapping between the adjoint gradient and the finite difference approximant).
The sum of the derivatives is the same as the derivative of the sum: I'll double-check the sensitivities at each frequency point.
Aha. My adjoint source was only being scaled by the magnitude of the adjoint source time envelope, rather than both the magnitude and the phase.
This wasn't a problem for the center frequency, which has a relative phase of zero. All of the other ∂J/∂ξ terms were improperly scaled. Obviously once summed, it produced a wrong result.
This issue actually introduces two important problems:
It might be possible to scale out the time envelope after the simulation if all the adjoint sources have the same envelope (which seems like a reasonable constraint).
This is a simplified version of Homer's adjoint variable method. Homer's current version has several great features, but the core functionality still seems slightly buggy. Specifically, the adjoint gradients don't match the finite difference approximates. I've stripped the extra features, and am focusing on objective functions that only have eigenmode coefficient features (q). A test script for the current version is
examples/check_gradients.py
, which simulates a simple waveguide crossing with the following geometry:The design region is the center square (with both blue and green hashes). It's parameterized using a first-degree continuous Galerkin finite element basis (implemented in fenics). It uses just two triangles, which corresponds to just 4 nodes/design variables. This makes calculating the finite-difference gradient relatively easy (compared to having hundreds of partials to choose from).
There is an eigenmode source (red line) and various eigenmode monitors to optimize over. The script currently optimizes abs(P1_north)**2 where P1_north corresponds to the forward propagating mode coefficient from the north monitor.
Unfortunately, the adjoint gradient still doesn't match the finite difference approximate. I'm assuming the issue has to do with the source/fields scaling.
I've slightly modified Homer's implementation. The resulting procedure is as follows:
I think the issue is with the scaling of the adjoint sources (or fields, depending on which you want to do). I'm scaling the adjoint source with the following:
scale factor = - ∂J/∂q / (2.0*ω).
(from Homer's code)where ω is meep's ω (i.e. no 2π).
Potential values that need correction/normalization:
Any ideas?
@oskooi @stevengj @HomerReid @danielwboyce