Closed RichardWaiteSTFC closed 4 months ago
UB matrices (saved as .mat) - can be loaded onto a workspace by using LoadISawUB
algorithm
UBs_for_Waruna.zip
Additional files to use as background: Vanadium runs (no powder lines) taken with same resolution settings as single-crystal, but they have unusually good stats
Powder runs counted for ~10 muAh, taken with different resolution (HR-SF) compared typical single-crystal experiments (MR-SX). But should still be worth using.
You can ask WISH team for more runs if you need them - but you can create 1000s of simulations just using the run you have and changing goniometer rotations, UB matrices random seeds, reflection condition/centering as described in the issue
To label the data sets you have 2-3 options:
findSpectrumIndex
and createPeaksWorkspaceFromIndices
in the script I mentioned aboveto go the opposite way (i.e. take information in the peak table - namely DetID and TOF) and convert it to indices in the array. I.e. convert the detector ID to a spectrum index and convert this index to the array indices in the first two dimensions (see findSpectrumIndex
) and then use the TOF to get the index along the third dimension.
To get the simulated data only with 0 background, replace this line
ws.dataY(ispec)[istart:iend] += np.random.poisson(lam=func_callable(dspac))
with something like
y = np.zeros(ws.dataY(ispec).shape)
y[istart:iend] += np.random.poisson(lam=func_callable(dspac))
ws.setY(ispec, y)
be careful to set the random seed as this might give different answers, or have two workspace (background and non-background) that you add the same simulated data to.
You would need to train the CNN on the simulation with the background, but you could get your labels from the simulation without the background.
scipy.ndimage.label
and 'scipy.ndimage.maximum_position`If you want to try this on real data to find peaks, use runs 42730-42733
I have adjusted the code to generate the labels at the point the peaks is simulated. It produces a csv file with each row corresponding to a row in the peak table (it contains spectrum index, index of TOF bin/channel for peak centre, min and max). This should be sufficient to train the algorithm to detect peaks in 2D detector images at constant TOF as discussed.
In addition I call Rebunch
on the data that increases the bin width by a factor 3 (i.e. there are only 1540 TOF bins) - we should still have multiple TOF bins per peak.
If you want to further reduce the data size you can treat each half of the WISH instrument separately (i.e. panels 1-5 and panels 6-10).
# import mantid algorithms, numpy and matplotlib
from mantid.simpleapi import *
import matplotlib.pyplot as plt
import numpy as np
from IntegratePeaksSkew import InstrumentArrayConverter, PeakData
from mantid.api import FunctionFactory
# powder taken with MR-SF settings (same typically used for single-crystal)
ws = Load(Filename='/archive/ndxwish/Instrument/data/cycle_23_4/WISH00056081.raw', OutputWorkspace='WISH00056081')
# reduce size of data and improve stats - increase TOF bin-width by factor 3 (typically fine for peak finding on WISH)
ws = Rebunch(InputWorkspace=ws, NBunch=3, OutputWorkspace=ws.name())
ws = ConvertUnits(InputWorkspace=ws, OutputWorkspace=ws.name(), Target='dSpacing')
# set a UB matrix (taken from PdCrO2 - could use any!)
ub = np.array([[ 0.08749108, 0.37636055, -0.00169743],
[-0.01788245, 0.00531973, 0.05517389],
[ 0.383845 , 0.11677283, 0.00295732]])
SetUB(ws, UB=ub)
# set goniometer (roatets UB matrix by 5 degrees around vertical axis - can add more generate a series of datasets!)
SetGoniometer(ws, Axis0="5,0,1,0,-1")
# predict peaks
peaks = PredictPeaks(InputWorkspace=ws, WavelengthMin=0.8, WavelengthMax=10, MinDSpacing=0.8, MaxDSpacing=15,
ReflectionCondition='Primitive', OutputWorkspace='peaks')
# loop over predicted peaks and simulate data
# calculate structure factor
array_converter = InstrumentArrayConverter(ws)
minI, maxI = 5,50
nfwhm=6
sig_rows = 1
sig_cols = 2
np.random.seed(1)
# each row in labels
peak_labels = np.zeros((peaks.getNumberPeaks(), 4)) # each row contains [ispec, TOF index of peak cen, TOF index of peak min, TOF index of peak max]
frac_cutoff = 0.1 # cutoff in integrated intensity used to find peak extents for labelling
for ipk, pk in enumerate(peaks):
print("ipk = ", ipk)
# get data array in window around peak region
peak_data = array_converter.get_peak_data(pk, pk.getDetectorID(), peaks.column("BankName")[ipk],
15, 15, nrows_edge=1, ncols_edge=1)
ispecs = np.array(ws.getIndicesFromDetectorIDs([int(d) for d in peak_data.detids.flatten()])).reshape(peak_data.detids.shape)
# get arrays representing distance of each pixel from peak position on the detector/instrument view
rows, cols = np.meshgrid(np.arange(ispecs.shape[1])-peak_data.icol, np.arange(ispecs.shape[0])-peak_data.irow) # cols, rows
gaussian = (1/(2*np.pi*sig_rows*sig_cols))*np.exp(-0.5*(rows/sig_rows)**2 -0.5*(cols/sig_cols)**2 )
# estimate parameters
func = FunctionFactory.Instance().createPeakFunction("BackToBackExponential")
func.setParameter("X0", pk.getDSpacing()) # set centre
func.setMatrixWorkspace(ws, int(ispecs[peak_data.irow, peak_data.icol]), 0.0, 0.0)
func_callable = FunctionWrapper(func)
# loop over all spectra and simulate peak in given d-spacing range
fwhm = func.fwhm()
dmin = pk.getDSpacing() - nfwhm*fwhm
dmax = pk.getDSpacing() + nfwhm*fwhm
intensity = minI + np.random.rand()*(maxI-minI)
for irow in range(ispecs.shape[0]):
for icol in range(ispecs.shape[1]):
ispec = int(ispecs[irow, icol])
# get d-spacing range to evaluate peak
istart = ws.yIndexOfX(dmin, ispec) if dmin > ws.readX(ispec)[0] else 0
iend = ws.yIndexOfX(dmax, ispec) if dmax < ws.readX(ispec)[0] else ws.blocksize()
# get d-spacing values
dspac = ws.readX(ispec)
dspac = 0.5*(dspac[istart:iend] + dspac[istart+1:iend+1]) # convert to bin centers from edgess
# simulate data and add to workspace
func_callable.setParameter('I', gaussian[irow,icol]*intensity)
ypk = func_callable(dspac)
if irow == peak_data.irow and icol == peak_data.icol:
# spectrum corresponding to predicted peak position
# find peak extents for labeling - use cutoff in integrated intensity
icen = ws.yIndexOfX(pk.getDSpacing(), ispec)
y_cdf = np.cumsum(ypk) # cumulative distribtuion function
y_cdf /= y_cdf[-1]
ilo = np.clip(istart + np.argmin(abs(y_cdf - frac_cutoff)), a_min=0, a_max=icen)
ihi = np.clip(istart + np.argmin(abs(y_cdf - (1-frac_cutoff))), a_min=icen, a_max=ws.blocksize())
peak_labels[ipk] = [ispec, icen, ilo, ihi]
ws.dataY(ispec)[istart:iend] += np.random.poisson(lam=ypk)
# convert back to TOF
ConvertUnits(InputWorkspace=ws, OutputWorkspace=ws.name(), Target='TOF')
# save labels to csv
np.savetxt(r"/mnt/babylon/Public/RWaite/labels.txt", peak_labels, fmt='%d', delimiter=",")
iend = ws.yIndexOfX(dmax, ispec) if dmax < ws.readX(ispec)[0] else ws.blocksize()
Thank you very much for this code Richard!
Having debugged it, I guess this line
iend = ws.yIndexOfX(dmax, ispec) if dmax < ws.readX(ispec)[0] else ws.blocksize()
should be corrected as below with [-1] index if I am not mistaken?
iend = ws.yIndexOfX(dmax, ispec) if dmax < ws.readX(ispec)[-1] else ws.blocksize()
Yep indeed, sorry for typo was in a rush!
This was very successful, great work! I'm closing this issue and have put a new issue in for the next steps #37527
Describe the outcome that is desired. See if CNN can be trained on (fairly realistic) simulated WISH data and then applied to real data.
Does the outcome relate directly to a problem? Please describe. Want robust method to find single-crystal peaks, but unrealistic to be able to get ~1000s of labelled datasets.
Describe any solutions you are considering Simulate data with known UB so we know exactly which peaks are present and where.
Here is some code I have knocked up to do this (using a WISH dataset of a powder sample, that according to the title was taken with same resolution settings as typical single-crystal measurement) - no warranty included!
Peaks are simulated using a
BackToBackExponential
with realistic parameters (that do depend on d-spacing and scattering angle) with a Gaussian distribution on the detector (currently same width and height for all peaks). Note the peak is simulated in d-spacing and then transformed to TOF - so this will account for peak centre being different in adjacent pixels due to changing scattering angle and L2. The integrated intensity of the peak is randomly generated between two limits. Poisson noise is applied to the simulated data before adding to the workspace.This can be made more realistic if:
But lets give it a go as is, and see how it performs for real data!
Note additional data sets can be simulated by:
SetGoniometer
)ReflectionCondition
) inPredictPeaks
(see algorithm documentation)Once the data has been simulated the workspace can be converted to a numpy array for input to various machine learning codes using this code:
https://github.com/mantidproject/scriptrepository/blob/1712adf4119800aaf771ccd9ca1385c443355f6f/diffraction/WISH/bragg-detect/process_input_and_output_WISH_ML_peak_finding.py
The above file also has code to convert from indexing the array to spectrum index (note in WISH there are 4 detectors for every spectrum - so you should use the spectrum index instead of detector ID when comparing peaks). This will be required when training the CNN to check whether the found peaks exist of not.
Additional context Details of UB matrices and indexing can be found here https://docs.mantidproject.org/v6.1.0/concepts/Lattice.html