Closed qzhu2017 closed 4 years ago
@sayred1 For the first step, I suggest you do the following
I suppose we should get something close to 1 for each values. Let's see.
@sayred1
http://rruff.geo.arizona.edu/AMS/amcsd.php
We will worry about the profiling function later.
@qzhu I've finished calculating the similarity over the validation set. In total there are 288/370 structures (not the full set, either because of partial occupation or because of the website issue I ran into).
Here is the histogram of similarities for all structures .
I've also shuffled through the structures, and classified them into their 7 geometric classes. I did this based on their definitions of cell parameters (a, b, c, alpha, beta, gamma). None of the structures from the site were monoclinic, and I've also grouped the trigonal and hexagonal structures together.
At first glance these plots look promising since the distribution of similarities for each class tends to 1.0. I can look further into this if need be.
The above was with FWHM = 0.9. When we decrease this, say to 0.2, this is what the distributions look like:
Shortly I'll upload some of my scripts. For now I'll begin looking into that other profiling function, until you open the new issue we discussed during our meeting yesterday.
@sayred1 There is a serious bug in your similarity script. I tried to analyze the 1st structure which returned the similarity value of 0.19
If you look them by eye, this looks very similar, however, it returned 0.19!!! Then I tried to plot the following
Clearly, there is something wrong with g2_thetas
You similarity code assumes that the g2thetas will be the same. But it is not in this case.
I hope next time you can find such bugs by yourself.
Okay, so I've preprocessed one of the compared peaks before calculating the similarity:
This first structure now gives S = 0.9914625113433583 after fixing the bug. Running the script through the whole dataset gives the following histograms:
The orthorhombic peak that returns S < 0.5 looks like this:
Apparently pxrd didn't do a very good job for this one structure. I'll explore this further.
@sayred1 Did you update the code?
I just checked this structure Acmm.cif
. It is an unusual structure. You can ignore it.
Sorry, I forget to press send on this comment.
Yes, all the scripts are now uploaded.
@sayred1 Why not put the interpolate function to similarity class?
@qzhu2017 this has been updated
@sayred1 I am not happy with the code similarity code. As I said, the code should be as general as possible. The current code for preprocess looks very confusing to me. You need to consider this is a general utility compute the similarity between two spectra. The two spectra may have different sizes and different range of x_values. Then you need to determine
I am attaching a prototypical code below, please do the new code based on this version.
import numpy as np
import scipy.integrate as integrate
from scipy import interpolate
class Similarity(object):
"""
Class to compute the similarity between two diffraction patterns
"""
def __init__(self, f, g, x_range=None, N=None, weight={'function': 'triangle', 'params': 0.6}):
"""
Args:
f: spectra1 (2D array)
g: spectra2 (2D array)
x_range: the range of x values used to compute similarity ([x_min, x_max])
N: number of sampling points for the processed spectra
weight: weight function used to compute the similarity (dictionary)
"""
self.fx, self.fy = f
self.gx, self.gy = g
self.N = N
self.x_range = x_range
self.weight = weight
#self.scaling = scaling
# QZ: I don't know why you make the following statement
# It is not really necessary that these two must have the same dimension.
#try:
# self.fpeaks.shape[0] == self.gpeaks.shape[0]
#except:
# print("Patterns are not the same shape")
#
# self.N = fpeaks.shape[0]
# self.r = np.linspace(-1, 1, self.N)
self.preprocess()
function = self.weight['function']
if function == 'triangle':
self.triangleFunction()
else:
msg = function + 'is not supported'
raise NotImplementedError(msg)
def calculate(self):
"""
Compute the similarity
"""
fpeaks_r = self.fpeaks
f2thetas_r = self.f2thetas + self.r
gpeaks_r = self.gpeaks
g2thetas_r = self.g2thetas + self.r
x_fg = np.concatenate((self.f2thetas, g2thetas_r))
x_fg = np.linspace(x_fg[0],x_fg[-1],self.N)
x_ff = np.concatenate((self.f2thetas, f2thetas_r))
x_ff = np.linspace(x_ff[0],x_ff[-1],self.N)
x_gg = np.concatenate((self.g2thetas, g2thetas_r))
x_gg = np.linspace(x_gg[0],x_gg[-1],self.N)
xCorrfg = integrate.trapz(self.fpeaks*gpeaks_r,x_fg)
aCorrff = integrate.trapz(self.fpeaks*fpeaks_r,x_ff)
aCorrgg = integrate.trapz(self.gpeaks*gpeaks_r,x_gg)
xCorrfg_w = integrate.trapz(self.w*xCorrfg, self.r)
aCorrff_w = integrate.trapz(self.w*aCorrff, self.r)
aCorrgg_w = integrate.trapz(self.w*aCorrgg, self.r)
return xCorrfg_w / np.sqrt(aCorrff_w * aCorrgg_w)
def preprocess(self):
"""
preprocess the input f ang g spectrum
we do two steps
1, determine the range of x values (if not specified, use the overlap between f_x and g_x)
2, transform f/g to f_new/g_new with the same x values
"""
inter = interpolate.interp1d(self.f2thetas,self.fpeaks,'cubic',fill_value="extrapolate")
self.f2thetas = np.linspace(np.min(self.g2thetas),np.max(self.g2thetas),self.N)
self.fpeaks = inter(self.f2thetas)
def triangleFunction(self):
w = np.zeros((self.N))
l = self.weight['params']
for i in range(self.r.shape[0]):
r = np.abs(self.r[i])
if r < 1:
tf = lambda r,l : 1 - r/l
w[i] = tf(r,l)
else:
w[i] = 0
self.w = w
@qzhu2017 thanks for the insight, I'll make these corrections.
I noticed that you commented out self.r = np.linspace(-1, 1, self.N)
in def __init__()
. Was this an accident?
@sayred1
Maybe there should be a parameter called r_range
with a default value of [-1, 1]. This way, user can conveniently modify the r values.
@qzhu2017 I've generalized similarity.py
, let me know if there is something that I left out.
I've also changed the member variable of XRD.get_profile()
. Instead of storing self.gpeaks =gpeaks, self.thetas2 = thetas2
, it now stores self.pattern = np.vstack((gpeaks,thetas2))
.
Excellent. I think it looks good now.
@sayred1
After you implemented the similarity function, I suggest you collect a database for systematic validation. Similar to what you have done for NaCl, select each representative structure from space group 1-230. Collect the experimental PXRD and compute the PXRD from Mercury and our own code. And then compute the similarity.
I will tell you more details on Friday.