Closed ConnorPigg closed 5 years ago
Attempting to improve robustness by manually changing a model's residual function to penalize probability totals that differ from 1. This will be implemented by removing the bounds on struct0_prob_c
and adding diff *= np.exp(10*abs(1 - ptotal))
to the existing residual function.
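A minimal sketch of how such a multiplicative penalty could look. The function signature, argument names, and weighted-residual form here are illustrative assumptions, not idpflex's actual residual API; only the penalty line itself comes from the text above:

```python
import numpy as np

def penalized_residual(model_y, exp_y, exp_e, struct_probs):
    """Error-weighted residual with a multiplicative penalty that grows
    exponentially as the structure probabilities drift away from summing
    to one. All argument names are illustrative, not idpflex's API."""
    diff = (model_y - exp_y) / exp_e       # standard weighted residual
    ptotal = np.sum(struct_probs)          # total probability across structures
    diff *= np.exp(10 * abs(1 - ptotal))   # penalty term from the issue text
    return diff
```

When the probabilities sum exactly to 1 the penalty factor is exp(0) = 1 and the residual is unchanged; any deviation inflates every residual element by the same factor.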
If changing the residual function alone does not solve the robustness issue, then fitting can be performed repeatedly, adjusting the initial parameters between fits. The parameters can be updated using something like the following:
import random

upper_bound = 1
for prob in probability_params:
    new_val = random.uniform(0, upper_bound)
    upper_bound -= new_val
    prob.set(value=new_val)
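A runnable, plain-Python version of that re-initialization scheme (no lmfit dependency; the function name and seed argument are illustrative). Note that each draw is uniform on the remaining headroom, so the total never exceeds 1 but can fall short of it, which the penalty term above then discourages during the fit:

```python
import random

def random_initial_probabilities(n, seed=None):
    """Draw n initial probability values: each value is uniform on the
    headroom left by the previous draws, so the running total stays at
    or below 1. Illustrative sketch, not idpflex code."""
    rng = random.Random(seed)
    upper_bound = 1.0
    values = []
    for _ in range(n):
        new_val = rng.uniform(0, upper_bound)
        upper_bound -= new_val
        values.append(new_val)
    return values
```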
The Pearson correlation coefficient between the probabilities and Rg VMD filled, using both sans and saxs data, was very low (0.0637 using the vacant experimental fit, with similar values for the structured and linear fits), indicating the fitting is not simply choosing the largest Rg value.
ummm How about only the X-ray data? It has much better resolution at low-Q, which is the region important for Rg determination...
For X-ray only, the correlation was again very low (0.06367), and for sans only it was 0.09762, again using Rg VMD filled from the spreadsheet. This would seem to indicate that the differences in theoretical Rg do not strongly determine the fit.
It may be interesting to use the integer ranking instead to perform the calculation. This would give the probability data more spread, as opposed to having many values clumped near 0 and a few outliers that are orders of magnitude larger.
Edit 1: Using the integer ranking of the probabilities brings the correlation up to about 0.23. Still small, but it identifies a larger impact than the raw probabilities would suggest.
Edit 2: Using integer ranking for both Rg and the probabilities brings the correlation up to 0.34. Once again this is small, but it compares the orderings, which is what was initially desired.
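For reference, "integer ranking" here means replacing each value with its rank before computing the Pearson coefficient; ranking both variables gives Spearman's rank correlation (ties aside). A small numpy-only sketch with illustrative data, not the actual spreadsheet values:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

def to_ranks(x):
    """Integer ranking: replace each value with its 0-based rank."""
    ranks = np.empty(len(x), dtype=float)
    ranks[np.argsort(x)] = np.arange(len(x))
    return ranks

def rank_correlation(x, y):
    """Pearson on the ranks of both arrays (Spearman's rho, ties aside)."""
    return pearson(to_ranks(x), to_ranks(y))
```

Ranking only one of the two variables, as in Edit 1, corresponds to pearson(to_ranks(rg), probs); ranking both, as in Edit 2, is rank_correlation(rg, probs).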
"integer ranking" ?
Re-scaling was tested on a single leaf node. The scaling was done using the factor scale = max(leaf.y)/max(exp.y).
# The experiment was rescaled by
exp.y *= scale
exp.e *= scale
# while the model was rescaled by
leaf.y /= scale
The fits were then performed to find the slope arguments. The slope from scaling the experiment was divided by the scaling factor and used as the slope in the unscaled fit's eval function; similarly, the slope from scaling the model was multiplied by the scaling factor and used in the unscaled fit's eval function. These slopes were all close in value, resulting in similar residuals. The reduced chisqr for the experiment-scaled and model-scaled fits agreed to several digits and ended up being smaller than the reduced chisqr found without scaling. Scaling the experimental data is simpler and is the method chosen for future fitting. The scaling factor was then generalized to account for more than one structure using
scale = sum(max(struct.y) for struct in structs_to_be_fit)/(len(structs_to_be_fit) * max(exp.y))
i.e. the average maximum structure value divided by the maximum experimental value.
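A minimal runnable sketch of that generalized scaling factor. The .y attribute follows the snippets above; the function name and the use of lightweight stand-in objects are assumptions for illustration:

```python
import numpy as np

def generalized_scale(structs_to_be_fit, exp_y):
    """Average of the structures' maximum intensities divided by the
    experiment's maximum intensity, as described above. Each struct is
    assumed to expose its profile as a .y array."""
    avg_struct_max = sum(np.max(s.y) for s in structs_to_be_fit) / len(structs_to_be_fit)
    return avg_struct_max / np.max(exp_y)
```

With a single structure this reduces to the original max(leaf.y)/max(exp.y) factor.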
Fitting the data associated with the ribosome has been challenging. This issue will collect the related discussion, problems, tasks, and solutions, making the information easier to find. It will also document the work and its updates as an example of working with idpflex.
TODO