samplchallenges / SAMPL6

Challenge inputs, details, and results for the SAMPL6 series of challenges
https://samplchallenges.github.io
MIT License

Addition to pka type III analysis and experimental data #53

Closed MehtapIsik closed 6 years ago

MehtapIsik commented 6 years ago

This PR includes:

  1. Absolute error vs pKa ID plots for each method for type III analysis.
  2. Extra pKa experiments to compare water and cosolvent based pKa measurement methods. Link to summary table.
  3. LC-MS purity results for solid compound stocks.

bannanc commented 6 years ago

I'll have time tomorrow to look more closely at this, but I have one quick question on this:

"Extra pKa experiments to compare water and cosolvent based pKa measurement methods. Link to summary table."

I assume these are additional experimental values, how are you planning to use these for analysis? Do we need to make an announcement to participants about these being available?

MehtapIsik commented 6 years ago

I do not plan to use these additional experiments for the analysis. These extra experiments were done with cosolvent extrapolation method for molecules that already have pKa measurements done in 100% water. I am only reporting these extra experiments to assuage the doubt about the reliability of cosolvent methods.

In the analysis, I think we should still prefer 100% water medium pKa measurements when available, although there isn't a big difference.

bannanc commented 6 years ago

"In the analysis, I think we should still prefer 100% water medium pKa measurements when available, although there isn't a big difference."

I agree, I was just worried that you would be doing something different for the overall analysis than participants are doing for their papers.

"These extra experiments were done with cosolvent extrapolation method for molecules that already have pKa measurements done in 100% water."

I like this as a way to show that the cosolvent shouldn't have significant effects on the measured pKa.

MehtapIsik commented 6 years ago

At the moment, the calculation of SEM for the experimental data is NOT different for measurements done in water or cosolvent. Both methods have very high reproducibility, so the calculated SEM values are very low within each method.
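For illustration, the SEM over replicate measurements of one molecule could be computed as in the minimal sketch below. The replicate values and the function name `sem` are hypothetical and are not taken from the SAMPL6 data or scripts.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical replicate pKa values for a single molecule (NOT SAMPL6 data).
replicates = [4.50, 4.52, 4.48]
mean_pka = statistics.mean(replicates)
sem_pka = sem(replicates)
print(f"pKa = {mean_pka:.2f} +/- {sem_pka:.3f} (SEM, n = {len(replicates)})")
```

Because the SEM only reflects scatter between replicates of the same method, highly reproducible replicates give a small SEM even if the method itself carries additional error.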

The largest deviation between water and cosolvent experiments that I have observed was 0.4 pKa units. Would it be better to adopt 0.4 as the uncertainty of cosolvent experiments in the analysis?

bannanc commented 6 years ago

Thanks @MehtapIsik

In general I think this PR is ready to go. Have you talked to Bas about how he calculated the experimental uncertainties in SAMPL5? There are more rigorous options than just taking the SEM. I worked in an analytical lab before graduate school, and for all of our instrumentation we knew the measurement limit, which can then be propagated through the average of multiple measurements. I don't know that it would change the water measurements, but I think we should take into account that you're experimentally less certain with cosolvent. It would be good to get @jchodera and @davidlmobley to weigh in on the experimental uncertainties.
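One common way to propagate a known per-measurement uncertainty through an average, assuming the errors are independent, is sketched below. The uncertainty values are placeholders chosen for illustration, not instrument specifications, and the function name is hypothetical.

```python
import math


def uncertainty_of_mean(per_measurement_u):
    """Propagate independent standard uncertainties through an average.

    For a mean of n measurements with uncertainties u_i:
        u_mean = sqrt(sum(u_i ** 2)) / n
    """
    n = len(per_measurement_u)
    if n == 0:
        raise ValueError("need at least one measurement")
    return math.sqrt(sum(u * u for u in per_measurement_u)) / n


# Placeholder per-measurement uncertainties in pKa units (assumed values).
u_water = uncertainty_of_mean([0.01, 0.01, 0.01])
u_cosolvent = uncertainty_of_mean([0.05, 0.05, 0.05])
print(f"water: {u_water:.4f}, cosolvent: {u_cosolvent:.4f}")
```

With this approach a less certain method keeps a larger uncertainty on its mean even when its replicates agree closely with one another.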

bannanc commented 6 years ago

@MehtapIsik I'm still confused about the bias with the cosolvent. If it's not in the same direction, can you actually call it a bias, or is it just an error or fluctuation?

MehtapIsik commented 6 years ago

It is more like a random error. I think bias was the wrong word to use. I was checking for bias, but I haven't observed any.

[image]

I updated that comment about method comparison as follows: "The absolute difference between water and cosolvent pKa values were observed to be up to 0.4 pKa units, although on average the difference is 0.05 and without any obvious bias."

Is it more clear this way?
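The distinction between bias and random error discussed above can be sketched with paired differences: a mean signed difference near zero suggests no systematic offset, while the mean and maximum absolute differences describe the size of the scatter. The paired values below are made up for illustration and are not the SAMPL6 measurements.

```python
import statistics

# Hypothetical (water, cosolvent) pKa pairs; NOT the SAMPL6 measurements.
pairs = [(4.50, 4.55), (7.20, 7.10), (9.80, 9.85), (3.10, 3.50), (6.00, 5.95)]

# Signed difference keeps the direction of each deviation.
signed = [cosolvent - water for water, cosolvent in pairs]

mean_signed = statistics.mean(signed)                # systematic offset (bias)
mean_abs = statistics.mean([abs(d) for d in signed])  # typical size of deviation
max_abs = max(abs(d) for d in signed)                 # worst-case deviation

print(f"mean signed difference:   {mean_signed:+.2f}")
print(f"mean absolute difference: {mean_abs:.2f}")
print(f"max absolute difference:  {max_abs:.2f}")
```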

jchodera commented 6 years ago

If I recall correctly, we were going to look into whether the Sirius T3 software automatically propagates the uncertainty in each measurement through to the uncertainty in the linear extrapolation to 0% cosolvent.
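As a rough sketch of what an uncertainty on the extrapolated value could look like, the example below fits an ordinary least-squares line to apparent pKa versus cosolvent percentage and reports the standard error of the intercept at 0% cosolvent. This is an assumption-laden simplification: it is not a description of what the Sirius T3 software does, it uses only the scatter of the points around the line rather than per-titration uncertainties, and the data points are hypothetical.

```python
import math


def extrapolate_to_zero(x, y):
    """Ordinary least-squares line y = intercept + slope * x.

    Returns (slope, intercept, se_intercept), where se_intercept is the
    standard error of the fitted value at x = 0.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three (x, y) points of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return slope, intercept, se_intercept


# Hypothetical apparent pKa values at three cosolvent percentages.
cosolvent_pct = [30.0, 40.0, 50.0]
apparent_pka = [5.10, 5.32, 5.50]
slope, intercept, se_intercept = extrapolate_to_zero(cosolvent_pct, apparent_pka)
print(f"pKa at 0% cosolvent = {intercept:.2f} +/- {se_intercept:.2f}")
```

Because the fit is evaluated far from the measured cosolvent range, the intercept's standard error is larger than the scatter of the individual points, which is one reason the extrapolated value may be less certain than a direct measurement in water.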

The largest discrepancy, 0.4, is not the best value to use for random error. Ideally, we would include both the information from multiple replicates and the multiple titrations from each replicate, as well as contributions to experimental uncertainties.

@bannanc is right that it would be best if our uncertainties could be consistent with the ones we are providing the various groups to use in their analysis, but there is an advantage to spending some time to really understand what the best error model is going forward.

@MehtapIsik : Perhaps we can chat on Monday to discuss the error model in more detail?

bannanc commented 6 years ago

Thanks @MehtapIsik that figure helps a lot and the rewording is definitely more clear!

I think this PR is ready to go; it was just that the discussion around uncertainties needed others' input. I don't have access to merge the PR, or I would.

davidlmobley commented 6 years ago

@jchodera @MehtapIsik - should I go ahead and merge? I agree that the error analysis/uncertainties should be clearer, but this could be addressed in a separate PR depending on what you decide.

I am not totally clear on what the plot in this comment https://github.com/MobleyLab/SAMPL6/pull/53#issuecomment-380988665 is supposed to show, though it also looks odd/interesting to me. Most of the values are positive and only a few are negative, but whenever they are negative the values tend to be large, which would suggest that the distribution these are drawn from is rather asymmetric or perhaps even bimodal?

MehtapIsik commented 6 years ago

I agree that we can merge this PR as is.

We probably can't draw strong conclusions about the error in cosolvent pKa measurements relative to water pKa measurements because I only have data for 13 molecules with both methods.