re-mat / clowder-extractors

BSD 3-Clause "New" or "Revised" License

Compute Values from Measured Inputs #7

Closed BenGalewsky closed 1 year ago

BenGalewsky commented 1 year ago

As an experiment publisher, I want to provide measured values (mass or volume of compounds used) and have Clowder convert them to conventional concentrations, ratios, and mol% values so that others can use my data in their analysis.

Description:

The data entry spreadsheet collects data for the various components of an experiment:

  1. Monomers
  2. Catalysts
  3. Inhibitors
  4. Additives
  5. Solvents

The exact amount of each of these components used in the experiment needs to be recorded. Amounts are expressed either as mass (in grams) or as volume (in milliliters). The spreadsheet will have fields for both mass and volume; the user will fill in one value and leave the other blank. For some components only mass or only volume makes sense, and those tabs in the spreadsheet will have only one column.
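A minimal sketch of the "fill in one, leave the other blank" rule, assuming blank cells arrive as `None`. The function name `resolve_mass_g` and its parameters are hypothetical and not part of the extractor.

```python
from typing import Optional


def resolve_mass_g(mass_g: Optional[float],
                   volume_ml: Optional[float],
                   density_g_per_ml: Optional[float] = None) -> float:
    """Return the mass in grams from whichever of mass/volume was filled in."""
    has_mass = mass_g is not None
    has_volume = volume_ml is not None
    # Exactly one of the two fields must be provided.
    if has_mass == has_volume:
        raise ValueError("fill in exactly one of mass or volume")
    if has_mass:
        return mass_g
    # Volume was given: convert to mass with m = V * rho.
    if density_g_per_ml is None:
        raise ValueError("density is required to convert a volume to a mass")
    return volume_ml * density_g_per_ml
```

Rejecting rows where both or neither field is filled keeps ambiguous input from silently flowing into the later calculations.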

For analysis, however, people will need these values converted to measures of concentration, ratios between compounds, and mol%.

Chemistry Data

The conversions require precise measures of each compound's molecular weight and density. We will use a web service called ChemSpider to look up these values from the compound's SMILES string, which will be included in the spreadsheet.

Obtaining Data from ChemSpider

We will use the Python bindings for the ChemSpider API, chemspipy. It requires an access token, which will be read from an environment variable.
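A rough sketch of how the token and lookup could be wired together. The environment variable name `CHEMSPIDER_API_KEY` and both helper names are assumptions made for illustration. The chemspipy calls used are `ChemSpider(token)`, `search(...)`, and the compound's `molecular_weight` property.

```python
import os


def make_chemspider_client(env_var: str = "CHEMSPIDER_API_KEY"):
    """Build a chemspipy client from a token held in an environment variable.

    The variable name is an assumption; use whatever the deployment defines.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    # Imported lazily so a missing token is reported before the import runs.
    from chemspipy import ChemSpider
    return ChemSpider(token)


def molecular_weight_for_smiles(client, smiles: str) -> float:
    """Return the molecular weight of the first search hit for a SMILES string."""
    for compound in client.search(smiles):
        return float(compound.molecular_weight)
    raise LookupError(f"no ChemSpider match for {smiles!r}")
```

Taking the first hit is a simplification; a real extractor may need to handle several matches. How density is retrieved is deliberately left out of this sketch.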

Density

In some cases, the service doesn't provide a single value for density of the compound. Instead it lists several experimentally derived values. See this example.

For these cases we will want to compute the average of the provided densities and round to the hundredths place.
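The averaging rule could look roughly like this, assuming the experimental densities have already been parsed into numbers in g/mL. The function name `average_density` is hypothetical.

```python
from statistics import mean


def average_density(densities):
    """Average several reported densities and round to the hundredths place."""
    values = [float(d) for d in densities]
    if not values:
        raise ValueError("no density values provided")
    return round(mean(values), 2)
```

For example, `average_density([0.87, 0.88, 0.866])` averages to 0.872 and rounds to 0.87. Note that Python's built-in `round` works on binary floats, so values sitting exactly on a half may not round the way decimal arithmetic would suggest.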

Calculations

VARIABLES FROM THE SPREADSHEET:

PROPERTIES FROM CHEMSPIDER:

FORMULAE:

  1. Monomer mol%: X_i = (m_i/M_i)/sum(m_i/M_i) (if mass is input); m_i = V_i*rho_i (if volume is input, first convert to mass)
  2. Volume of monomers: V_mon = sum(m_i/rho_i)
  3. Avg. MW of all monomers: M_avg = sum(X_i*M_i)
  4. Monomer:Catalyst molar ratio = (sum(m_i)/M_avg)/(m_c/M_c)
  5. Inhibitor:Catalyst molar ratio = ((V_inh*rho_inh)/M_inh)/(m_c/M_c)
  6. wt% of fillers: wt% = m_fi/(sum(m_i) + sum(m_fi) + m_c + (V_inh*rho_inh) + (V_s*rho_s)) (if mass is input); m_fi = V_fi*rho_fi (if volume is input, first convert to mass)
  7. Total filler volume: V_f = sum(V_fi) or sum(m_fi/rho_fi)
  8. Solvent concentration = V_s/m_c
  9. Total volume: V = V_mon + V_inh + V_s + V_f

The formulae in bold are the 5 composition parameters that will be reported in the database. The rest are derived quantities that are useful for computation.
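As an illustration, formulae 1 to 4 can be sketched in a few lines. The function name, the tuple layout, and the dictionary keys are hypothetical. The sketch assumes masses in g, molecular weights in g/mol, and densities in g/mL, and it treats X_i as a mole fraction (reported as mol% by multiplying by 100), which is how formula 3 appears to use it.

```python
def monomer_composition(monomers, catalyst):
    """Compute formulae 1-4 for a set of monomers and one catalyst.

    monomers: list of (mass_g, mol_weight, density_g_per_ml) tuples
    catalyst: (mass_g, mol_weight) tuple
    """
    moles = [m / M for m, M, _ in monomers]
    total_moles = sum(moles)
    x = [n / total_moles for n in moles]                    # formula 1 (fraction)
    v_mon = sum(m / rho for m, _, rho in monomers)          # formula 2
    m_avg = sum(xi * M for xi, (_, M, _) in zip(x, monomers))  # formula 3
    m_c, M_c = catalyst
    total_mass = sum(m for m, _, _ in monomers)
    ratio = (total_mass / m_avg) / (m_c / M_c)              # formula 4
    return {
        "mol_percent": [100.0 * xi for xi in x],
        "monomer_volume_ml": v_mon,
        "avg_mol_weight": m_avg,
        "monomer_to_catalyst": ratio,
    }
```

With two monomers of 10 g (M = 100, rho = 1.0) and 20 g (M = 200, rho = 0.8) and 0.5 g of a catalyst with M = 500, each monomer contributes 0.1 mol, so the result is 50 mol% each, V_mon = 35 mL, M_avg = 150, and a monomer:catalyst ratio of about 200.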

Tasks

pranavk239 commented 1 year ago

@BenGalewsky An additional functionality that I realise would make this process more user friendly, is to allow reporting of mass and volume in milligrams and microliters (what is often used in the lab); and if detected, make the conversion of x mg = x/1000 g, and y uL = y/1000 mL for ease of calculation (since density and Mwt values are in g/mL)

pranavk239 commented 1 year ago

Other edits:

@stayal2 can contact me for example input data and expected output

BenGalewsky commented 1 year ago

> @BenGalewsky An additional functionality that I realise would make this process more user friendly, is to allow reporting of mass and volume in milligrams and microliters (what is often used in the lab); and if detected, make the conversion of x mg = x/1000 g, and y uL = y/1000 mL for ease of calculation (since density and Mwt values are in g/mL)

How would we know? Should we just change the spreadsheet to specify those units, so that researchers would have to do the conversion themselves if they are using anything other than mg and uL?

pranavk239 commented 1 year ago

> > @BenGalewsky An additional functionality that I realise would make this process more user friendly, is to allow reporting of mass and volume in milligrams and microliters (what is often used in the lab); and if detected, make the conversion of x mg = x/1000 g, and y uL = y/1000 mL for ease of calculation (since density and Mwt values are in g/mL)

> How would we know? Should we just change the spreadsheet to specify those units and researchers would have to do conversion if they are using anything other than mg and uL?

Yes, I think that would make more sense than reporting g and mL (since in most cases those aren't being measured out). In that case, all mass and volume variables in the formulae need to be defined as the input/1000.
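A possible shape for that conversion, assuming the spreadsheet is changed to collect mg and uL. The helper names are hypothetical.

```python
def mg_to_g(mass_mg: float) -> float:
    """Convert a mass entered in milligrams to grams (x mg = x/1000 g)."""
    return mass_mg / 1000.0


def ul_to_ml(volume_ul: float) -> float:
    """Convert a volume entered in microliters to milliliters (y uL = y/1000 mL)."""
    return volume_ul / 1000.0
```

Converting once at input time would leave the formulae unchanged, since density and molecular weight values stay in g/mL and g/mol.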