issue with trying to set parameter b to 0 in standardize(constraints = ...) call

japhir commented 2 years ago

In an email (because the data are not published yet) you mentioned

If you want to force your "compositional non-linearity slopes" (parameter b in Daëron, 2021) to be zero, you can use the constraints argument of D47data.standardize(). I realize now that this is under-documented, but the way it works, if you want to set b=0 for sessions foo, bar and baz, is to use constraints = dict(b_foo = 0, b_bar = 0, b_baz = 0). I haven't extensively tested this option yet, however, so things may break somewhere else as a result. As always, please open an issue if that is the case.

I've tried this out but I couldn't get it to work. My sessions were named '2018-02-23', '2020-01-03', and '2021-05-12' and when I tried:

  mydata.standardize(
      constraints = dict(b_2018-02-23 = 0,
                         b_2020-01-03 = 0,
                         b_2021-05-12 = 0))

I got the following errors

Traceback (most recent call last):
  File "<string>", line 8, in __PYTHON_EL_eval
  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<string>", line 1
    dict(b_2018-02-23 = 0,
                ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

I then thought that they should be passed as characters, so I did it with bracketing ''s. ('b_2018-02-23') but this gave me the next error:

Traceback (most recent call last):
  File "<string>", line 8, in __PYTHON_EL_eval
  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<string>", line 1
    dict('b_2018-02-23' = 0,
         ^^^^^^^^^^^^^^^^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

which made me think that the issue may be with the names of the sessions, because typically - is not allowed in a variable name (right?). So I renamed the sessions to session0, session1, and session2 (see, I'm getting the hang of Python's 0-indexing even though I'm an R user ;-))

Traceback (most recent call last):
  File "<string>", line 17, in __PYTHON_EL_eval
  File "<string>", line 3, in <module>
  File "/tmp/babel-qGlEVX/python-7afIAv", line 1, in <module>
    mydata.standardize(
  File "/home/japhir/SurfDrive/PhD/programming/D47crunch/D47crunch/__init__.py", line 1006, in newfun
    out = oldfun(*args, **kwargs)
  File "/home/japhir/SurfDrive/PhD/programming/D47crunch/D47crunch/__init__.py", line 1628, in standardize
    params[k].expr = constraints[k]
  File "/usr/lib/python3.10/site-packages/lmfit/parameter.py", line 845, in expr
    self.__set_expression(val)
  File "/usr/lib/python3.10/site-packages/lmfit/parameter.py", line 860, in __set_expression
    self._expr_ast = self._expr_eval.parse(val)
  File "/usr/lib/python3.10/site-packages/asteval/asteval.py", line 257, in parse
    if len(text) > self.max_statement_length:
TypeError: object of type 'int' has no len()

Note that calling mydata.standardize() without any constraints does work as expected (and doesn't result in very different results after limiting the anchors to ETH-1, ETH-2, and ETH-3).

Any ideas?

mdaeron commented 2 years ago

Can you try constraints = {f'b_{D47crunch.pf("2018-02-23")}': 0}?

japhir commented 2 years ago

mydata.standardize(constraints = {f'b_{mydata.pf("session0")}' : 0})

AttributeError: 'D47data' object has no attribute 'pf'. Did you mean: 'Nf'?

mdaeron commented 2 years ago

pf is not a class method of D47data (so mydata.pf() will fail) but a function defined at the package level. So from D47crunch import pf; pf('foo'), from D47crunch import *; pf('foo') and D47crunch.pf('foo') will all work the same.

mdaeron commented 2 years ago

I tried this:

from lmfit import report_fit
from D47crunch import *

mydata = D47data(virtual_data(
    session = 'mysession',
    samples = [
        dict(Sample = 'ETH-1', N = 4),
        dict(Sample = 'ETH-2', N = 4),
        dict(Sample = 'ETH-3', N = 4),
        dict(Sample = 'FOO', N = 4, D47 = 0.6, D48 = 0.1, d13C_VPDB = -4.0, d18O_VPDB = -12.0),
    ]), verbose = True)

mydata.refresh()
mydata.wg()
mydata.crunch()
report_fit(mydata.standardize(
    constraints = {'b_mysession': '0'}
    ))
mydata.table_of_sessions()

And got the same error as you in D4xdata.consolidate_sessions():

Traceback (most recent call last):
  File "/Users/daeron/temp/D47crunch_constraints/./foo.py", line 18, in <module>
    report_fit(mydata.standardize(
  File "/opt/homebrew/lib/python3.9/site-packages/D47crunch/__init__.py", line 1006, in newfun
    out = oldfun(*args, **kwargs)
  File "/opt/homebrew/lib/python3.9/site-packages/D47crunch/__init__.py", line 1689, in standardize
    self.consolidate(tables = consolidate_tables, plots = consolidate_plots)
  File "/opt/homebrew/lib/python3.9/site-packages/D47crunch/__init__.py", line 1006, in newfun
    out = oldfun(*args, **kwargs)
  File "/opt/homebrew/lib/python3.9/site-packages/D47crunch/__init__.py", line 2297, in consolidate
    self.consolidate_sessions()
  File "/opt/homebrew/lib/python3.9/site-packages/D47crunch/__init__.py", line 2200, in consolidate_sessions
    i = self.standardization.var_names.index(f'b_{pf(session)}')
ValueError: 'b_mysession' is not in list

That is definitely a "book keeping" bug: the standardization performs as expected but the reporting/book keeping functions after that were never modified to account for potential missing parameters (because of how lmfit works). In the past I had only ever used constraints to force D47 differences between samples (e.g., between heated and 25C equilibrated CO2). My bad.

I know what to do about this (add the missing parameters explicitly after the standardization regression is done). Thanks to you I have an incentive to do so quickly. I'll report here when it is done.

mdaeron commented 2 years ago

Can you please check if the new branch cleanup_after_using_stdz_constraints produces any new errors?

I tested it with the following code:

from lmfit import report_fit
from D47crunch import *

mydata = D47data(virtual_data(
    session = 'mysession',
    samples = [
        dict(Sample = 'ETH-1', N = 4),
        dict(Sample = 'ETH-2', N = 4),
        dict(Sample = 'ETH-3', N = 4),
        dict(Sample = 'FOO', N = 4, D47 = 0.6, D48 = 0.1, d13C_VPDB = -4.0, d18O_VPDB = -12.0),
    ]), verbose = True)

mydata.refresh()
mydata.wg()
mydata.crunch()
report_fit(mydata.standardize(
    constraints = {'b_mysession': '0'}
    ))
mydata.table_of_sessions()
mydata.table_of_samples()
mydata.plot_sessions()

This seems to work as it should:

[table_of_sessions] 
–––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––  ––––––––––––––
Session    Na  Nu  d13Cwg_VPDB  d18Owg_VSMOW  r_d13C  r_d18O   r_D47         a ± SE   1e3 x b ± SE          c ± SE
–––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––  ––––––––––––––
mysession  12   4       -4.000        26.000  0.0000  0.0000  0.0160  1.014 ± 0.024  0.000 ± 0.000  -0.898 ± 0.009
–––––––––  ––  ––  –––––––––––  ––––––––––––  ––––––  ––––––  ––––––  –––––––––––––  –––––––––––––  ––––––––––––––

[table_of_samples] 
––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
Sample  N  d13C_VPDB  d18O_VSMOW     D47      SE    95% CL      SD  p_Levene
––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––
ETH-1   4       2.02       37.02  0.2052                    0.0235          
ETH-2   4     -10.17       19.88  0.2085                    0.0098          
ETH-3   4       1.71       37.45  0.6132                    0.0180          
FOO     4      -4.00       26.83  0.5934  0.0106  ± 0.0229  0.0057     0.006
––––––  –  –––––––––  ––––––––––  ––––––  ––––––  ––––––––  ––––––  ––––––––

Also, as expected, the standardization errors no longer vary with δ47:

D47_plot_mysession.pdf

japhir commented 2 years ago

Yep this works exactly as advertised for my data! Results are very similar to what I got before, however (as expected, I had very few non-ETH-1--3 and because the small δ values were already corrected for pressure-baseline effects the influence of δ47 should be very small) so I'll get back to you in the email if I have further ideas of what could be the cause of the big difference. Thanks! This resolves this issue!

mdaeron commented 2 years ago

Great. Beware that you may still get errors in some cases, e.g., I havent patched D4xdata.sample_D4x_correl() to account for zero-variance values in the fixed parameters. I'll look into it but not right away.

mdaeron / D47crunch

issue with trying to set parameter b to 0 in standardize(constraints = ...) call #13