whitews / FlowUtils

FlowUtils is a Python package containing various utility functions related to flow cytometry analysis, primarily focused on compensation and transformation tasks commonly used within the flow community.
https://flowutils.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Logicle transform and comparing with mpmath #13

Open krcurtis opened 3 weeks ago

krcurtis commented 3 weeks ago

Hi,

I've encountered some differences when calculating the logicle transform using different methods. I had seen that flowutils matches the GatingML 2.0 spec, but I get differences when computing with mpmath, a Python package for arbitrary-precision floating-point arithmetic.

When I tell mpmath to use 30 digits of decimal precision, I get these results for the test values of logicle(x, 1000, 1, 4, 1) in the GatingML 2.0 spec:

| x | expected (spec) | computed with mpmath |
|------|----------|----------------------|
| -10  | 0.254059 | 0.244811788147158329560715914228 |
| -5   | 0.318389 | 0.31741253160831268687487017787 |
| -1   | 0.383001 | 0.382999362288320181332800213487 |
| 0    | 0.4      | 0.400000000000000022204460492503 |
| 0.3  | 0.405107 | 0.405107011048485188577537974655 |
| 1    | 0.416999 | 0.41699870734437975458826375087 |
| 3    | 0.450318 | 0.450317517410592531905350830829 |
| 10   | 0.545941 | 0.545940641054175536655407791803 |
| 100  | 0.791638 | 0.791638120758736567347236966264 |
| 1000 | 1        | 1.0 |
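To quantify the mismatch, the relative differences at the negative inputs can be computed directly from the table values above (plain Python; values copied from the table, truncated to 15 digits):

```python
# spec value vs. mpmath result for the three negative inputs (from the table above)
rows = [
    (-10, 0.254059, 0.244811788147158),
    (-5, 0.318389, 0.317412531608313),
    (-1, 0.383001, 0.382999362288320),
]
for x, spec, approx in rows:
    rel = abs(spec - approx) / spec
    print(f"x = {x}: relative difference {rel:.2e}")
```

The relative difference is a few percent at x = -10 and shrinks rapidly as x approaches 0.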

The differences seem to occur only at the negative x values, where only the first two digits after the decimal point match. I have also tried larger decimal precisions, up to 10,000 digits, but the mpmath results still do not match the GatingML 2.0 spec. Have you encountered this? I've pasted my mpmath Python script below. Any thoughts?

Thanks!

import mpmath as mp

################################################################################

# test values for logicle(x, 1000, 1, 4, 1) from the GatingML 2.0 spec
GATING_ML_TABLE = [
    (-10, 0.254059),
    (-5, 0.318389),
    (-1, 0.383001),
    (0, 0.4),
    (0.3, 0.405107),
    (1, 0.416999),
    (3, 0.450318),
    (10, 0.545941),
    (100, 0.791638),
    (1000, 1),
]

################################################################################

def find_d(w, b):
    # solve 2 * (log(d) - log(b)) + w * (d + b) = 0 for d
    # (needs Newton's method or similar)
    f = lambda d: 2 * (mp.log(d) - mp.log(b)) + w * (d + b)
    return mp.findroot(f, 0.5)

def B(y, T, W, M, A):
    # biexponential from the GatingML 2.0 logicle definition
    w = W / (M + A)
    x2 = A / (M + A)
    x1 = x2 + w
    x0 = x2 + mp.mpf(2.0) * w
    b = (M + A) * mp.log(mp.mpf(10.0))

    d = find_d(w, b)

    ca = mp.exp(x0 * (b + d))
    fa = mp.exp(b * x1) - ca / mp.exp(d * x1)

    a = T / (mp.exp(b) - fa - ca / mp.exp(d))
    c = ca * a
    f = fa * a

    return a * mp.exp(b * y) - c * mp.exp(-d * y) - f

def logicle(x, T, W, M, A):
    # invert B numerically: find y such that B(y) = x
    f = lambda yy: B(yy, T, W, M, A) - x
    # result = mp.findroot(f, 0.5)
    result = mp.findroot(f, 0.9)
    return result

mp.mp.dps = 30
mp.pretty = True

for x, y in GATING_ML_TABLE:
    print("For {x}, expected {y}, got ".format(x=x, y=y), logicle(x, 1000, 1, 4, 1))
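For comparison, here is a quick double-precision (64-bit) sketch of the same formulas using only the standard library. To be clear, this is my own re-derivation for cross-checking, not FlowUtils' implementation, and the bisection brackets are assumptions that happen to hold for the (T, W, M, A) = (1000, 1, 4, 1) test parameters:

```python
import math

def solve_d(w, b, tol=1e-15):
    # g(d) = 2*(ln d - ln b) + w*(d + b) is strictly increasing for d > 0
    # (g'(d) = 2/d + w > 0), so bisection on a sign-changing bracket is safe.
    # The bracket [1e-9, b] is an assumption that holds for these parameters.
    g = lambda d: 2 * (math.log(d) - math.log(b)) + w * (d + b)
    lo, hi = 1e-9, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def biex64(y, T, W, M, A):
    # same biexponential as B() above, evaluated in 64-bit floats
    w = W / (M + A)
    x2 = A / (M + A)
    x1 = x2 + w
    x0 = x2 + 2 * w
    b = (M + A) * math.log(10)
    d = solve_d(w, b)
    ca = math.exp(x0 * (b + d))
    fa = math.exp(b * x1) - ca / math.exp(d * x1)
    a = T / (math.exp(b) - fa - ca / math.exp(d))
    return a * math.exp(b * y) - (ca * a) * math.exp(-d * y) - (fa * a)

def logicle64(x, T, W, M, A, tol=1e-12):
    # biex64 is strictly increasing in y, so bisect on [0, 1];
    # this bracket covers every x in the spec test table.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if biex64(mid, T, W, M, A) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for x in (-10, -5, -1, 0, 0.3, 1, 3, 10, 100, 1000):
    print(x, logicle64(x, 1000, 1, 4, 1))
```

If the 64-bit values agree with the 30-digit mpmath values, that would suggest the disagreement with the spec table is not just a precision artifact; if they land somewhere between the two, precision is the more likely culprit.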
whitews commented 3 weeks ago

Hi @krcurtis,

This looks to be related to floating-point precision. I have seen similar differences using NumPy 32-bit vs 64-bit representations, as well as differences across CPU architectures. There's a lot written on the subject, and you can find a Python-specific discussion here. There are many mathematical steps in the logicle routine where these small differences can compound. That would be my best guess for what you are seeing.
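As a tiny illustration of the 32-bit effect (a generic sketch using only the standard library, not FlowUtils code; the b and y values are just representative logicle intermediates for (T, W, M, A) = (1000, 1, 4, 1)):

```python
import math
import struct

def as_float32(x):
    # round-trip through IEEE 754 binary32 to mimic storing x in 32 bits
    return struct.unpack('f', struct.pack('f', x))[0]

b = 5 * math.log(10)  # the logicle "b" for (T, W, M, A) = (1000, 1, 4, 1)
y = 0.254059          # a representative scale value from the spec table

exact = math.exp(b * y)                           # inputs kept in 64 bits
stored = math.exp(as_float32(b) * as_float32(y))  # inputs rounded to 32 bits
print(exact, stored, abs(exact - stored) / exact)
```

Each exp/log/root-finding step can amplify a small representation difference like this, so which final digits survive depends on the precision carried through the whole chain.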

Regards, Scott