Open cfcohen opened 3 years ago
Really great work!! Althought python2 is discontinued, I would not like to break python2 compatibility because many system still use as default. I'll try to filter the pull-request to integrate the code cleaning section.
Sounds great. To be honest I didn't fully realize that the static typing annotations was a Python 3 only thing because I've been exclusively using Python3 for so long now. Take whatever is beneficial for you and leave the rest. This was really just so that I could better understand your code. I suggest looking at commits 38ae29f and fbfd2fe a little more closely, because I think they might reflect things that were truly wrong/suspicious in the original code. The other 99% of the code all checked out as correct once I understood what was happening.
I just cherry-pick the style commit. I need more time to give a try to your changes, thanks, really great work
wow, really valuable speed increase with your code, so i work with that to remove static typing and add to module, thanks
Huh. I hadn't actually noticed. I'm really glad to see that you're finding value in the changes. They were fairly arbitrary, and non-trivial in scope. It seems to me that the performance improvement would have to be because of one of these two primary changes:
Numpy is really fast when the arrays are large. I've always suspected that it was mildly harmful when there was a single float inside an ndarray. In particular, I find the subtly different semantics between native float division and numpy float division very confusing. That's why I was on a little bit of a war path to exterminate the ndarray types from the places that were documented as being native floats.
I think the dictionary thing is less likely but there is a fair bit of string comparison that's gone now. I suppose that could be it. Are you sure it's not just a Python2 versus Python3 thing?
Since I probably know my changes better, here are the commits I think of as being primarily related to the numpy elimination:
https://github.com/jjgomera/iapws/pull/53/commits/38d0a95dd12e3fb6924a79b0114dc7f7238749d3 Convert to math.exp() and math.log() where possible. https://github.com/jjgomera/iapws/pull/53/commits/fa1eb5d9208682390b2fad15da1d3c42fbe6ae64 Isolate numpy.log() and numpy.exp() calls as much as possible. https://github.com/jjgomera/iapws/pull/53/commits/aaeca01a8de98747759070b37d51d0e2788c3511 A working version with no numpy log()/exp() dependency. https://github.com/jjgomera/iapws/pull/53/commits/f9abf87a9a3cddc4368cd5344cebe932a3460aa5 More comprehensively cast from numpy.Float64 to Python float. https://github.com/jjgomera/iapws/pull/53/commits/b05ee95e50d36b6900f9455093d51c099460edd1 This commit demonstrates that numpy.Float64 is mostly gone.
I'm far from certain that's the cause of the performance improvement but it's my best theory. I never would have been able to figure this out without static typing by the way... Which is a big part of why I wanted static typing in the first place. I suppose you could try to check results with assert(type(x)==float) used in many places.
Another possibility might be to accept the changes more completely, and then remove the static typing with a search-and-replace regexp... :.*[=,] I could do this if you'd like.
One comment I'll add is while this does appear to be faster for single P-T calculations, the necessity of creating a new object at every P-T point creates enormous overhead when trying to use this in a vectorized format. For example, my use case has a 400x5000x400 grid of P-T values which need to be solved. Creating 800,000,000 objects that need to be garbage collected is not ideal from a performance perspective.
@cfcohen, could you rebase this PR? @jjgomera, does python2 support argument still stand? (seems so based on recent commit in master branch)
This was never intended to be a pull request that I actually expected you to merge, but rather something I felt that I needed to do for my own benefit in order to understand this code. ;-) That said, I see no reason why I shouldn't let you know about my changes. You are welcome to accept as many, or as few, or even zero commits as you see fit.
The subject matter doesn't make for the simplest code in the world to begin with. I found several of the code paradigms such as the passing of dictionaries, which creates ambiguity about defined keys, and the heavy use of aliasing of constants in the module, class, and function scopes very confusing. The 1000+ line calculo() in iapws95.py was particularly perplexing to me, with even minor changes to the code demonstrating that I had no real comprehension of which variables were defined where.
Since the static typing features I've used necessitate Python3, and I've made no real efforts to understand the consequences for early Python3 support either, I suspect that your primary complaint about these changes will be that they substantially reduce the supported Python versions. That wasn't really a concern for my purposes.
While I made some effort to not modify the external API, I did not slavishly adhere to that as a requirement. There's a reasonable chance that the external API shifted minorly around some corner cases, and certainly did in a number of internal functions as identified by leading underscores. In at least one place, I think I now return 'nan' instead of None in the external API (which wasn't test for or documented anyway).
The benefits of the branch I hope will be obvious. It passes static type checking with mypy, and is clean of errors for a reasonable flake8 configuration. At least for me (which I realize can be very subjective) this branch is much more easily understood, and provides greater assurances about code correctness. It passes your unit tests with fairly minimal changes to the API and no changes to results, which I leaned on heavily to check correctness while changing the code.
I'm probably done working on this for a little while, but I intend to return to it again in the future with more emphasis on my understanding the IAPWS standard now that I understand the code a little better. If you're interested in doing something with this pull request let me know and I'll see if I can help. If not, that's fine too.