psychrometrics / psychrolib

📚 Library of psychrometric functions to calculate 🌡️ thermodynamic properties of air for Python, C, C#, Fortran, R, JavaScript and VBA/Excel
MIT License
219 stars 59 forks source link

Add support for handling vectorised operations in Python #34

Closed dmey closed 4 years ago

dmey commented 5 years ago

At the moment we only support scalar inputs, I do not know if there is any demand to support vector operations in other languages but in Python, I think that this would be a good addition.

Having said that, for now, and as a workaround, one can use numpy.vectorize to allow for vectorised operations in PsychroLib as follows:

numpy.vectorize(psychrolib.function_name)

E.g.

import numpy as np
import psychrolib as psy
specific_humidity = [0.0001, 0.0002]
v_GetHumRatioFromSpecificHum = np.vectorize(psy.GetHumRatioFromSpecificHum)
humidity_ratio = v_GetHumRatioFromSpecificHum(specific_humidity)
print(humidity_ratio)
array([ 0.00010001,  0.00020004])
C-H-Simpson commented 4 years ago

I've looked into whether I can make this work with numba, in which case the vectorizations could be very fast if you're doing very large arrays. I can make it work, but need to remove the dependence of the functions on the global variable in isIP() to determine the unit system. The solution I see is to make two fully separate versions of the library for IP and SI, then have a factory class that only shows the user the correct library for the unit system they have selected, so that the outward appearance of the library isn't changed.

Edit: probably the fast thing to do it rather to wrap numpy.vectorize around the fortran functions, rather than the python functions...

Edit 2: From limited tests it appears it is slower to use np.vectorize wrapped around the imported psychrolib_fortran in python, than it is to wrap it around the python native version. Thinking about it, this makes sense, because in both cases the loop occurs inside numpy, but in the fortran case there is an additional wrapper layer in each calculation.

dmey commented 4 years ago

The use of numpy.vectorize is primarily for convenience, not performance (see notes at https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html). Originally I opened this issue to provide some information on how to use PsychroLib when dealing with vector inputs but I see that users may assume that the use of numpy.vectorize will somewhat improve performance.

@C-H-Simpson I think that the use of Numba in the case of PsychroLib to improve the performance of GetTDewPointFromVapPres and GetTWetBulbFromHumRatio would be preferable over having to rely on Fortran/C as the latter would considerably complicate the distribution of PsychroLib on PyPi.

With regards to the issue of accepting vector inputs, this is a separate issue to that of performance and we could achieve it by updating the function signatures to accept vectors.

I am a bit tight with a few deadlines at the moment but I'd be more than happy to help if you intend to contribute -- I may be a bit slow though... Is this something you need urgently?

In any case, I think we should do the following:

  1. Update all function signatures to accept vectors.
  2. Add Numba support for GetTDewPointFromVapPres and GetTWetBulbFromHumRatio -- with fallback in case Numba is not available on the system. I am not sure we should make Numba a hard dependency.

How does this sound?

@didierthevenard as far as I see those are the two bottle necks in the code -- do you see anything else that should be improved?

didierthevenard commented 4 years ago

Regarding vectorization: I think we should make a version of the library that accepts numpy arrays. I did not think it was necessary but I realize from our experience with R that some people will for example process an entire TMY file (8760 records) as a vector rather than using a loop. I am really not an expert at numpy so I don't know what it entails - would we need two different versions, one working with scalars and one working with arrays? Regarding the other point raised by @C-H-Simpson i.e. the dependency on IsIP(): the more I think about it and the more I like the solution that was used by the C# port, which is to use a class. It has many advantages over our current implementation, in particular when developing applications that do calculations in both SI and IP (in which case one just declare two objects, one in SI and the other in IP, rather than juggling between the two systems of units by repeatedly calling SetUnitSystem()). Could this work as a solution for the numba issue too?

C-H-Simpson commented 4 years ago

@dmey I agree that's a good approach. This is not something I need urgently. The current solution for my own work is to copy out the functions I use, and make them numba-friendly. (I was finding that bisection was a bottleneck) Would be happy to turn this into a more general solution and contribute though. As for whether or not to make numba a requirement, you're probably right to keep the dependencies to a minimum. Maybe keeping the numba implementation separate would be good.

@didierthevenard Many of the functions will already accept vector inputs with no modification, e.g. psy.GetTKelvinFromTCelsius( np.array([300, 310, 320, 330]) ) works absolutely fine. This is the case for all functions that only use arithmetic operators. Functions that use logical operators would need modification, or could just be wrapped in np.vectorize to make them work as noted by @dmey.

Regarding the class structure: having separate classes seems more logical to me.

dmey commented 4 years ago

@C-H-Simpson

Maybe keeping the numba implementation separate would be good.

I would prefer to add this to the current implementation within a fallback to pure python if Numba is not avail. -- easy to do.

Functions that use logical operators would need modification, or could just be wrapped in np.vectorize to make them work as noted by @dmey.

This will be trivial so no need to use np.vectorize.

@didierthevenard @C-H-Simpson

Regarding the class structure: having separate classes seems more logical to me.

👍

@didierthevenard @C-H-Simpson I should hopefully find some time tonight/tomorrow to start modifying the functions to accept vector inputs and will push this into a new branch. Once that's done, @C-H-Simpson would you be OK in merging in your changes for Numba and then we can add the fallback and later implement the change for handling the system of units.

C-H-Simpson commented 4 years ago

@dmey Of course, let me know when the new branch is ready.

dmey commented 4 years ago

@C-H-Simpson @didierthevenard all of the above should have been addressed by https://github.com/psychrometrics/psychrolib/commit/10315258b6b638d0a3d2378815d68d0753bd8471 with extra code and no API changes. Could you please give it a try and let me know. Thanks.

C-H-Simpson commented 4 years ago

@dmey as far as I am concerned 1031525 is good, and a bit more elegant than the code I was using. It means you cannot switch between unit systems, but that's not a problem for me, I'm only using SI. I does, however, fail tests. If the unit systems are going to be split into different classes, then this isn't a problem.

Importantly, this code is much faster than the master branch! I tested 10^6 random values through GetTWetBulbFromRelHum. Without the numba support it took 1 minute, with numba support it took 3 seconds.

Thanks!

dmey commented 4 years ago

Great! Will create a PR and merge this in. Thanks for testing and giving feedback!

dmey commented 4 years ago

@C-H-Simpson I have just released 2.5.0 on PyPI which includes this.

gsalinas commented 1 year ago

Hi! I came to open an issue to suggest this feature and I'm pleased to see it's already here, except I think it means I don't understand something. On Psychrolib 2.5.0, if I do:

import numpy as np
import psychrolib
psychrolib.SetUnitSystem(psychrolib.SI)
ts = np.ones(10)*25
rhs = np.ones(10)*0.5  
pressures = np.ones(10)*101325
hum_ratios = psychrolib.GetHumRatioFromRelHum(ts, rhs, pressures)

I would expect to get a length 10 vector of humidity ratios, but instead I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Is there a trick I'm missing to get vector support working?

C-H-Simpson commented 1 year ago

@gsalinas your code works fine for me. I'm guessing the problem is that you don't have numba installed, which is an optional dependency.

gsalinas commented 1 year ago

That was it! Thank you so much!