pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0

Load data in another datatype rather than float64 #2771

Closed: akasom89 closed this issue 6 months ago

akasom89 commented 6 months ago

In my experience with Satpy, all the satellite data I have loaded so far has been in the float64 datatype. I'm wondering whether it's possible to set a different datatype, such as np.float16, since float64 may be unnecessary for many use cases.

I searched the documentation (link to docs). It makes the same point, quoted below, but I could not find out how to change the datatype.

At the moment, satellite instruments are rarely measuring in a resolution greater than what can be encoded in 16 bits. As such, to preserve processing power, please consider carefully what data type you should scale or calibrate your data to.

mraspaud commented 6 months ago

@akasom89 thanks for your question, it is indeed very relevant.

A few months ago, we started migrating most of the satellite data from float64 to float32, since we realized, as you say, that float64 is unnecessary for this data. This results in more memory-efficient computations.
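To put a number on the memory saving, here is a minimal sketch using plain NumPy. The array shape is a hypothetical stand-in for one band of satellite data and is not taken from any particular instrument or from satpy itself.

```python
import numpy as np

# Hypothetical image size, chosen only for illustration.
shape = (1000, 1000)

data64 = np.zeros(shape, dtype=np.float64)
data32 = data64.astype(np.float32)

# float64 uses 8 bytes per value, float32 uses 4.
print(data64.nbytes)  # 8000000
print(data32.nbytes)  # 4000000
```

Halving the bytes per value halves the footprint of every intermediate array that stays in float32.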

However, we have not introduced the possibility to choose the type. As you note, most satellites measure the earth radiance and store the measurement in a 16-bit (unsigned) integer. In order to preserve those 16 bits when converting to floats (for calibration), we need at least a float32, which has a 23-bit mantissa, as the mantissa of a float16 is only 10 bits. So we are now taking measures to convert all satpy processing to support (and stay in) 32-bit floats.
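The mantissa argument can be checked directly with NumPy. The sketch below is not satpy code: it casts every possible 16-bit unsigned count to a float type and back, and reports whether all values survive. The helper name `survives_round_trip` is made up for this illustration.

```python
import numpy as np

# Every possible raw count from a 16-bit unsigned integer.
counts = np.arange(2**16, dtype=np.uint16)


def survives_round_trip(values, float_dtype):
    """Return True if every integer is represented exactly in float_dtype."""
    # The largest counts overflow float16 to inf; silence that warning.
    with np.errstate(over="ignore"):
        as_float = values.astype(float_dtype)
    return bool(np.array_equal(as_float.astype(np.float64),
                               values.astype(np.float64)))


# Explicit mantissa bits of each type.
print(np.finfo(np.float16).nmant)  # 10
print(np.finfo(np.float32).nmant)  # 23

float16_ok = survives_round_trip(counts, np.float16)
float32_ok = survives_round_trip(counts, np.float32)
print(float16_ok)  # False
print(float32_ok)  # True
```

With 10 mantissa bits, float16 can no longer represent every integer above 2048, and the largest counts overflow to infinity. float32 represents every integer up to 2**24 exactly, which covers the full 16-bit range.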

akasom89 commented 6 months ago

Thanks @mraspaud for your detailed answer!

mraspaud commented 6 months ago

Closing this for now, feel free to reopen if more action is needed here.

akasom89 commented 5 months ago

Thanks again, @mraspaud. Since it may take some months for that change to be applied to satpy's new version, is there any guide for implementing it partially on my own? This would help me assess how much it would improve the performance of my code.

For example, if I only want to use one satellite dataset, it might make sense to apply changes to the source readers to have the data in np.float32 instead of np.float64.

mraspaud commented 5 months ago

Most readers should already be converted to float32 in the current stable version, so I'm wondering what data you are reading with satpy and which version you have installed?
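One way to answer both questions is to print the installed version and the dtype of each loaded dataset. The sketch below uses only the standard library and NumPy. The helpers `installed_version` and `find_float64` and the dataset names are hypothetical, and the dictionary of arrays merely stands in for data loaded through satpy.

```python
from importlib import metadata

import numpy as np


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_float64(datasets):
    """Return the names in a name -> array mapping that are still float64."""
    return [name for name, data in datasets.items()
            if data.dtype == np.float64]


print("satpy version:", installed_version("satpy"))

# Hypothetical stand-ins for loaded datasets; names are illustrative only.
loaded = {
    "old_reader_band": np.zeros((2, 2), dtype=np.float64),
    "new_reader_band": np.zeros((2, 2), dtype=np.float32),
}
still_float64 = find_float64(loaded)
print(still_float64)  # ['old_reader_band']
```

With a real Scene, the same check amounts to reading the `.dtype` attribute of a loaded dataset, for example `scn["some_band"].dtype`, where the band name is a placeholder.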

akasom89 commented 5 months ago

Yes, you are right. It seems the problem was an outdated version on my side. Thanks.