sede-open / pyELQ

The python Emission Localization and Quantification (pyELQ) code aims to maximize effective use of existing measurement data, especially from continuous monitoring solutions. The code has been developed to detect, localize, and quantify methane emissions from concentration and wind measurements.
https://sede-open.github.io/pyELQ/
Apache License 2.0

Optimize wind turbulence calculation #13

Closed TannazH closed 6 days ago

TannazH commented 2 weeks ago

Description

This pull request speeds up the wind turbulence calculation in the Meteorology class by a factor of at least 100 in the benchmark below. The gain matters most when the meteorology dataset is large.

The function calculate_wind_turbulence_horizontal in the Meteorology class previously used rolling.apply(scipy.stats.circstd) over a window of the user's choosing. It now evaluates the circstd formula directly, so the rolling step only needs rolling.mean, which is about 100 times faster. rolling.apply effectively loops over the windows one at a time, whereas built-in rolling methods such as rolling.mean are optimized for speed.
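
For reference, the expression evaluated by the optimized code can be written out as below, for the N wind directions in a window converted to radians. It is the circular standard deviation based on the mean resultant length, which should match the default definition used by scipy.stats.circstd. The symbols are notation introduced here for illustration, not names from the code base.

```latex
\bar{S} = \frac{1}{N}\sum_{i=1}^{N} \sin\theta_i, \qquad
\bar{C} = \frac{1}{N}\sum_{i=1}^{N} \cos\theta_i, \qquad
R = \sqrt{\bar{S}^{2} + \bar{C}^{2}}

\sigma_{\text{rad}} = \sqrt{-2 \ln R}, \qquad
\sigma_{\text{deg}} = \frac{180}{\pi}\,\sigma_{\text{rad}}
```

Because the two averages are plain means over the window, each can be obtained with rolling.mean, and the remaining operations are elementwise.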

The following code illustrates the speed-up by timing the horizontal wind turbulence calculation with circstd and with the optimized method.

import numpy as np
from scipy.stats import circstd 
import pandas as pd
from pyelq.meteorology import Meteorology
import timeit

n = 10000
window = "10s"
wind_direction = pd.Series(np.random.uniform(0, 360, n))
wind_direction.index = pd.date_range('2018-01-01', periods=n, freq='1s')

### Calculate the wind turbulence using circstd function
start_time_circstd = timeit.default_timer()
wind_turbulence_circstd = wind_direction.rolling(window=window , center=True, min_periods=3).apply(
    circstd, kwargs={"low": 0, "high": 360})
time_elapsed_circstd = timeit.default_timer() - start_time_circstd

### Calculate the wind turbulence using the proposed method 
start_time_optimized = timeit.default_timer()
sin_rolling = (np.sin(wind_direction * np.pi / 180)).rolling(window=window, center=True, min_periods=3).mean()
cos_rolling = (np.cos(wind_direction * np.pi / 180)).rolling(window=window, center=True, min_periods=3).mean()
wind_turbulence_optimized = np.sqrt(-2 * np.log((sin_rolling**2 + cos_rolling**2) ** 0.5)) * 180 / np.pi
time_elapsed_optimized = timeit.default_timer() - start_time_optimized

print("Time took to calculate wind turbulence using circstd function is {} seconds. ".format(time_elapsed_circstd ))
print("Time took to calculate wind turbulence using the optimized approach is {} seconds. ".format(time_elapsed_optimized ))

### Check that the optimized approach is at least 100 times faster than the circstd function.
print("Calculating wind turbulence using the optimized approach is 100 times faster than circstd function: {}".format(time_elapsed_optimized < time_elapsed_circstd/100))

### Check that the optimized approach matches the circstd function (compared here via the mean turbulence over all windows).
mean_turbulence = np.mean(wind_turbulence_optimized)
mean_turbulence_circstd = np.mean(wind_turbulence_circstd)
print("Optimized approach for calculating the wind turbulence gives identical results to those from the circstd: {}".format(np.isclose(mean_turbulence, mean_turbulence_circstd)))

Running the above code gives the following results:

Time took to calculate wind turbulence using circstd function is 1.6943280000014056 seconds.
Time took to calculate wind turbulence using the optimized approach is 0.0017694999987725168 seconds.
Calculating wind turbulence using the optimized approach is 100 times faster than circstd function: True
Optimized approach for calculating the wind turbulence is identical to those from the circstd: True
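
One case that uniformly random benchmark data does not exercise is a window of nearly identical wind directions. There the mean resultant length is close to 1, and floating-point round-off could in principle push it marginally above 1, which would make the logarithm positive and the square root NaN. The following is a minimal sketch of a guarded variant. The helper name rolling_circstd_deg is hypothetical (it is not part of pyELQ), and clipping the resultant length to 1 is an assumption about what is acceptable here, not the behavior of the merged code.

```python
import numpy as np
import pandas as pd


def rolling_circstd_deg(direction_deg, window, min_periods=3):
    """Rolling circular standard deviation in degrees (hypothetical helper)."""
    radians = np.deg2rad(direction_deg)
    roll_kwargs = {"window": window, "center": True, "min_periods": min_periods}
    sin_mean = np.sin(radians).rolling(**roll_kwargs).mean()
    cos_mean = np.cos(radians).rolling(**roll_kwargs).mean()
    # Mean resultant length. Clip to at most 1 (an assumption of this sketch)
    # so round-off cannot make the logarithm positive and the result NaN.
    resultant = np.sqrt(sin_mean**2 + cos_mean**2).clip(upper=1.0)
    return np.rad2deg(np.sqrt(-2 * np.log(resultant)))


# Constant wind direction: the turbulence should be (numerically) zero.
index = pd.date_range("2018-01-01", periods=60, freq="1s")
steady_wind = pd.Series(45.0, index=index)
turbulence = rolling_circstd_deg(steady_wind, window="10s")
```

For a steady wind the result stays finite and within round-off of zero.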

Fixes # (issue): the slow wind turbulence calculation when the meteorology dataset is large.

How Has This Been Tested?

The code above was used to verify that the optimized approach gives the same results as scipy.stats.circstd (compared on the mean turbulence) and that it speeds up the wind turbulence calculation by a factor of at least 100.
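
A stricter check than comparing means is to compare the two results window by window. The sketch below does this on a small seeded sample; the sample size, seed, and tolerance are choices made for this illustration and are not taken from the pull request.

```python
import numpy as np
import pandas as pd
from scipy.stats import circstd

rng = np.random.default_rng(0)  # fixed seed so the check is reproducible
n = 500
index = pd.date_range("2018-01-01", periods=n, freq="1s")
wind_direction = pd.Series(rng.uniform(0, 360, n), index=index)
roll_kwargs = {"window": "10s", "center": True, "min_periods": 3}

# Reference: scipy's circstd applied to each window in turn.
reference = wind_direction.rolling(**roll_kwargs).apply(
    circstd, raw=True, kwargs={"low": 0, "high": 360}
)

# Optimized: rolling means of sine and cosine, then the closed-form expression.
radians = np.deg2rad(wind_direction)
sin_mean = np.sin(radians).rolling(**roll_kwargs).mean()
cos_mean = np.cos(radians).rolling(**roll_kwargs).mean()
optimized = np.rad2deg(np.sqrt(-2 * np.log(np.sqrt(sin_mean**2 + cos_mean**2))))

# Elementwise comparison; NaN entries (if any) must coincide in both results.
np.testing.assert_allclose(
    optimized.to_numpy(), reference.to_numpy(), rtol=1e-8, equal_nan=True
)
max_abs_diff = float((optimized - reference).abs().max())
```

If the two approaches ever disagreed in a single window, assert_allclose would raise, whereas a comparison of means could hide offsetting differences.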
