It is important that, even though resampling should work at higher rates than the input data rate, there is no resampling across input data gaps longer than a certain threshold (e.g. 1 minute).
Example, with i = input and o = output, when resampling at twice the input data rate:
i   i   -   i   -   -   -   i   i   i   i
o o o o o o o - - - - - - - o o o o o o o
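The gap rule in the example above can be sketched as follows. This is a minimal sketch, assuming timestamps are plain numbers of seconds and that linear interpolation is acceptable (the issue does not prescribe an interpolation method). `resample_with_gaps`, `step` and `max_gap` are hypothetical names, and `max_gap=2` is chosen only so that the output reproduces the diagram: a single missing input sample is interpolated over, three missing samples are not.

```python
import numpy as np


def resample_with_gaps(t_in, y_in, step, max_gap):
    """Linearly interpolate (t_in, y_in) onto a grid with spacing `step`.

    Grid points whose two enclosing input samples are more than `max_gap`
    apart are set to NaN instead of being interpolated.
    """
    t_in = np.asarray(t_in, dtype=float)
    y_in = np.asarray(y_in, dtype=float)
    order = np.argsort(t_in)
    t_in, y_in = t_in[order], y_in[order]

    # Output grid from the first to the last input timestamp
    n_steps = int(np.floor((t_in[-1] - t_in[0]) / step + 1e-9))
    t_out = t_in[0] + step * np.arange(n_steps + 1)

    # Plain linear interpolation at every grid point
    y_out = np.interp(t_out, t_in, y_in)

    # Input samples directly before and after each grid point
    i_prev = np.searchsorted(t_in, t_out, side="right") - 1  # last input <= t
    i_next = np.searchsorted(t_in, t_out, side="left")       # first input >= t
    i_next = np.minimum(i_next, len(t_in) - 1)
    gap = t_in[i_next] - t_in[i_prev]

    # Blank out grid points that lie inside a gap longer than max_gap
    y_out[gap > max_gap] = np.nan
    return t_out, y_out


# Input slots from the diagram: samples at 0, 1, 3, 7, 8, 9, 10
t_out, y_out = resample_with_gaps(
    [0, 1, 3, 7, 8, 9, 10], [0, 1, 3, 7, 8, 9, 10], step=0.5, max_gap=2
)
print("".join("-" if np.isnan(v) else "o" for v in y_out))  # ooooooo-------ooooooo
```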
I could see a Python data class being helpful for overall readability in the end. A lot of operations could be moved into internal (built-in) functions, for example those of the Python list object.
Hi @vyasgiridhar!
You can find an example of how to group measurements by minute and average them here (last line): https://github.com/tum-esm/automated-retrieval-pipeline/blob/df0d27ef843d14bd5f7fee124c41b937280fe098/extract-retrieval-data/src/procedures/read_from_database.py#L25-L90
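For readers without the linked code at hand, the "group by minute and average" step can be sketched with pandas' `resample(...).mean()`. This is an illustration with hypothetical data (one value every 20 seconds), not the linked implementation.

```python
import pandas as pd

# Hypothetical raw data: one measurement every 20 seconds
times = pd.date_range("2022-06-04 10:00:00", periods=9, freq="20s")
df = pd.DataFrame({"concentration": [1, 2, 3, 4, 5, 6, 7, 8, 9]}, index=times)

# Put the measurements into 1-minute bins and average each bin
per_minute = df.resample("1min").mean()
print(per_minute["concentration"].tolist())  # [2.0, 5.0, 8.0]
```

Each output row averages three raw samples, which is why this approach only works when the output rate is lower than the input rate.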
However, this only works when the resampling rate is lower than the data rate; otherwise, there has to be some interpolation in between. You don't have to write an interface to the database or set up any big codebase, only figure out a way to resample, including interpolation.
Best, Moritz
Hi, @vyasgiridhar (and @dostuffthatmatters),
I've refactored the repo to the point where I now need to consult with @dostuffthatmatters about the next steps. Regarding the post-processing: to resolve this issue, you need to implement the following. It's the same idea as above, just as an isolated function (so you don't need to worry about the DB request).
Best, Marlon
Our retrieval produces data like that in the image below. For this data to be easily used by other models, we have to post-process it. A good example of the curves resulting from post-processing can be seen here. The raw data points are smoothed with a function similar to a rolling average, and the output data is generated at a fixed rate, i.e. one output data point exactly every x seconds.
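To make "a function similar to a rolling average" concrete, a time-based rolling mean in pandas looks like this. The 30-second window and the data are hypothetical. One design point worth noting: a time-based window (`"30s"`) averages over a fixed time span, so it copes with irregular sample spacing, whereas a window given as a number of samples does not.

```python
import pandas as pd

# Hypothetical raw data: one measurement every 10 seconds
times = pd.date_range("2022-06-04 10:00:00", periods=5, freq="10s")
raw = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=times)

# Trailing 30-second window: each value averages the samples in (t - 30 s, t]
smoothed = raw.rolling("30s").mean()
print(smoothed.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```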
We need a new post-processing algorithm since the current one is way too complicated and error-prone. The goal is a function that takes a pandas data frame with pairs of raw `time` and `concentration` values, plus a rate at which to resample the smoothed data, and returns a data frame in the same format. You can use the sample data below (same as in the image).
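A possible shape for such a function is sketched below. It is a sketch under several assumptions, not a reference implementation: the function name `post_process` and the parameters `rate_s` and `max_gap_s` are hypothetical; `time` is assumed to be a timezone-naive datetime column; smoothing uses `scipy.signal.savgol_filter` with the suggested `frames = 31` and `order = 3`; interpolation is linear; output timestamps are aligned to whole multiples of the rate (counted from the Unix epoch); and output points inside input gaps longer than `max_gap_s` (default 60 s, following the 1-minute example earlier in the thread) are dropped.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def post_process(
    df: pd.DataFrame,
    rate_s: float,
    max_gap_s: float = 60.0,
    frames: int = 31,
    order: int = 3,
) -> pd.DataFrame:
    """Smooth raw (time, concentration) pairs and resample them every `rate_s` seconds.

    Output points that fall inside an input gap longer than `max_gap_s`
    are dropped instead of being interpolated.
    """
    df = df.sort_values("time")
    t_ns = df["time"].to_numpy(dtype="datetime64[ns]").astype("int64")
    y = df["concentration"].to_numpy(dtype=float)

    # Savitzky-Golay smoothing on the raw samples; shrink the window to an
    # odd length <= len(y) so that short inputs do not raise an error
    n = len(y)
    window = min(frames, n if n % 2 == 1 else n - 1)
    if window > order:
        y = savgol_filter(y, window, order)

    # Output grid aligned to whole multiples of the rate (in nanoseconds)
    step_ns = int(round(rate_s * 1e9))
    start_ns = -(-t_ns[0] // step_ns) * step_ns  # round the first timestamp up
    grid_ns = np.arange(start_ns, t_ns[-1] + 1, step_ns, dtype="int64")

    # Linear interpolation, using seconds relative to the first sample
    x_in = (t_ns - t_ns[0]) / 1e9
    x_out = (grid_ns - t_ns[0]) / 1e9
    y_out = np.interp(x_out, x_in, y)

    # Keep a grid point only if its neighbouring raw samples are <= max_gap_s apart
    i_prev = np.searchsorted(t_ns, grid_ns, side="right") - 1
    i_next = np.minimum(np.searchsorted(t_ns, grid_ns, side="left"), n - 1)
    keep = (t_ns[i_next] - t_ns[i_prev]) <= max_gap_s * 1e9

    return pd.DataFrame(
        {"time": grid_ns[keep].astype("datetime64[ns]"), "concentration": y_out[keep]}
    )
```

Called as `post_process(raw_df, rate_s=2)`, it returns a data frame with the same `time` and `concentration` columns, one row every 2 seconds except inside long recording gaps.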
An example resampling implementation can be found here. However, it does not interpolate between data points: with data every 6 seconds, resampling at 2 seconds results in data gaps even though there were no recording gaps.
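The effect described above can be reproduced with pandas, assuming a bin-and-average resampling (`resample(...).mean()`); this illustrates the problem and is not the linked implementation. Adding time-based linear interpolation fills the empty bins. Note that `interpolate(method="time")` on its own would also bridge real recording gaps of any length, so it still needs to be combined with a gap threshold like the 1-minute one mentioned at the top of the thread.

```python
import pandas as pd

# Hypothetical raw data: one measurement every 6 seconds, no recording gaps
times = pd.date_range("2022-06-04 10:00:00", periods=4, freq="6s")
raw = pd.Series([1.0, 4.0, 7.0, 10.0], index=times)

# Bin-and-average at 2 s: only every third bin contains a measurement
binned = raw.resample("2s").mean()
print(int(binned.isna().sum()))  # 6 empty bins out of 10

# Time-based linear interpolation fills the bins between measurements
filled = binned.interpolate(method="time")
```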
As a smoothing function (applied before resampling), you can use `scipy.signal.savgol_filter(array, frames, order)` with `frames = 31` and `order = 3`.
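A small demonstration of the suggested filter settings, on hypothetical synthetic data: a Savitzky-Golay filter with `order = 3` fits a cubic polynomial in each window, so a cubic signal passes through unchanged while random noise is strongly reduced. `savgol_filter` works on sample indices and assumes equally spaced samples; it does not look at the timestamps.

```python
import numpy as np
from scipy.signal import savgol_filter

frames, order = 31, 3  # values suggested in the issue

x = np.linspace(0.0, 1.0, 200)
trend = 400.0 + 5.0 * x - 3.0 * x**3          # hypothetical cubic "true" signal
rng = np.random.default_rng(0)
noisy = trend + rng.normal(0.0, 0.5, x.size)  # hypothetical measurement noise

smooth_trend = savgol_filter(trend, frames, order)  # cubic is reproduced
smooth_noisy = savgol_filter(noisy, frames, order)  # noise is damped
```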
proffast-2.2-outputs-20220604.zip