timothyyu / wsae-lstm

implementation of WSAE-LSTM model as defined by Bao, Yue, Rao (2017)

"level" parameter in waveletSmooth function #9

Open danizil opened 5 years ago

danizil commented 5 years ago

Hi Timothy. In reviewing your code I ran into issues using the waveletSmooth function (in the directory subrepos/models/wavelet). It might be down to a difference in our pywt versions, but the function was doing the wavelet decomposition along the features axis rather than separately for each feature along its time series. After fixing that, I noticed that the "level" parameter only controls which detail-coefficient array's median is used to set the threshold that is then applied to all the detail coefficients. I'm hardly a wavelet expert, and have learned the subject only now for this algorithm, but I changed your code to threshold each level's coefficients according to that level's own median, because that is what every denoising source I have seen does. Could you explain your reasoning in choosing one level for all cD thresholding? Cheers! Danny
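For context, a toy sketch (synthetic data, not repo code) of what `pywt.wavedec` returns and which array `coeffs[-level]` selects:

```python
import numpy as np
import pywt

# toy data: pywt.wavedec returns [cA_n, cD_n, ..., cD_2, cD_1],
# so coeffs[-level] selects a single detail-coefficient array whose
# median (via MAD) then sets the threshold for every level
x = np.random.randn(256)
coeffs = pywt.wavedec(x, "haar", mode="per", level=3)
print([c.shape for c in coeffs])  # [(32,), (32,), (64,), (128,)]
print(coeffs[-1].shape)           # (128,) -> cD_1, the finest detail level
```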

timothyyu commented 5 years ago

@danizil the waveletSmooth function in the subrepos/models/wavelet directory is from DeepLearning_Financial, a previous attempt to replicate the results of the paper (https://github.com/mlpanda/DeepLearning_Financial)

I am currently using a modified implementation of that formula, seen here: https://github.com/timothyyu/wsae-lstm/blob/master/wsae_lstm/models/wavelet.py#L27

timothyyu commented 5 years ago

[image]

timothyyu commented 5 years ago

level is 2 as defined in the original paper; as for the axis, I am still looking into how that specific step feeds into the next stage of the model (the stacked autoencoder). I am fairly confident that my implementation is on the right track axis-wise, but I am not infallible. I do recall the feature set appearing incorrectly oriented when using axis=1, but that is something I will have to double check.
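To make the axis question concrete, here is a shape check on synthetic data (a sketch, not repo code; the (1000, 19) frame is an assumed stand-in for the indicator data):

```python
import numpy as np
import pywt

# synthetic (timesteps, features) frame: axis=0 decomposes each indicator
# along its time series, axis=1 decomposes across the 19 indicators instead
data = np.random.randn(1000, 19)

coeffs_time = pywt.wavedec(data, "haar", mode="per", level=2, axis=0)
print([c.shape for c in coeffs_time])  # [(250, 19), (250, 19), (500, 19)]

coeffs_feat = pywt.wavedec(data, "haar", mode="per", level=2, axis=1)
print([c.shape for c in coeffs_feat])  # [(1000, 5), (1000, 5), (1000, 10)]
```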

Related/relevant: https://github.com/timothyyu/wsae-lstm/issues/7 https://github.com/timothyyu/wsae-lstm/issues/6 https://github.com/timothyyu/wsae-lstm/blob/master/reports/csi300%20index%20data%20tvt%20split%20scale%20denoise%20visual.pdf

danizil commented 5 years ago

Hi @timothyyu

  1. What puzzles me is not the decomposition level, which can be toggled, but the fact that we take the threshold from the median of the "level" detail coefficients and apply it to all the other levels. That approach did work better at reconstruction, though, so I went on without further exploration.

  2. Regarding the axis, the way I understood it is that we're supposed to compress nineteen indicators into ten features' time series (i.e. compress along the indicator axis, not the time axis), and then run the compressed features through an LSTM. I imagine that whichever denoising process transforms the dataframe into 19 × (DecLvl + 1) is performed on the correct axis (what I mean is that spectral decomposition is only meaningful along the time axis). I'll add my own code underneath; it could be a package version thing.

  3. After exploring for a bit, and not being able to get the AE to converge, I think I'll try a new scaling process which compresses the data to the range (0, 1). The main reason is that I wasn't able to figure out how to recreate the scaled original signal with ReLUs or tanhs without adding another linear transformation at the end, thereby breaking the symmetry of the AE. Other reasons are that this makes it possible to use sigmoids as in the paper, and I have also seen it used in AE tutorials: https://www.youtube.com/watch?v=582irhtQOhw at minute 11:39 and https://medium.com/datadriveninvestor/deep-autoencoder-using-keras-b77cd3e8be95. I think this is the subject for another issue, though, and I will update on it. Here is my code for the wavelet function (besides my comments, which are verbose but I think explain the process, the changes are the two threshold options and a transpose on X): Cheers!

    
```python
import numpy as np
import pywt
from statsmodels.robust import mad


def waveletSmooth(x, wavelet="db4", level=1, DecLvl=2):
    # calculate the wavelet coefficients
    # danny: coeffs is (DecLvl + 1) arrays: one approximation-coefficient (cA)
    # array (lowest frequencies), followed by DecLvl detail-coefficient (cD) arrays
    coeffs = pywt.wavedec(x.T, wavelet, mode="per", level=DecLvl)

    # calculate a threshold
    # danny: mad is the median absolute deviation (not the standard deviation)
    # danny: option 1 (the original, which I turned off) - one sigma taken from
    # the "level" cD array and applied to every level:
    # sigma = np.full(DecLvl, mad(coeffs[-level].ravel()))
    # danny: option 2 - scale each cD array by its own median
    sigma = np.array([mad(c.ravel()) for c in coeffs[1:]])

    # changing this threshold also changes the behavior,
    # but I have not played with this very much
    # danny: uthresh is the universal threshold - a formula that appears in the
    # denoising literature (haven't gone into it)
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))

    # danny: only coeffs[1:] are thresholded because those are the detail coefficients
    coeffs[1:] = [pywt.threshold(c, value=uthresh[i], mode="soft")
                  for i, c in enumerate(coeffs[1:])]
    # reconstruct the signal using the thresholded coefficients
    y = pywt.waverec(coeffs, wavelet, mode="per")
    return y
```
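For reference, a hypothetical usage sketch on synthetic data (the shapes are illustrative assumptions, not repo code):

```python
import numpy as np

# hypothetical usage: 1000 timesteps of 19 indicators; because of the
# transpose inside waveletSmooth, the result comes back as (features, timesteps)
frame = np.random.randn(1000, 19)
smoothed = waveletSmooth(frame, wavelet="db4", level=1, DecLvl=2)
print(smoothed.shape)  # (19, 1000); the even length avoids an extra sample from mode="per"
```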
timothyyu commented 5 years ago

related closed issue (duplicate): https://github.com/timothyyu/wsae-lstm/issues/12

timothyyu commented 5 years ago

@danizil make sure the wavelet type in your code is haar, not db4. The authors of the WSAE-LSTM paper explicitly specify haar; the existing/previous attempt by mlpanda to implement the wavelet stage of this model uses db4, but that is incorrect.
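As a quick toy illustration (not repo code), haar analysis is just pairwise sums and differences scaled by sqrt(2), and switching the smoother over is a one-argument change:

```python
import pywt

# toy data: haar decomposition of four samples
cA, cD = pywt.dwt([1.0, 2.0, 3.0, 4.0], "haar")
print(cA)  # [2.1213..., 4.9497...]  -> (1+2)/sqrt(2), (3+4)/sqrt(2)
print(cD)  # [-0.7071..., -0.7071...] -> (1-2)/sqrt(2), (3-4)/sqrt(2)

# in danizil's function above, this would be:
# waveletSmooth(frame, wavelet="haar", DecLvl=2)
```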

I am still looking into/examining the level median application/decomposition (your point 1) and the axis orientation (point 2); one of the main issues I keep running into is that the authors of the model were not very specific about particular aspects of its implementation (see #6 and #7 for relevant discussion).

Basically, beyond a certain point, the best academic judgement/practice has to be used to fill in the gaps in the implementation of the model and to correct its errors.

timothyyu commented 5 years ago

3. After exploring for a bit, and not being able to get the AE to converge, I think I'll try a new scaling process which compresses the data to the range (0, 1). The main reason is that I wasn't able to figure out how to recreate the scaled original signal with ReLUs or tanhs without adding another linear transformation at the end, thereby breaking the symmetry of the AE.

That is one of the fundamental issues that I am looking into - whether the scaling and the wavelet-transform denoising are reversed at some stage before/after the LSTM layer that is used to make predictions one timestep ahead: [image]

I can't say I have a definitive answer/solution yet, but I'm going to be trying more than one method/approach. Unfortunately, the actual journal article does not explicitly detail this component in its definition of the model and of the model pipeline for the price data + technical indicator data.
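One candidate approach - an assumption, not something the paper states - is to fit a (0, 1) scaler on the training split only and invert it after prediction, e.g. with sklearn:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# assumed splits: 800 train / 200 test rows of 19 indicators (synthetic here)
train, test = np.random.randn(800, 19), np.random.randn(200, 19)

# fit on the training split only, so no test information leaks into the scaling
scaler = MinMaxScaler(feature_range=(0, 1)).fit(train)
train_s, test_s = scaler.transform(train), scaler.transform(test)

# ... denoise, autoencode, and predict on the scaled data ...
preds_scaled = test_s  # placeholder standing in for model output

# map predictions back to the original price/indicator scale
preds = scaler.inverse_transform(preds_scaled)
```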

timothyyu commented 5 years ago

From the pywavelets documentation, p. 38-39: [image]

timothyyu commented 5 years ago

  • Regarding the axis, the way I understood it is that we're supposed to compress nineteen indicators into ten features' time series (i.e. compress along the indicator axis, not the time axis), and then run the compressed features through an LSTM. I imagine that whichever denoising process transforms the dataframe into 19 × (DecLvl + 1) is performed on the correct axis (what I mean is that spectral decomposition is only meaningful along the time axis). I'll add my own code underneath; it could be a package version thing.

Axis decomposition check started; see 707dfb54ac61e78d53d33fe4fd6e7fcb57cf58e8: https://github.com/timothyyu/wsae-lstm/blob/master/notebooks/6a_wavelet_axis_decomp_check.ipynb

[image] [image]

timothyyu commented 5 years ago

axis=1 wavelet output has an extra column that has to be removed to match the feature count: [image]
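A synthetic check of that extra column (illustrative, not repo code): with an odd feature count (19) and mode="per", reconstruction along axis=1 comes back one column wider than the input:

```python
import numpy as np
import pywt

# 19 features is odd, so mode="per" pads during decomposition along axis=1
data = np.random.randn(1000, 19)
coeffs = pywt.wavedec(data, "haar", mode="per", level=2, axis=1)
recon = pywt.waverec(coeffs, "haar", mode="per", axis=1)
print(recon.shape)     # (1000, 20) -> one extra column
recon = recon[:, :19]  # chop the extra column to recover the 19 indicators
```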

timothyyu commented 5 years ago

Will have to double check, but I believe I was correct initially with axis=0 - it may still be worth running axis=1 in parallel, with the extra column chopped off, as an A/B control or test.

[image]

Regarding the axis, the way I understood it is that we're supposed to compress nineteen indicators into ten features' time series (i.e. compress along the indicator axis, not the time axis), and then run the compressed features through an LSTM. I imagine that whichever denoising process transforms the dataframe into 19 × (DecLvl + 1) is performed on the correct axis (what I mean is that spectral decomposition is only meaningful along the time axis). I'll add my own code underneath; it could be a package version thing.

The authors of the original paper were not explicitly clear or detailed about this aspect of the model - I will also have to take a closer look at/reevaluate how the autoencoder stage is supposed to work on the transformed data (19 × (DecLvl + 1)) before LSTM input.
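For what that stage might look like, a minimal sketch (assumptions: PyTorch as the framework and a single 19 → 10 → 19 layer pair; the paper's stacked autoencoder chains several such layers, and the sigmoids presume (0, 1)-scaled inputs):

```python
import torch
import torch.nn as nn

# one layer of a stacked autoencoder: 19 indicators -> 10-feature code -> 19
class AELayer(nn.Module):
    def __init__(self, n_in=19, n_hidden=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)         # compressed 10-feature representation
        return self.decoder(z), z   # reconstruction + code for the next stage

model = AELayer()
x = torch.rand(32, 19)              # a batch of (0, 1)-scaled indicator vectors
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)  # train to reconstruct the input
```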