gregversteeg / gaussianize

Transforms univariate data into normally distributed data
MIT License
73 stars 25 forks

Data not transformed #4

Closed ghost closed 1 year ago

ghost commented 4 years ago

I tried to transform my data to a Gaussian distribution, but the output of the transform function is the same as the original data. I wrote this script to run some tests. Thank you in advance. Test script:

import gaussianize as g
import numpy as np

x_uni = np.random.uniform(size=(1000,1))
out_uni = g.Gaussianize()
out_uni.fit(x_uni)
print("out_uni coefficients:", out_uni.coefs_, "\n")
y_uni = out_uni.transform(x_uni)
print("sum of transform minus original uniform:", sum(y_uni-x_uni), "\n")

x_norm = np.random.normal(size=(1000,1))
out_norm = g.Gaussianize()
out_norm.fit(x_norm)
print("out_norm coefficients:", out_norm.coefs_, "\n")
y_norm = out_norm.transform(x_norm)
print("sum of transform minus original normal:", sum(y_norm-x_norm), "\n")

import scipy.io as scio
import os

midep1 = scio.loadmat(os.path.join("D:\\", "Shared", "Data - sLorTimeseries", "LorTimeSeries_2_EC_MID_001_ep1.mat"))["LoretaTimeSeries"]
x = midep1[0, :]
x.shape = (4001, 1)
out = g.Gaussianize()
out.fit(x)
print("my data coefficients:", out.coefs_, "\n")
y = out.transform(x)
print("tranform minus original of my data:")
print(y-x, "\n")
print("sum of transform minus original:", sum(y-x), "\n")

Output:

out_uni coefficients: [(0.4964280888006252, 0.2880773918099591, 0.0)] 
sum of transform minus original uniform: [-8.04911693e-16] 
out_norm coefficients: [(-0.013567057474285795, 0.9626024748274751, 0.013488)] 
sum of transform minus original normal: [-0.15229739] 
my data coefficients: [(2.8030224, 1.3002187, 0.0)] 
tranform minus original of my data:
[[ 0.0000000e+00]
 [ 0.0000000e+00]
 [-1.3737008e-08]
 ...
 [ 0.0000000e+00]
 [ 0.0000000e+00]
 [ 0.0000000e+00]] 
sum of transform minus original: [1.0423828e-06]

The fit of x_uni gives different coefficients from the fit of x_norm, but y_uni and y_norm are equal to x_uni and x_norm respectively. In my understanding, the transformed signal should be rescaled to resemble a Gaussian distribution. Am I missing something, or am I not running the transforms correctly?

gregversteeg commented 4 years ago

Hmm, I'm worried that it's gaussianizing each row separately (i.e., each row is considered a single sample of some 1-d data). Can you try transposing the data matrix and see if it works?

ghost commented 4 years ago

The result is the same, but now the coefficients in the object out are a list of 4001 tuples of size 3. What I keep seeing is that the 3rd value in the tuple is always equal to zero. Maybe this is the problem?

As you can see in the picture below, my data (x) are somewhat close to Gaussian, but zero change (in y) seems strange to me. [image]

Maybe I am missing something, so let me be clearer: I have a time series of 4001 points and I want to gaussianize it. I expect the values close to 6 and 7 to contract and the values around 0 and 1 to dilate, so that the histogram above becomes more Gaussian-like. Is this the intent of your class?
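For readers following along, here is a minimal sketch of why a zero third coefficient would leave the data unchanged. It assumes the fitted tuple is (mu, sigma, delta) for a heavy-tail Lambert W style transform, which the later comments about 'delta' suggest but this thread does not confirm. The function names `w_delta` and `lambert_gaussianize` are hypothetical and are not taken from the package.

```python
import numpy as np
from scipy.special import lambertw


def w_delta(z, delta, eps=1e-12):
    """Heavy-tail Lambert W back-transform of standardized data.

    Assumption: when delta is (numerically) zero there is no tail to
    remove, so the input is returned unchanged.
    """
    z = np.asarray(z, dtype=float)
    if delta < eps:
        return z
    return np.sign(z) * np.sqrt(np.real(lambertw(delta * z ** 2)) / delta)


def lambert_gaussianize(y, mu, sigma, delta):
    """Standardize, pull the tails in, then restore location and scale."""
    y = np.asarray(y, dtype=float)
    return mu + sigma * w_delta((y - mu) / sigma, delta)


rng = np.random.default_rng(0)

# Uniform data with delta = 0: output equals input up to rounding error.
x = rng.uniform(size=1000)
same = lambert_gaussianize(x, x.mean(), x.std(), 0.0)
print("max |y - x| with delta = 0:", np.abs(same - x).max())

# Heavy-tailed data with delta > 0: extreme values are pulled toward the center.
heavy = rng.standard_t(df=3, size=1000)
pulled = lambert_gaussianize(heavy, 0.0, 1.0, 0.2)
print("max |x| before and after:", np.abs(heavy).max(), np.abs(pulled).max())
```

Under this assumption, a transform of this kind only shrinks heavy tails. It does not reshape a bounded or light-tailed histogram, which would be consistent with the near-zero differences in the test output above.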

gregversteeg commented 4 years ago

Hi Nikos, you are correct in interpreting the intent of the class. I'm afraid I haven't used it in a long time and don't remember much about the details. Let me make a few observations though.

Sorry I can't be of more help, this was kind of a weekend project from a few years ago and I haven't really used it much myself since then. If you do figure anything out, please comment back here in case it's useful for others trying to do this type of Gaussianization.

gmgeorg commented 4 years ago

Hi @Nikos-T, FWIW, looking at your histogram chart, it does not seem like your data even has heavy tails (it's a truncated distribution on the left, and on the right it's barely heavy-tailed). I wouldn't be surprised if 'delta' is estimated at 0. What's the kurtosis / skewness of your data in the histogram?
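One way to answer the kurtosis / skewness question is with `scipy.stats`. The sketch below uses synthetic samples only, since the original data is not available in this thread; `tail_summary` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def tail_summary(x):
    """Return sample skewness and excess kurtosis of a 1-d sample.

    stats.kurtosis uses the Fisher definition by default, so a normal
    distribution scores about 0 and heavy tails score above 0.
    """
    x = np.asarray(x, dtype=float).ravel()
    return {"skewness": stats.skew(x), "excess_kurtosis": stats.kurtosis(x)}


rng = np.random.default_rng(0)
samples = {
    "uniform": rng.uniform(size=100_000),
    "normal": rng.normal(size=100_000),
    "student_t(df=5)": rng.standard_t(df=5, size=100_000),
}

summaries = {name: tail_summary(values) for name, values in samples.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Excess kurtosis at or below zero, as for the uniform sample, indicates no heavy tails, in which case a 'delta' estimate of 0 would not be surprising.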

ghost commented 1 year ago

Hello, unfortunately I remember neither my use case nor its data. However, re-reading my problem, I suppose you are correct: I probably started with a non-skewed distribution, or I misinterpreted the scope of this "gaussianization". I am closing the issue; thanks for the help.
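For anyone landing here with the same expectation (any histogram, including a uniform one, mapped to a Gaussian shape), a rank-based transform is a different technique that does this. The sketch below is not the method discussed above, and whether the package offers an equivalent option is not stated in this thread; `rank_gaussianize` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def rank_gaussianize(x):
    """Map a 1-d sample to normal scores via its empirical ranks.

    Ranks are scaled into the open interval (0, 1) and passed through
    the standard normal quantile function, so ordering is preserved.
    """
    x = np.asarray(x, dtype=float).ravel()
    ranks = stats.rankdata(x)      # 1..n, ties get their average rank
    u = ranks / (len(x) + 1)       # strictly inside (0, 1)
    return stats.norm.ppf(u)


rng = np.random.default_rng(0)
x_uni = rng.uniform(size=1000)
y_uni = rank_gaussianize(x_uni)

print("skewness:", stats.skew(y_uni))
print("excess kurtosis:", stats.kurtosis(y_uni))
```

The trade-off is that this mapping depends only on the ordering of the sample, so applying it to new data requires interpolating against the training sample rather than reusing a few fitted coefficients.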