guanyilun / galilei

A generic function emulator that supports multiple backends
MIT License

Guidance on Using Package for Fluctuating Signature Data #15

randomwangran closed this issue 1 year ago

randomwangran commented 1 year ago

Description

pythonTest

What I Did

I would like to understand when this package is a good fit, specifically for fluctuating signature data like the points plotted below. I do not know the original function that generates this data (or any interpretable form of it), and I would like to know whether an analytical expression can be obtained for these points.

Here is a self-contained script that downloads the data and reproduces this plot:

import numpy as np
import matplotlib.pyplot as plt
import os
import tempfile
import requests
from scipy.signal import savgol_filter

# Input parameters
csv_url = 'https://gist.githubusercontent.com/randomwangran/a9bf1e4175c7b70be8b6c00459f8145d/raw/c2ab8f0b4fbed5cc88a630aab1f066568a902dd6/merged_section_S60.csv'
reference_section = 60
angle_threshold = np.deg2rad(45)  # Desired angle threshold in degrees (45 here); not used later in this script

# Download and save the CSV file
response = requests.get(csv_url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed
temp_dir = tempfile.gettempdir()
csv_file_path = os.path.join(temp_dir, f'merged_section_S{reference_section}.csv')

with open(csv_file_path, 'wb') as f:
    f.write(response.content)

def load_section_data(file_path):
    data = np.genfromtxt(file_path, delimiter=',', skip_header=1)
    return data

# Load reference section data
ref_data = load_section_data(csv_file_path)
times = ref_data[:, 0]
values = ref_data[:, 1]

def hampel_filter(data, window_size, n_sigmas=3):
    filtered_data = np.copy(data)
    n = len(data)
    L = 1.4826  # Scale factor so the MAD estimates the standard deviation of Gaussian data

    for i in range(window_size, n - window_size):
        data_slice = data[i - window_size:i + window_size + 1]
        median = np.median(data_slice)
        mad = L * np.median(np.abs(data_slice - median))
        if np.abs(data[i] - median) > n_sigmas * mad:
            filtered_data[i] = median

    return filtered_data

# Apply the Hampel filter to remove outliers
window_size = 5
n_sigmas = 3
filtered_values = hampel_filter(values, window_size, n_sigmas)

# Apply the Savitzky-Golay filter for further smoothing
window_length = 51
polynomial_order = 3
filtered_values = savgol_filter(filtered_values, window_length, polynomial_order)

# Keep only the data points with time greater than 37 seconds
time_threshold = 37
filtered_times = times[times > time_threshold]
filtered_values = filtered_values[times > time_threshold]

# Plot the data points and the filtered signal
plt.scatter(filtered_times, filtered_values, label='Data points', alpha=0.5)
plt.plot(filtered_times, filtered_values, color='red', label='Filtered signal')
plt.xlabel('Time')
plt.ylabel('Values')
plt.legend()
plt.show()
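The Hampel filter in the script above loops over every interior point in Python. A vectorized sketch of the same rule is shown below, assuming NumPy >= 1.20 for `sliding_window_view`; `hampel_filter_vectorized` is a hypothetical name, and like the loop it leaves the first and last `window_size` points unchanged.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def hampel_filter_vectorized(data, window_size, n_sigmas=3):
    """Replace outliers with the rolling median (same rule as the loop version).

    Points within `window_size` of either end are left unchanged,
    matching the range of the original loop.
    """
    data = np.asarray(data, dtype=float)
    filtered = data.copy()
    k = 1.4826  # scale factor so the MAD estimates the std of Gaussian data
    n = len(data)
    # Row j holds data[j : j + 2 * window_size + 1], centred on index j + window_size
    windows = sliding_window_view(data, 2 * window_size + 1)
    medians = np.median(windows, axis=1)
    mads = k * np.median(np.abs(windows - medians[:, None]), axis=1)
    centers = data[window_size:n - window_size]
    outliers = np.abs(centers - medians) > n_sigmas * mads
    # Basic slicing returns a view, so this assignment updates `filtered` in place
    filtered[window_size:n - window_size][outliers] = medians[outliers]
    return filtered
```

On the issue's data, `hampel_filter_vectorized(values, window_size, n_sigmas)` should return the same array as `hampel_filter(values, window_size, n_sigmas)`, just without the per-point Python loop.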
guanyilun commented 1 year ago

Thank you for trying galilei out, though I don't think it will provide what you want. It is meant to be a data-driven way to find an approximation (emulation) of your function without an explicit analytical expression. If you are interested in an analytical expression, I recommend looking into symbolic regression tools such as PySR. Galilei is useful when your function is expensive to evaluate, as it provides a quick way to obtain a good approximation of it. I hope this helps!
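To illustrate the distinction drawn above, here is a minimal sketch of the emulation idea in plain NumPy: evaluate an expensive function at a few sample points, fit a cheap surrogate, and query the surrogate instead. `expensive_model`, `build_emulator`, the Chebyshev-series fit, and all parameters are hypothetical stand-ins chosen for illustration; this is not galilei's API, and its backends may work quite differently.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def expensive_model(x):
    """Hypothetical stand-in for a function that is slow to evaluate."""
    return np.exp(-0.5 * x) * np.sin(3.0 * x)


def build_emulator(func, lo, hi, n_samples=50, degree=20):
    """Fit a cheap Chebyshev-series surrogate to samples of `func` on [lo, hi]."""
    x_train = np.linspace(lo, hi, n_samples)
    y_train = func(x_train)  # the only calls to the expensive function
    return Chebyshev.fit(x_train, y_train, deg=degree)


emulator = build_emulator(expensive_model, 0.0, 5.0)

# Query the surrogate densely and compare against the true function
x_test = np.linspace(0.0, 5.0, 1000)
max_error = np.max(np.abs(emulator(x_test) - expensive_model(x_test)))
print(f"max abs error of the emulator on [0, 5]: {max_error:.2e}")
```

The resulting surrogate is a vector of series coefficients rather than a readable formula, which is why a symbolic regression tool such as PySR is the better choice when the goal is an analytical expression.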