sintel-dev / Orion

A machine learning library for detecting anomalies in signals.
https://sintel.dev/Orion/
MIT License

[TadGAN benchmark] F1-Score is very low. #486

Open gunnha opened 7 months ago

gunnha commented 7 months ago

What I Did

# Assumed imports for the snippets below
import pandas as pd
from orion import Orion
from orion.data import load_anomalies, load_signal
from orion.evaluation import contextual_f1_score

# train_55 holds the 55 MSL channel names (e.g. 'P-11', 'D-15', 'M-7')
# Collect the known anomalies of every channel into one DataFrame
known_anomalies = pd.DataFrame()
for signal in train_55:
    df = load_anomalies(signal)
    known_anomalies = pd.concat([known_anomalies, df], axis=0)

# Merge all channels into single train / test DataFrames
X_train_msl = pd.DataFrame()
X_test_msl = pd.DataFrame()
for signal in train_55:

    train_signal_path = f'multivariate/{signal}-train'
    test_signal_path = f'multivariate/{signal}-test'
    #train_signal_path = f'{signal}-train'
    #test_signal_path = f'{signal}-test'

    train_df = load_signal(train_signal_path)
    test_df = load_signal(test_signal_path)

    X_train_msl = pd.concat([X_train_msl, train_df], axis=0)
    X_test_msl = pd.concat([X_test_msl, test_df], axis=0)

hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1": {
        "time_column": "timestamp",
        "interval": 21600,        # aggregate into 6-hour (21600 s) segments
        "method": "mean"
    },
    "orion.primitives.tadgan.TadGAN#1": {
        "epochs": 70
    },
    "orion.primitives.tadgan.score_anomalies#1": {
        "rec_error_type": "dtw",  # DTW-based reconstruction error
        "comb": "mult"            # multiply reconstruction error and critic score
    }
}

orion = Orion(
    pipeline='tadgan',
    hyperparameters=hyperparameters
)

orion.fit(X_train_msl)
anomalies = orion.detect(X_test_msl)

contextual_f1_score(known_anomalies, anomalies, X_test_msl)

Question

sarahmish commented 7 months ago

Hi @gunnha – did you find the answers you're looking for?

gunnha commented 7 months ago

> Hi @gunnha – did you find the answers you're looking for?

I thought I had found the answer at first, so I closed the issue, but I'm still looking.

I found that the variation changes can be handled with the code above. The only remaining concern is the F1-score.

There are 55 channels of MSL data (e.g. P-11, D-15, M-7). I merged all of these channels into one DataFrame. With the code above, the F1-score is approximately 0.1.

Second, I tried computing the F1-score by fitting on individual channels; some of them came out as NaN, while others matched the results in the paper.
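
Roughly, the per-channel loop I tried looks like this (a minimal sketch reusing the imports and hyperparameters above; the results dict is just for illustration):

# Minimal sketch of per-channel fitting and scoring (reuses hyperparameters from above)
results = {}
for signal in train_55:
    train_df = load_signal(f'multivariate/{signal}-train')
    test_df = load_signal(f'multivariate/{signal}-test')
    channel_anomalies = load_anomalies(signal)

    orion = Orion(pipeline='tadgan', hyperparameters=hyperparameters)
    orion.fit(train_df)
    detected = orion.detect(test_df)

    # some channels return NaN here, others match the paper
    results[signal] = contextual_f1_score(channel_anomalies, detected, test_df)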

Is there a better approach? @sarahmish, thank you for your interest.

sarahmish commented 7 months ago

@gunnha to reproduce the results in the paper, we use the benchmark function provided in Orion. We do a couple of things differently there (roughly sketched after the list below):

  1. we process each signal on its own (no concatenation).
  2. we use the weighted=False option for the evaluation metrics.
  3. we aggregate the results on a dataset level.
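
Per signal, the evaluation roughly looks like the sketch below (simplified for illustration; this is not the exact benchmark code, and msl_signals is just a placeholder for the MSL channel names):

# Simplified sketch of the benchmark procedure (not the actual benchmark code)
per_signal_scores = []
for signal in msl_signals:  # placeholder for the MSL channel names
    train_df = load_signal(f'multivariate/{signal}-train')
    test_df = load_signal(f'multivariate/{signal}-test')
    known = load_anomalies(signal)

    orion = Orion(pipeline='tadgan', hyperparameters=hyperparameters)
    orion.fit(train_df)                                  # 1. each signal on its own
    detected = orion.detect(test_df)

    score = contextual_f1_score(
        known, detected, test_df, weighted=False)        # 2. weighted=False
    per_signal_scores.append(score)

# 3. aggregate on the dataset level (a simple mean, for illustration)
msl_f1 = pd.Series(per_signal_scores).mean()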

As for the Yahoo data, you need to request access directly from their website to obtain it.

The code for reproducing the benchmark can be found in benchmark.py, and for aggregating the results refer to results.py.
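
For example, a rough sketch of calling the benchmark directly (the exact arguments may differ, so check benchmark.py; msl_signals is again a placeholder for the list of MSL channel names):

# Rough sketch of running the benchmark directly (see benchmark.py for the exact signature)
from orion.benchmark import benchmark

scores = benchmark(
    pipelines=['tadgan'],            # pipeline(s) to evaluate
    datasets={'MSL': msl_signals},   # dataset name mapped to its signals
    metrics=['f1'],
    rank='f1',
)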

Let me know if you have any further questions!