ruiking04 / COCA

Deep Contrastive One-Class Time Series Anomaly Detection
30 stars 9 forks

Calculation of Standard Deviation #20

Closed marciahon29 closed 9 months ago

marciahon29 commented 10 months ago

Hello,

Please could you let me know how many seeds you ran in order to determine your standard deviation?

Did you run each model 5 times? 10 times?

Thanks

marciahon29 commented 9 months ago

Please let me know the number of times you run the code. I would like to also have a standard deviation as part of my results.

marciahon29 commented 9 months ago

Please could you also let me know how you calculate the "F1 Affiliation Score"?

ruiking04 commented 9 months ago

Sorry, I was on a break for the New Year holiday.

Either 5 or 10 runs is fine; it depends on how much computing power you have. The "F1 Affiliation Score" is calculated as follows:

# convert_vector_to_events and pr_from_events come from the affiliation
# metrics code used by the repo (see the code link below); events_gt is the
# ground-truth label vector converted to events the same way as predict.
events_pred = convert_vector_to_events(predict)
Trange = (0, len(predict))
dic = pr_from_events(events_pred, events_gt, Trange)
affiliation_f1 = 2 * (dic["precision"] * dic["recall"]) / (dic["precision"] + dic["recall"])

code link Lines 80~83
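The snippet above relies on the affiliation-metrics implementation for `pr_from_events`. As a self-contained sketch, the event-conversion step and the final F1 arithmetic can be reproduced with a hand-rolled `convert_vector_to_events` and placeholder precision/recall values (both are illustrative assumptions, not the repo's actual code):

```python
# Sketch only: the real pr_from_events comes from the affiliation metrics
# code; here we hand-roll the event conversion and plug in example
# precision/recall values to show the F1 computation.

def convert_vector_to_events(vector):
    """Turn a binary 0/1 vector into (start, end) index pairs,
    end-exclusive, one pair per run of consecutive 1s."""
    events = []
    start = None
    for i, v in enumerate(vector):
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(vector)))
    return events

predict = [0, 1, 1, 0, 0, 1, 0]
events_pred = convert_vector_to_events(predict)  # [(1, 3), (5, 6)]
Trange = (0, len(predict))

# Placeholder for the affiliation precision/recall that pr_from_events returns:
dic = {"precision": 0.9, "recall": 0.75}
affiliation_f1 = 2 * dic["precision"] * dic["recall"] / (dic["precision"] + dic["recall"])
print(round(affiliation_f1, 4))  # 0.8182 (harmonic mean of precision and recall)
```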

marciahon29 commented 9 months ago

If I change the "seed" argument on the command line, does this mean it is a different version of the model? I am finding that some values for seed_1 and seed_2 produce the same results.

For example:
python coca.py --selected_dataset IOpsCompetition --device cuda --seed 2
python coca_no_view.py --selected_dataset IOpsCompetition --device cuda --seed 1
python ts_tcc_main.py --training_mode anomaly_detection --selected_dataset IOpsCompetition --device cuda --seed 5

In addition, from my understanding, the baseline.py models are random for each execution. Is this correct? Or how could I specify that each execution is different? From my results, some of the runs are producing the same results.

Thanks, Marcia

marciahon29 commented 9 months ago

I believe that my different runs of baseline.py are producing the same values. How could I ensure that these values are slightly different? For example, with coca you have "seed". How do I do the same with baseline.py?

ruiking04 commented 9 months ago

Generally speaking, for deep models, the initial model is randomly generated at the beginning of each training run. The loss surface is bumpy, and optimizing from different starting positions leads to randomness in the final trained model. For example, seed=1 and seed=2 may converge to two different local optima, or they may converge to the same optimum.
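The effect of a fixed seed can be illustrated with Python's stdlib random generator as a toy stand-in for a framework's weight initializer (this is an illustrative sketch, not the repo's actual initialization code):

```python
import random

def init_weights(seed, n=4):
    # Toy stand-in for a model's random weight initialization:
    # a seeded generator produces the same "weights" every time.
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(n)]

# Same seed -> identical initial weights, so the run is reproducible.
assert init_weights(1) == init_weights(1)
# Different seeds give different starting points, so training may
# converge to different local optima (or, by chance, to the same one).
print(init_weights(1) != init_weights(2))  # True in practice
```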

Your question: "I am finding that some values for seed_1 and seed_2 are producing the same results." It's possible.

There are also some models without randomness, such as the OC-SVM and IF methods included in baseline.py. You can refer to Salesforce-Merlion's paper: https://arxiv.org/abs/2109.09265

Therefore, for methods in "baseline.py" that do not have a fixed random seed and do have a certain degree of randomness, if you run them several times independently, the results should change. For example, run the LSTMED method in "baseline.py" 5 times and then calculate the mean and standard deviation of the results.

python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
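Once the independent runs finish, the mean and standard deviation can be computed with a few lines of stdlib Python (the scores below are made-up placeholders, not results from the paper):

```python
import statistics

# Hypothetical F1 scores collected from 5 independent runs of LSTMED:
scores = [0.612, 0.598, 0.605, 0.621, 0.594]

mean = statistics.mean(scores)
std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
print(f"{mean:.3f} ± {std:.3f}")  # 0.606 ± 0.011
```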

marciahon29 commented 9 months ago

How did you introduce randomness for all models? In Table 2, all models have a standard deviation, which means they produced different results across runs.

Is there some script to modify? How do I accomplish this?

marciahon29 commented 9 months ago

From my results, it appears that IsolationForest, SpectralResidual, CPC, and OCSVM do not have randomness incorporated because my 3 runs have produced the same results.

RandomCutForest, LSTMED, DAGMM, and DeepSVDD do have different results in each of the 3 runs.

Please could you help me to get IsolationForest, SpectralResidual, CPC, and OCSVM to produce different results for each run?

ruiking04 commented 9 months ago

IsolationForest, SpectralResidual, and OCSVM inherently have no randomness, so they only need to be run once.

As for CPC, it should be random; we will verify this again. Please allow a few days, as CPC runs slowly.
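For detectors that are inherently deterministic, one possible workaround (my own suggestion, not something from the authors or the repo) is to perturb the input per run with small seeded noise, so each run sees a slightly different series:

```python
import random

def perturb_series(series, seed, noise_std=0.01):
    # Add small seeded Gaussian noise so that each run sees a slightly
    # different input; a deterministic detector then produces slightly
    # different scores for each seed.
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in series]

series = [0.0, 0.5, 1.0, 0.5, 0.0]
run_a = perturb_series(series, seed=1)
run_b = perturb_series(series, seed=2)
print(run_a != run_b)  # different seeds -> different perturbed inputs
```

Whether the resulting spread is meaningful depends on the noise scale relative to the data; reporting a single deterministic result for such methods, as done here for OC-SVM and IF, is also a standard choice.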

marciahon29 commented 9 months ago

In your paper, you made all models random because each has a standard deviation.

Please could you let me know how to do this in the code?

ruiking04 commented 9 months ago

OC-SVM and IF have no standard deviation.

[screenshot of the results table]

marciahon29 commented 9 months ago

How were you able to get randomness in SpectralResidual? For me, it has no randomness.

ruiking04 commented 9 months ago

SR and SR-CNN are different; please read this paper for details.

Ren, Hansheng, et al. "Time-series anomaly detection service at microsoft." Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2019.

The paper itself does not provide source code. The source code can be found at the link below.

https://paperswithcode.com/paper/time-series-anomaly-detection-service-at#code

marciahon29 commented 9 months ago

Hello,

In the results, the files specify "Spectral Residual". However, in your table it is "SR-CNN". I had understood them to be the same. If they are not the same, how did you get values for "SR-CNN"?

ruiking04 commented 9 months ago

Spectral Residual is a baseline provided by Merlion, which is different from SR-CNN. Of course, you can also use Spectral Residual. There are also other baselines in Merlion, such as WindStats, Autoencoder, and Variational Autoencoder.

paper: "Merlion: A Machine Learning Library for Time Series" [screenshot]

paper: "Time-series anomaly detection service at Microsoft" [screenshot]

marciahon29 commented 9 months ago

How did you get the results for SR-CNN in your table, given that baseline.py only handles SpectralResidual?

ruiking04 commented 9 months ago

The SR-CNN source code can be found at the link below.

https://paperswithcode.com/paper/time-series-anomaly-detection-service-at#code

ruiking04 commented 9 months ago

Sorry, our README was indeed unclear and could lead readers to think that SR and SR-CNN are the same. We have updated it.

[screenshot of the updated README]

marciahon29 commented 9 months ago

Thank you, it makes more sense now and I was able to calculate both the mean and standard deviation.