Closed marciahon29 closed 9 months ago
Please let me know the number of times you run the code. I would like to also have a standard deviation as part of my results.
Please could you also let me know how you calculate the "F1 Affiliation Score"?
Sorry, I took a break for the New Year holiday.
Running 5 times or 10 times are both fine; it depends on how much computing power you have. The "F1 Affiliation Score" is calculated as follows:
# imports from the affiliation metrics package (affiliation-metrics-py)
from affiliation.generics import convert_vector_to_events
from affiliation.metrics import pr_from_events

events_pred = convert_vector_to_events(predict)  # binary prediction vector -> list of anomaly event intervals
events_gt = convert_vector_to_events(label)  # ground truth built the same way (variable name "label" assumed)
Trange = (0, len(predict))
dic = pr_from_events(events_pred, events_gt, Trange)
affiliation_f1 = 2 * (dic["precision"] * dic["recall"]) / (dic["precision"] + dic["recall"])
Code link: lines 80~83.
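As a quick illustration of the harmonic mean in the last line above (the precision and recall values here are placeholders, not results from the paper):

```python
# Placeholder values standing in for dic["precision"] and dic["recall"]
precision, recall = 0.8, 0.6

# F1 is the harmonic mean of precision and recall
affiliation_f1 = 2 * (precision * recall) / (precision + recall)
print(round(affiliation_f1, 4))  # 0.6857
```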
If I change the "seed" in the command line, does this mean that it trains a different version of the model? I am finding that some seed values, e.g. seed=1 and seed=2, are producing the same results.
For example:
python coca.py --selected_dataset IOpsCompetition --device cuda --seed 2
python coca_no_view.py --selected_dataset IOpsCompetition --device cuda --seed 1
python ts_tcc_main.py --training_mode anomaly_detection --selected_dataset IOpsCompetition --device cuda --seed 5
In addition, from my understanding, the baseline.py models are random for each execution. Is this correct? Or how could I specify that each execution is different? From my results, some of the runs are producing the same results.
Thanks, Marcia
I believe that my different runs of baseline.py are producing the same values. How could I ensure that these values differ slightly? For example, coca has a "seed" argument. How do I do the same with baseline.py?
Generally speaking, for deep models, the initial model is randomly generated at the beginning of each training. The curve of the loss function is bumpy, and optimizing from different positions will lead to randomness in the final trained model. For example: seed=1 and seed=2 may converge to two local optimal points, or they may converge to the same optimal point.
Your question: "I am finding that some values for seed_1 and seed_2 are producing the same results." It's possible.
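As a generic sketch of why a seed matters (this uses NumPy in place of a deep-learning framework, and set_seed is a hypothetical helper, not a function from this repository):

```python
import random

import numpy as np

def set_seed(seed):
    # Fix the random number generators so a run is fully reproducible.
    random.seed(seed)
    np.random.seed(seed)

set_seed(1)
w1 = np.random.randn(3)  # "initial weights" drawn under seed 1
set_seed(2)
w2 = np.random.randn(3)  # a different draw under seed 2
set_seed(1)
w3 = np.random.randn(3)  # identical to w1: same seed, same initialization
```

Two runs with different seeds start optimization from different initial weights and may converge to different local optima, or to the same one, which is why seed=1 and seed=2 can still produce identical results.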
And there are some models without randomness, such as the OC-SVM and IF methods included in baseline.py. You can refer to Salesforce-Merlion's paper: https://arxiv.org/abs/2109.09265
Therefore, for methods in "baseline.py" that do not have a fixed random seed and have a certain degree of randomness, if you run it several times independently, the results should change. For example, you run the LSTMED method in "baseline.py" 5 times and then calculate the mean and standard deviation of the results.
python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
python baseline.py --dataset UCR --model LSTMED --debug &
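Once the 5 runs finish, the mean and standard deviation can be computed along these lines (the F1 values below are placeholders, not actual LSTMED results):

```python
import numpy as np

# F1 scores collected from 5 independent runs (placeholder values)
f1_runs = [0.61, 0.63, 0.59, 0.64, 0.60]

mean_f1 = np.mean(f1_runs)
std_f1 = np.std(f1_runs, ddof=1)  # ddof=1 gives the sample standard deviation
print(f"F1 = {mean_f1:.3f} +/- {std_f1:.3f}")
```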
How did you introduce randomness for all models? In Table 2, all models have a standard deviation, which means they produced different results across runs.
Is there some script to modify? How do I accomplish this?
From my results, it appears that IsolationForest, SpectralResidual, CPC, and OCSVM do not have randomness incorporated because my 3 runs have produced the same results.
RandomCutForest, LSTMED, DAGMM, and DeepSVDD do have different results in each of the 3 runs.
Please could you help me to get IsolationForest, SpectralResidual, CPC, and OCSVM to produce different results for each run?
IsolationForest, SpectralResidual, and OCSVM inherently have no randomness, so they only need to be run once.
As for CPC, it should be random; we will verify it again. Please wait a few days, as CPC runs slowly.
In your paper, all models appear to have randomness, since each reports a standard deviation.
Please could you let me know how to do this in the code?
OC-SVM and IF have no standard deviation.
How were you able to get randomness in SpectralResidual? For me, it has no randomness.
SR and SR-CNN are different; please read this paper for details.
Ren, Hansheng, et al. "Time-series anomaly detection service at microsoft." Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2019.
This article itself does not provide source code. The source code can be found at the link below.
https://paperswithcode.com/paper/time-series-anomaly-detection-service-at#code
Hello,
In the results, the files specify "Spectral Residual". However, in your table it is "SR-CNN". I had understood them to be the same. If they are not the same, how did you get values for "SR-CNN"?
Spectral Residual is a baseline provided by Merlion, which is different from SR-CNN. Of course, you can also use Spectral Residual. And there are other baselines in Merlion, such as WindStats, Autoencoder, and Variational Autoencoder.
paper: "Merlion: A Machine Learning Library for Time Series"
paper: "Time-series anomaly detection service at Microsoft"
How did you get the results for SR-CNN that you have in your table? Because baseline.py only handles SpectralResidual.
The SR-CNN source code can be found at the link below.
https://paperswithcode.com/paper/time-series-anomaly-detection-service-at#code
Sorry, our README was indeed not clear, causing readers to mistakenly think that SR and SR-CNN are the same. We have made changes.
Thank you, it makes more sense now and I was able to calculate both the mean and standard deviation.
Hello,
Please could you let me know how many seeds you ran in order to determine your standard deviation?
Did you run each model 5 times? 10 times?
Thanks