jakubmicorek / MULDE-Multiscale-Log-Density-Estimation-via-Denoising-Score-Matching-for-Video-Anomaly-Detection


training and evaluation with video anomaly detection datasets #1

Closed syrobonkers closed 1 month ago

syrobonkers commented 1 month ago

Hello, Thank you for your excellent research and for making the code available as open-source software. I have tried to reproduce the experiment with your code and noticed that only the Toy Dataset is provided. Could you please provide guidance on how to train and evaluate the model on the Ped2, Avenue, and ShanghaiTech datasets mentioned in your paper?

I understand that for object-level tasks, I should use the pre-extracted features available in the repository. However, I am unsure how to utilize the three types of features—Pose, Deep, and Velocity. Could you explain how these features should be used? Additionally, could you provide information on how to perform the evaluation? If possible, it would be helpful if you could also add the training and evaluation code for these datasets in order to fully reproduce the experiment.

Thank you very much for your assistance.

jakubmicorek commented 1 month ago

Hello,

thank you for your interest in our research.

The code provided was intended to release the anomaly detection method; the same training and evaluation principles apply to the video features. Unfortunately, due to the extensive maintenance and refactoring required, I am currently unable to provide the complete dataset-specific training and evaluation scripts. I may add the code in the future.

For the object-centric evaluation, please find the details in Section 2.2 of the supplementary material and have a look at the ablations in Table 3 and Table 4. The code provided by AccI-VAD does not handle a 1:1 mapping of pose with velocity and deep features: for each bounding box in a frame, the velocity and deep features align, but the pose features within a frame have no direct mapping to a bounding box. For this reason, we decided to follow AccI-VAD's approach and fit/train a separate model for each feature type.

For training and evaluation, we create a TensorDataset with one specific feature type and add the video name and the frame number so that the features can be back-referenced. During training, we additionally shuffle the TensorDataset.

To obtain an anomaly score for a single frame, the maximum anomaly score per feature type within the frame is taken and aggregated. Note that, as described in Section 2.2 of the supplementary material, we standardize the anomaly scores by the statistics of the training data for the respective feature type and clip negative values before summing them to obtain the final frame-level anomaly score. For each video, the aggregated anomaly scores are temporally smoothed with a 1-D Gaussian filter; we tested temporal smoothing of 1-7 frames. Finally, the micro and macro AUC-ROC scores are computed.
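A minimal sketch of the frame-level aggregation described above (this is not the official code; the function names, the input layout, and the smoothing width are my assumptions):

```python
import numpy as np

def gaussian_smooth(x, sigma):
    # 1-D Gaussian smoothing via direct convolution with reflect padding,
    # a numpy stand-in for scipy.ndimage.gaussian_filter1d
    radius = int(3 * sigma + 0.5)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(x, radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

def frame_level_scores(object_scores, train_mean, train_std, sigma=2.0):
    """Aggregate per-object anomaly scores into frame-level scores for one video.

    object_scores: dict mapping feature type (e.g. "pose", "deep", "velocity")
                   to a list with one 1-D array of per-object scores per frame.
    train_mean/train_std: dicts of training-set statistics per feature type.
    """
    n_frames = len(next(iter(object_scores.values())))
    total = np.zeros(n_frames)
    for ftype, per_frame in object_scores.items():
        # maximum anomaly score per feature type within each frame
        frame_max = np.array([s.max() if len(s) else 0.0 for s in per_frame])
        # standardize by the training statistics of the respective feature type
        z = (frame_max - train_mean[ftype]) / train_std[ftype]
        # clip negative values before summing across feature types
        total += np.clip(z, 0.0, None)
    # temporally smooth the aggregated scores per video
    return gaussian_smooth(total, sigma)
```

The per-frame maximum follows the object-centric convention that a frame is as anomalous as its most anomalous object; clipping before summation prevents a "very normal" modality from cancelling out an anomalous one.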

Please feel free to reach out if you need further assistance.

syrobonkers commented 1 month ago

Thank you. I was able to get it to work, and although the AUC is not perfect, it's close. I'm looking forward to the official evaluation code being released.

Haifu-Ye commented 1 week ago


Hello! May I ask how you used those three features? Right now only the toy dataset is provided, so I cannot reproduce the paper. I would be grateful if you could provide the complete processing code for the Avenue dataset. Please contact me at yehaifu@cigit.ac.cn, thank you very much!

jpainam commented 1 week ago

@syrobonkers I'm also interested in the data processing code. Please let me know when you get it working. @Haifu-Ye

Haifu-Ye commented 1 week ago

Okay, I'll keep you posted if I make any progress. But right now I'm still trying to get the deep, pose, and other features into the model...


syrobonkers commented 1 week ago

Hello, Unfortunately, I cannot share the complete code as it resides on a company PC.

To explain what I did: based on the repository owner's comments, I added some code for preprocessing the features downloaded from the Accurate-Interpretable-VAD repository and post-processing for evaluation. I specifically worked on the ShanghaiTech dataset at the object level, evaluating the AUC for each modality (pose, deep feature, velocity) individually.

The AUCs I achieved didn't perfectly match the results in the paper, but they were relatively close in trend. However, I did notice that the accuracy for some modalities decreased as training progressed, which made me question the correctness of my implementation. While I'm not entirely satisfied with the replication, I've gained a good enough understanding of the overall method, which is sufficient for my purposes.
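For anyone checking per-modality AUCs the same way, here is a minimal rank-based AUC-ROC sketch (Mann-Whitney U formulation); it assumes frame-level scores and ground-truth labels are already concatenated across test videos, and the names are illustrative:

```python
import numpy as np

def auc_roc(labels, scores):
    # AUC via the Mann-Whitney U statistic: the probability that a randomly
    # chosen anomalous frame scores higher than a randomly chosen normal frame.
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]  # anomalous frames
    neg = scores[labels == 0]  # normal frames
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def per_modality_auc(frame_scores, frame_labels):
    # frame_scores: dict mapping modality name -> 1-D array of frame scores
    return {m: auc_roc(frame_labels, s) for m, s in frame_scores.items()}
```

This mirrors the "micro" AUC (all test frames pooled); a "macro" variant would instead average `auc_roc` over per-video score/label pairs.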

For those looking to replicate the evaluation results fully, I suggest sending a message to the repository owner for the official evaluation scripts. I also believe that many would benefit if the owner could release these scripts, and I would appreciate it if you could consider doing so. @jakubmicorek

Haifu-Ye commented 1 week ago


Thank you for your answer! I will keep trying.