mesarcik / ROAD

The Radio Observatory Anomaly Detection (ROAD)
MIT License

Experimentation #2

Closed mesarcik closed 1 year ago

mesarcik commented 2 years ago

Dataset information

Dataset overview

1) Note: these values are changing as more corrections are made to each class.
2) Note: the class labels are not necessarily correct. For example, what I labelled strong_radio_emitter is an A-team source in the sidelobes.

| Class | # Samples |
| --- | --- |
| Other | 6953 |
| scintillation | 2991 |
| strong_radio_emitter | 2444 |
| unknown | 976 |
| high_noise_elements | 790 |
| lightning | 727 |
| data_loss | 462 |
| solar_storm | 147 |
| electric_fence | 142 |
| oscillating_tile | 82 |
| empty | 73 |

Labelling interface/the way things are plotted

Potential directions

Option 1:

Option 2:

Option 3:

Task
Predict Time/Frequency range of each patch
Predict station each patch corresponds to
Predict polarisation
Predict antenna locations
Patch locations

mesarcik commented 2 years ago

Preliminary experiments

mesarcik commented 2 years ago

Self-supervised learning example

Predict frequency band and polarisation for each patch
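
As a rough illustration of this pretext task, the sketch below cuts a spectrogram into non-overlapping patches and pairs each one with the index of the frequency band it was cut from. The function name `patches_with_band_labels`, the toy array, and the convention that rows are frequency channels are assumptions made for illustration, not the ROAD implementation. Polarisation labels could be attached in the same way.

```python
from typing import List, Tuple

Patch = List[List[float]]


def patches_with_band_labels(
    spectrogram: List[List[float]], patch_size: int
) -> List[Tuple[Patch, int]]:
    """Split a (frequency x time) array into square patches.

    Each patch is paired with the index of the row of patches (frequency
    band) it was cut from, which serves as the self-supervised label.
    Trailing rows/columns that do not fill a whole patch are dropped.
    """
    n_freq = len(spectrogram)
    n_time = len(spectrogram[0]) if n_freq else 0
    labelled = []
    for band, f0 in enumerate(range(0, n_freq - patch_size + 1, patch_size)):
        for t0 in range(0, n_time - patch_size + 1, patch_size):
            patch = [row[t0:t0 + patch_size]
                     for row in spectrogram[f0:f0 + patch_size]]
            labelled.append((patch, band))
    return labelled


# Toy 4x6 "spectrogram": each value encodes (frequency row, time column).
toy = [[10 * f + t for t in range(6)] for f in range(4)]
labelled = patches_with_band_labels(toy, patch_size=2)
band_labels = [band for _, band in labelled]  # [0, 0, 0, 1, 1, 1]
```

A classifier trained to recover `band_labels` from the patches alone has to learn what each band normally looks like, which is the property the pretext task relies on.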

Method

mesarcik commented 2 years ago

Problems with normalisation

Low band with data loss

Normalised

[Figure: L774495_SAP000_CS011LBA_autocorrelation_SB_vs_time_normalized_amplitude]

Not

[Figure: L774495_SAP000_CS011LBA_autocorrelation_SB_vs_time_normalized_amplitude png - norm]

High band

Normalised

[Figure: L790158_SAP000_RS307LBA_autocorrelation_SB_vs_time_normalized_amplitude]

Not

[Figure: L790158_SAP000_RS307LBA_autocorrelation_SB_vs_time_normalized_amplitude png - norm]

Uncertain features

[Figure: L784965_SAP001_CS004LBA_autocorrelation_SB_vs_time_normalized_amplitude png - norm]
[Figure: L784965_SAP001_CS005LBA_autocorrelation_SB_vs_time_normalized_amplitude]

mesarcik commented 2 years ago

Validate labels:

TODO:

Notes:

mesarcik commented 2 years ago

Sample of the dataset clipped at 99th percentile
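
For reference, clipping at the 99th percentile can be sketched as follows. This is a minimal standard-library version operating on a flat list of values. The helper name `clip_at_percentile` and the toy data are assumptions, and the actual preprocessing (for example whether the percentile is taken per image or over the whole dataset) is not specified in this thread.

```python
import statistics
from typing import List


def clip_at_percentile(values: List[float], percentile: int = 99) -> List[float]:
    """Clip every value above the given percentile (1..99) down to it."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    return [min(v, threshold) for v in values]


# Toy data: 100 well-behaved samples plus one extreme outlier.
samples = [float(i) for i in range(100)] + [1e6]
clipped = clip_at_percentile(samples, percentile=99)
```

The outlier is pulled down to the threshold while the remaining values are left unchanged, so a single very bright pixel no longer dominates the colour scale or the normalisation.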

mesarcik commented 1 year ago

Initial results

AUROC

| Class | AUROC | AUPRC | F1 |
| --- | --- | --- | --- |
| oscillating_tile | 0.8274 | 0.8877 | 0.8777 |
| electric_fence | 0.2860 | 0.4553 | 0.6686 |
| data_loss | 0.5732 | 0.6357 | 0.6985 |
| lightning | 0.3430 | 0.4748 | 0.6447 |
| strong_radio_emitter | 0.6264 | 0.2634 | 0.3708 |
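
To make the three metrics concrete, here is a minimal pure-Python sketch of how they can be computed from per-sample anomaly scores. The function names, the toy labels and scores, and the fixed threshold used for F1 are assumptions; the thread does not say how the threshold was chosen or which library produced the numbers above.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomaly outscores a random normal sample.

    Ties count as half a win. Quadratic in the number of samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (ties not merged)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / sum(labels)


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float) -> float:
    """F1 score when samples with score >= threshold are flagged anomalous."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy example: 1 = anomaly, 0 = normal.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(labels, scores)                  # 0.75
toy_auprc = average_precision(labels, scores)      # 5/6
toy_f1 = f1_at_threshold(labels, scores, 0.35)     # 0.8
flipped = auroc(labels, [-s for s in scores])      # 0.25
```

An AUROC well below 0.5 means the score ranks anomalies below normal samples. Negating the score turns an AUROC of a into 1 - a, as `flipped` shows, which is worth checking before reading a low value as "no signal".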

Discussion

Reconstructions of KNNs

mesarcik commented 1 year ago

Using SSL

Training:

Results:

Embedding for Dataloss

[Figure: epoch_embedding_0]

TODO

mesarcik commented 1 year ago

Update after 1 week

1) Trying to determine the effect of multi-class vs. single-class detection
2) The differences in performance between representations learnt from ResNet and VAE, with and without training
3) The effect of the evaluation metric on performance measurement

Multi-class detection

[Figure: temp_MISO]

Single Class

Open questions:

mesarcik commented 1 year ago

TODO

Weird results:

| Anomaly Type | AUROC | AUPRC | F1 |
| --- | --- | --- | --- |
| oscillating_tile | 0.1715 | 0.6866 | 0.8780 |
| electric_fence | 0.8257 | 0.9395 | 0.8852 |
| data_loss | 0.4460 | 0.6216 | 0.7984 |
| lightning | 0.5778 | 0.6737 | 0.8147 |
| strong_radio_emitter | 0.3396 | 0.5647 | 0.7899 |
| solar_storm | 0.9457 | 0.9018 | 0.9800 |

mesarcik commented 1 year ago

Changed encoding scheme

Station names

Frequency range:

Polarisations:

Different anomaly detection evaluation:

What other self-supervised labels can we use?

Other options:

mesarcik commented 1 year ago

Comparison between distance-based metrics and frequency band information:

| Anomaly | Pixel-mean | VAE-dist | Res-dist | Freq-dev | Supervised | Location prediction (patch size 64x64) |
| --- | --- | --- | --- | --- | --- | --- |
| oscillating_tile | 0.1891 | 0.5793 | 0.7972 | 0.7296 | 0.4669 | 0.71428 |
| electric_fence | 0.2914 | 0.2988 | 0.3799 | 0.3647 | 0.4669 | - |
| data_loss | 0.4867 | 0.6491 | 0.6335 | 0.7034 | 0.4669 | 0.6691 |
| lightning | 0.6714 | 0.6515 | 0.6595 | 0.6568 | 0.4816 | 0.7062 |
| strong_radio_emitter | 0.7112 | 0.7689 | 0.8151 | 0.7647 | 0.9252 | 0.8575 |
| solar_storm | 0.7989 | 0.9355 | 0.5860 | 0.9295 | 0.5498 | 0.8211 |

Location prediction (spatial context prediction)
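
The sketch below shows one common way to build training pairs for spatial context prediction: pick an anchor patch on the patch grid, pick one of its up to 8 neighbours, and use the neighbour's relative position as the class label. The names `OFFSETS` and `sample_neighbour_pair`, the label ordering, and the grid size are assumptions for illustration. The thread does not describe the exact sampling scheme used here.

```python
import random
from typing import Tuple

# The 8 possible positions of a neighbour relative to the anchor patch,
# as (row offset, column offset); the list index is the class label.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

Position = Tuple[int, int]


def sample_neighbour_pair(
    n_rows: int, n_cols: int, rng: random.Random
) -> Tuple[Position, Position, int]:
    """Pick an anchor patch and one of its in-bounds neighbours.

    Returns (anchor position, neighbour position, label). The grid must be
    larger than 1x1 so that at least one neighbour exists.
    """
    row, col = rng.randrange(n_rows), rng.randrange(n_cols)
    valid = [
        (label, (row + dr, col + dc))
        for label, (dr, dc) in enumerate(OFFSETS)
        if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
    ]
    label, neighbour = rng.choice(valid)
    return (row, col), neighbour, label


rng = random.Random(0)  # fixed seed so the sampled pairs are reproducible
pairs = [sample_neighbour_pair(4, 4, rng) for _ in range(100)]
```

A network that receives the two patches and predicts `label` must learn how structure continues across patch boundaries in time and frequency. Edge and corner anchors have fewer valid neighbours, so the label distribution is not uniform unless it is rebalanced.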

Issues with current implementation

mesarcik commented 1 year ago

Frequency + Neighbour SSL

[Figure: epoch_embedding_100]

Debugging process:

Problems with learning position:

Using Frequency band information seems to decrease overall performance

Possible solutions

mesarcik commented 1 year ago

Data vs. Model problems:

Including normal test data into training

| Anomaly | Location prediction (original training data) | Location prediction (modified training data) | VAE |
| --- | --- | --- | --- |
| oscillating_tile | 0.71428 | 0.91612 | 0.90123 |
| electric_fence | 0.2885 | 0.90196 | 0.8956 |
| data_loss | 0.6763 | 0.95384 | 0.953846 |
| lightning | 0.71225 | 0.976870 | 0.97297 |
| strong_radio_emitter | 0.83727 | 0.98795 | 0.987991 |
| solar_storm | 0.9006 | 1.0 | 0.97350 |

| Anomaly | Location prediction | VAE |
| --- | --- | --- |
| oscillating_tile | 0.9006 | 0.7515 |
| electric_fence | 0.5664 | 0.4554 |
| data_loss | 0.8761 | 0.78758 |
| lightning | 0.8539 | 0.77378 |
| strong_radio_emitter | 0.9449 | 0.8861 |
| solar_storm | 1.0 | 0.92358 |

Further analysis of amount of data

| Anomaly | Untrained | Trained |
| --- | --- | --- |
| oscillating_tile | 0.791367 | 0.698795 |
| data_loss | 0.672432 | 0.714545 |
| lightning | 0.673568 | 0.695421 |
| strong_radio_emitter | 0.838039 | 0.844560 |
| solar_storm | 0.262452 | 0.860681 |

Training effects per feature

mesarcik commented 1 year ago

Model diagram for explanation

[Figure: RAAD]

Augmentations:

Augmentations to try

RFI removal

mesarcik commented 1 year ago

State of affairs

Code

Data

Classes to correct

Model

Inspection of outputs and debugging

Updating losses:

Things to try

Existing hyperparameters:

| Parameter | Description |
| --- | --- |
| c | clip amount of the training data |
| p | patch size |
| lambda | regularisation scaling factor |
| l | embedding dimension |
| m | number of layers of MLP |
| j | jitter amount |
| r | roll amount |
| M | backbone network (resnet15, resnet50, ViT) |

Parameter Sweeps
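
A grid sweep over a subset of these parameters can be enumerated as in the sketch below. The candidate values are hypothetical placeholders chosen for illustration (only the 64x64 patch size and the backbone names appear elsewhere in this thread), and the keys simply reuse the symbols from the parameter table.

```python
from itertools import product

# Hypothetical candidate values; keys follow the parameter table above.
grid = {
    "p": [32, 64],             # patch size
    "l": [64, 128],            # embedding dimension
    "M": ["resnet50", "ViT"],  # backbone network
}

# One dict per combination, in the order itertools.product yields them.
configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]

for config in configs:
    # A real sweep would launch a training run here; this only lists them.
    print(config)
```

The number of runs is the product of the list lengths (8 here), so adding more of the hyperparameters to a full grid quickly becomes expensive.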

mesarcik commented 1 year ago

Dataset bug found:

mesarcik commented 1 year ago

Fine tuning

Results:

Comparison with KNN

| Class | SSL + KNN | SSL + Fine-tuning | Random init + Fine-tuning |
| --- | --- | --- | --- |
| oscillating_tile | 0.7596 | 0.5454 | 0.4687 |
| data_loss | 0.6100 | 0.77419 | 0.76546 |
| lightning | 0.8022 | 0.8316 | 0.6655 |
| strong_radio_emitter | 0.8414 | 0.8210 | 0.8283 |
| solar_storm | 0.78287 | 0.9897 | 0.9847 |
| mean | 0.7606 | 0.7964 | 0.7464 |
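
For readers unfamiliar with the KNN baseline, a typical form of KNN-based anomaly scoring is sketched below: a test embedding is scored by its mean distance to the k nearest embeddings of normal training data. The function name `knn_anomaly_score`, the choice of Euclidean distance, the value of k, and the 2-D toy embeddings are all assumptions, since the thread does not give the details of the "SSL + KNN" setup.

```python
import math
from typing import Sequence


def knn_anomaly_score(
    query: Sequence[float],
    train_embeddings: Sequence[Sequence[float]],
    k: int = 2,
) -> float:
    """Mean Euclidean distance from `query` to its k nearest training embeddings."""
    distances = sorted(math.dist(query, e) for e in train_embeddings)
    return sum(distances[:k]) / k


# Toy 2-D "embeddings" of normal training samples (corners of a unit square).
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

score_normal = knn_anomaly_score((0.5, 0.5), train)   # close to the training data
score_anomaly = knn_anomaly_score((5.0, 5.0), train)  # far from the training data
```

Thresholding this score gives a detector with no training beyond the embedding model itself, which is why it is a natural reference point for the fine-tuned variants.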

Conclusions

More Ideas:

mesarcik commented 1 year ago

Updates

Clip amount vs. average F1-Score

Class SSL + KNN SSL + Fine-tuning Random init + Fine-tuning
oscillating_tile 0.7826
data_loss 0.7104
lightning 0.8089
strong_radio_emitter 0.9133
solar_storm 0.9898
mean 0.8410

Intermediate results:

Going through the incorrectly classified examples:

| Class | # Misclassified samples |
| --- | --- |
| oscillating_tile | 2 |
| data_loss | 91 |
| lightning | 54 |
| strong_radio_emitter | 13 |
| solar_storm | 0 |

Oscillating Tile:

1200

Data loss:

1355

Lightning

Strong radio emitter:

mesarcik commented 1 year ago

Labelling update

mesarcik commented 1 year ago

Code updates:

TODO:

mesarcik commented 1 year ago

State of affairs

Amount of training data:

Data imbalance in the test set

| Class | # Samples | % Contamination |
| --- | --- | --- |
| training data (normal) | 2533 | - |
| test data (normal) | 800 | - |
| Total data | ~6500 | - |
| data_loss | 413 | 6% |
| electric_fence | 62 | 1% |
| lightning | 327 | 5% |
| oscillating_tile | 57 | 1% |
| real_high_noise | 869 | 13% |
| solar_storm | 147 | 2% |
| strong_radio_emitter | 1334 | 20% |
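
The contamination column can be reproduced from the sample counts, as the short calculation below shows. It assumes that contamination means each class's share of all listed samples and that the "~6500" total is the sum of the listed counts (6542); the thread states neither explicitly.

```python
# Sample counts copied from the table above.
counts = {
    "training data (normal)": 2533,
    "test data (normal)": 800,
    "data_loss": 413,
    "electric_fence": 62,
    "lightning": 327,
    "oscillating_tile": 57,
    "real_high_noise": 869,
    "solar_storm": 147,
    "strong_radio_emitter": 1334,
}

# Assumption: the "~6500" total is the sum of all listed counts.
total = sum(counts.values())

# Contamination = share of the whole dataset, as a rounded percentage.
contamination = {
    name: round(100 * n / total)
    for name, n in counts.items()
    if "normal" not in name
}
```

Under that assumption the rounded percentages match the table, and strong_radio_emitter alone has more samples than the normal test set (1334 vs. 800), which is the imbalance the subsampling argument is about.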

Back of the envelope calculations in favour of subsampling

| Name | # Samples |
| --- | --- |
| 1 Subband observations | ~2000 |
| Unknown data (not normal, but no characterised anomaly) | ~1500 |
| Normal | ~3500 |
| Anomalies | ~3000 |
| Unlabelled | ~2000 |

mesarcik commented 1 year ago

Post-Holiday To-do: