kupl / adapt

ADAPT is the open source white-box testing framework for deep neural networks
MIT License

The Result of Adapt in vgg #13

Closed Kolt1911 closed 1 year ago

Kolt1911 commented 2 years ago

I used test_vgg19.ipynb to run the VGG-19 experiment, but the results are worse than the examples in the paper in terms of "Total adversarials" and "Total inputs", even though I did not modify any files or arguments in the Adapt code. By the way, the results of the MNIST experiment match the examples. I don't know what causes this difference.

The output of `archives_adapt[0].summary()` in my testing:

Total inputs: 10
  Average distance: 0.001166449161246419
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04568829113924051
  Achieved: 0.04720464135021097

Original label: Pomeranian
  Count: 414
  Average distance: 0.016496244817972183

Total inputs: 413
  Average distance: 0.002656622789800167
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04562236286919831
  Achieved: 0.05195147679324894

And the example is:

Total inputs: 8471
  Average distance: 0.02087555266916752
Total adversarials: 1135
  Average distance: 0.0999956950545311
Coverage
  Original: 0.04555643459915612
  Achieved: 0.15928270042194093

Original label: Pomeranian
  Count: 7336
  Average distance: 0.008634363301098347

Label: hyena
  Count: 217
  Average distance: 0.05843096598982811

Label: meerkat
  Count: 255
  Average distance: 0.07762718945741653

Label: tick
  Count: 224
  Average distance: 0.10655633360147476

Label: guinea_pig
  Count: 33
  Average distance: 0.09237903356552124

Label: English_setter
  Count: 44
  Average distance: 0.10023358464241028

Label: Petri_dish
  Count: 6
  Average distance: 0.10553994029760361

Label: flatworm
  Count: 343
  Average distance: 0.14118780195713043

Label: shower_cap
  Count: 13
  Average distance: 0.04866204783320427

henrylee97 commented 1 year ago

I tried it on my local machine without a GPU, and the results are as follows:

>>> archives_adapt[0].summary()
----------
Total inputs: 1296
  Average distance: 0.010443303734064102
Total adversarials: 316
  Average distance: 0.01583455689251423
Coverage
  Original: 0.045490506329113924
  Achieved: 0.08511339662447258
----------
Original label: suit
  Count: 980
  Average distance: 0.008704898878932
----------
Label: stole
  Count: 316
  Average distance: 0.01583455689251423
----------

I am not sure why Adapt generated only about 400 inputs in your run. One possible reason is that the GPU is not set up correctly, so TF cannot use it for acceleration.

For your information, the results in the MNIST tutorial were produced without a GPU.
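
One quick way to test the GPU hypothesis is to ask TensorFlow which devices it can see. The sketch below is not part of Adapt; the helper name `visible_gpus` is made up for illustration, and it assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None when TensorFlow is not installed in this environment.
    Assumes TensorFlow 2.1+ (tf.config.list_physical_devices).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices('GPU')]


gpus = visible_gpus()
if gpus is None:
    print('TensorFlow is not installed in this environment.')
elif not gpus:
    print('TensorFlow is installed but sees no GPU; it will run on the CPU.')
else:
    print('TensorFlow sees these GPUs:', gpus)
```

An empty list would suggest a driver or CUDA mismatch. A non-empty list only shows that the device is visible, not that every operation actually runs on it.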

Kolt1911 commented 1 year ago

Thanks for your reply!

I checked the GPU status while the experiment was running. It did use the GPU, judging by the high GPU memory usage and power draw, so the GPU environment is probably compatible. Could you suggest in what situations the GPU would not be set up correctly for TF acceleration?

I also tried extending the fuzzing time from 20 minutes to 60 minutes, but it still did not reach the expected performance on the GPU.

henrylee97 commented 1 year ago

I really have no idea. How do other GPU-based applications perform, such as training an MNIST model from scratch? It takes about 1 second per iteration on a Colab GPU, and took about 2-3 seconds per iteration on my local machine with an RTX 2080.

Try: https://colab.research.google.com/github/kupl/tutorial/test_lenet5.ipynb

Kolt1911 commented 1 year ago

Thanks. I'll try to use colab to rerun the experiment.

henrylee97 commented 1 year ago

If you find any problem, please let me know.

Kolt1911 commented 1 year ago

Hello, and thank you for your help. I reran the experiment on Colab with a Tesla T4 GPU. Unfortunately, the results are the same as before: the number of adversarial samples is much lower than in the paper. The following is the output for 10 images. Is there anything I can do next?

----------
Total inputs: 13297
  Average distance: 0.002051855204626918
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.045490506329113924
  Achieved: 0.05775316455696203
----------
Original label: Gordon_setter
  Count: 13297
  Average distance: 0.002051855204626918
----------
----------
Total inputs: 13225
  Average distance: 0.010842270217835903
Total adversarials: 889
  Average distance: 0.06209832429885864
Coverage
  Original: 0.04588607594936709
  Achieved: 0.13337289029535865
----------
Original label: dingo
  Count: 12336
  Average distance: 0.007148476783186197
----------
Label: hyena
  Count: 149
  Average distance: 0.04001113399863243
----------
Label: Ibizan_hound
  Count: 316
  Average distance: 0.04688433185219765
----------
Label: basenji
  Count: 61
  Average distance: 0.06257662177085876
----------
Label: Italian_greyhound
  Count: 156
  Average distance: 0.07354681938886642
----------
Label: computer_keyboard
  Count: 12
  Average distance: 0.07971850037574768
----------
Label: shoe_shop
  Count: 195
  Average distance: 0.09323695302009583
----------
----------
Total inputs: 13108
  Average distance: 0.005441570654511452
Total adversarials: 526
  Average distance: 0.03273507580161095
Coverage
  Original: 0.045490506329113924
  Achieved: 0.11438554852320675
----------
Original label: suit
  Count: 12582
  Average distance: 0.00430054496973753
----------
Label: stole
  Count: 526
  Average distance: 0.03273507580161095
----------
----------
Total inputs: 13171
  Average distance: 0.002221100265160203
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.045490506329113924
  Achieved: 0.05386339662447257
----------
Original label: freight_car
  Count: 13171
  Average distance: 0.002221100265160203
----------
----------
Total inputs: 13204
  Average distance: 0.006522973533719778
Total adversarials: 112
  Average distance: 0.10492376238107681
Coverage
  Original: 0.045490506329113924
  Achieved: 0.10884757383966245
----------
Original label: matchstick
  Count: 13092
  Average distance: 0.005681169684976339
----------
Label: thatch
  Count: 101
  Average distance: 0.10867155343294144
----------
Label: sea_urchin
  Count: 11
  Average distance: 0.07051229476928711
----------
----------
Total inputs: 13259
  Average distance: 0.007664752658456564
Total adversarials: 111
  Average distance: 0.04815076291561127
Coverage
  Original: 0.045490506329113924
  Achieved: 0.11900052742616034
----------
Original label: English_foxhound
  Count: 13148
  Average distance: 0.007322955876588821
----------
Label: whippet
  Count: 5
  Average distance: 0.07180314511060715
----------
Label: Saluki
  Count: 106
  Average distance: 0.047035083174705505
----------
----------
Total inputs: 13020
  Average distance: 0.002887626877054572
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04614978902953586
  Achieved: 0.07561972573839662
----------
Original label: prayer_rug
  Count: 13020
  Average distance: 0.002887626877054572
----------
----------
Total inputs: 13037
  Average distance: 0.009038560092449188
Total adversarials: 1021
  Average distance: 0.050655242055654526
Coverage
  Original: 0.04555643459915612
  Achieved: 0.1785337552742616
----------
Original label: house_finch
  Count: 12016
  Average distance: 0.005502388346940279
----------
Label: hen
  Count: 35
  Average distance: 0.05718062445521355
----------
Label: partridge
  Count: 438
  Average distance: 0.03958751633763313
----------
Label: rock_python
  Count: 39
  Average distance: 0.021582739427685738
----------
Label: boa_constrictor
  Count: 46
  Average distance: 0.022036049515008926
----------
Label: king_snake
  Count: 53
  Average distance: 0.02394760027527809
----------
Label: quail
  Count: 45
  Average distance: 0.05112099274992943
----------
Label: puffer
  Count: 5
  Average distance: 0.04790183901786804
----------
Label: electric_ray
  Count: 197
  Average distance: 0.06255562603473663
----------
Label: honeycomb
  Count: 123
  Average distance: 0.09312015026807785
----------
Label: window_screen
  Count: 40
  Average distance: 0.07341377437114716
----------
----------
Total inputs: 13442
  Average distance: 0.0033626507502049208
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04568829113924051
  Achieved: 0.0691587552742616
----------
Original label: grasshopper
  Count: 13442
  Average distance: 0.0033626507502049208
----------
----------
Total inputs: 13335
  Average distance: 0.01175614446401596
Total adversarials: 762
  Average distance: 0.09106249362230301
Coverage
  Original: 0.045490506329113924
  Achieved: 0.14425105485232068
----------
Original label: Pomeranian
  Count: 12573
  Average distance: 0.006949699949473143
----------
Label: cairn
  Count: 495
  Average distance: 0.06978417187929153
----------
Label: feather_boa
  Count: 213
  Average distance: 0.12915335595607758
----------
Label: wig
  Count: 26
  Average distance: 0.11854938417673111
----------
Label: sea_urchin
  Count: 28
  Average distance: 0.15194664895534515
----------

plt.plot(times, coverages, label='adapt')
coverages = []
for archive in archives_rand:
    coverage = []
    timestamp = iter(archive.timestamp)
    t, cov = next(timestamp)
    for current_t in times:
        while current_t > t:
            t, cov = next(timestamp)
        coverage.append(cov)
    coverages.append(coverage)
coverages = np.mean(coverages, axis=0)
plt.plot(times, coverages, label='rand')
plt.suptitle('vgg19')
plt.legend()
plt.show()

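
The plotting snippet above samples each archive's coverage at fixed times by advancing an iterator, which raises `StopIteration` if a sample time lies past the last recorded timestamp. A minimal sketch of the same sampling rule with that case guarded is shown here. It assumes, as the snippet implies, that `archive.timestamp` yields `(time, coverage)` pairs sorted by time; the function name `coverage_at` and the toy values are made up for illustration.

```python
from bisect import bisect_left


def coverage_at(timestamps, sample_times):
    """Sample coverage at the given times.

    timestamps: list of (time, coverage) pairs sorted by time (assumed layout).
    For each sample time, the coverage of the first record at or after that
    time is used, matching the loop in the notebook snippet. If a sample time
    lies beyond the last record, the last recorded coverage is reused.
    """
    if not timestamps:
        raise ValueError('timestamps must not be empty')
    times = [t for t, _ in timestamps]
    sampled = []
    for current_t in sample_times:
        index = min(bisect_left(times, current_t), len(times) - 1)
        sampled.append(timestamps[index][1])
    return sampled


# Toy values, not taken from any real run.
records = [(0, 0.1), (5, 0.2), (10, 0.3)]
print(coverage_at(records, [0, 3, 5, 12]))
```

Reusing the last coverage for late sample times is a design choice for this sketch; truncating `times` to the shortest archive would be an equally reasonable alternative.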
henrylee97 commented 1 year ago

I think the Colab result is similar to the one in the tutorial in terms of the number of generated inputs and coverage (I compared the Pomeranian results). The exact numbers can vary, since Adapt uses randomness in its algorithm.

Note that the time budget for VGG-19 in the paper is 1 hour for each image.

Kolt1911 commented 1 year ago

Randomness explains why the number of generated cases varies, but for some seeds, such as "grasshopper", the results show zero adversarial samples. Is that normal?

henrylee97 commented 1 year ago

The images for which Adapt failed to generate an adversarial input (Gordon_setter, freight_car, prayer_rug, grasshopper) are the ones for which Adapt found only a few adversarial examples with the 1-hour time budget. I think VGG-19 is so confident about those images that Adapt has a hard time fooling the model.
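
One way to probe the confidence hypothesis is to look at the gap between the top two class probabilities for each seed image: a large gap means the perturbation has further to push before the label flips. The sketch below is not something Adapt computes; the helper names and the logit values are made up for illustration, and in practice the probabilities would come from the model's prediction for the seed image.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_margin(probs):
    """Return the gap between the highest and second-highest probability."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


# Hypothetical logits for two seeds: one confident, one ambiguous.
confident = softmax([8.0, 1.0, 0.5])
ambiguous = softmax([2.0, 1.8, 1.5])
print('confident seed margin:', top2_margin(confident))
print('ambiguous seed margin:', top2_margin(ambiguous))
```

If seeds such as Gordon_setter and grasshopper show much larger margins than seeds such as house_finch or dingo, that would be consistent with the explanation above, although it would not prove it.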

Kolt1911 commented 1 year ago

Since the results on Colab and on my GPU are similar, I tried testing for 60 minutes per seed on my GPU. The results are still strange: some seeds, such as Gordon_setter and freight_car, still produce no adversarial cases. It really takes time.

I also used DLFuzz to fuzz the same model with the same seeds, including Gordon_setter and freight_car, the seeds that did not produce adversarial cases. Maybe VGG-19 is confident about these images. But some theories say that adversarial samples are dense and widespread. Could these two seeds still yield adversarial samples that I have not found yet?

time=3601 vgg19 60min gpu
----------
Total inputs: 30294
  Average distance: 0.003002540674060583
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04555643459915612
  Achieved: 0.07733386075949367
----------
Original label: Gordon_setter
  Count: 30294
  Average distance: 0.003002540674060583
----------
----------
Total inputs: 29421
  Average distance: 0.0029962738044559956
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04555643459915612
  Achieved: 0.05748945147679325
----------
Original label: freight_car
  Count: 29421
  Average distance: 0.0029962738044559956
----------
----------
Total inputs: 29382
  Average distance: 0.0042890808545053005
Total adversarials: 5
  Average distance: 0.06863461434841156
Coverage
  Original: 0.04562236286919831
  Achieved: 0.11122099156118144
----------
Original label: matchstick
  Count: 29377
  Average distance: 0.004278129432350397
----------
Label: thatch
  Count: 5
  Average distance: 0.06863461434841156
----------
----------
Total inputs: 29365
  Average distance: 0.0074866474606096745
Total adversarials: 1505
  Average distance: 0.0637459084391594
Coverage
  Original: 0.04647943037974683
  Achieved: 0.2066191983122363
----------
Original label: house_finch
  Count: 27860
  Average distance: 0.004447516519576311
----------
Label: hen
  Count: 221
  Average distance: 0.0506969690322876
----------
Label: ruffed_grouse
  Count: 17
  Average distance: 0.05144954100251198
----------
Label: black_swan
  Count: 169
  Average distance: 0.06703733652830124
----------
Label: goose
  Count: 54
  Average distance: 0.059602245688438416
----------
Label: brain_coral
  Count: 287
  Average distance: 0.09754448384046555
----------
Label: partridge
  Count: 178
  Average distance: 0.042677417397499084
----------
Label: great_grey_owl
  Count: 92
  Average distance: 0.08370223641395569
----------
Label: starfish
  Count: 1
  Average distance: 0.09008491039276123
----------
Label: eel
  Count: 87
  Average distance: 0.03620380535721779
----------
Label: hermit_crab
  Count: 15
  Average distance: 0.055631332099437714
----------
Label: tick
  Count: 127
  Average distance: 0.06290798634290695
----------
Label: rock_python
  Count: 15
  Average distance: 0.06156248226761818
----------
Label: thunder_snake
  Count: 13
  Average distance: 0.06874621659517288
----------
Label: king_snake
  Count: 55
  Average distance: 0.07377670705318451
----------
Label: sea_slug
  Count: 140
  Average distance: 0.04529878869652748
----------
Label: coral_reef
  Count: 34
  Average distance: 0.05112788826227188
----------
----------
Total inputs: 28550
  Average distance: 0.0030834649223834276
Total adversarials: 594
  Average distance: 0.031734488904476166
Coverage
  Original: 0.045490506329113924
  Achieved: 0.10390295358649788
----------
Original label: English_foxhound
  Count: 27956
  Average distance: 0.0024746970739215612
----------
Label: standard_poodle
  Count: 512
  Average distance: 0.03266482427716255
----------
Label: English_setter
  Count: 82
  Average distance: 0.025925572961568832
----------
----------
Total inputs: 26783
  Average distance: 0.007209246978163719
Total adversarials: 400
  Average distance: 0.09940675646066666
Coverage
  Original: 0.045490506329113924
  Achieved: 0.14438291139240506
----------
Original label: Pomeranian
  Count: 26383
  Average distance: 0.005811415147036314
----------
Label: macaque
  Count: 26
  Average distance: 0.16306938230991364
----------
Label: shower_cap
  Count: 330
  Average distance: 0.08723078668117523
----------
Label: patas
  Count: 44
  Average distance: 0.15310774743556976
----------
----------
Total inputs: 23118
  Average distance: 0.0033660174813121557
Total adversarials: 314
  Average distance: 0.015147182159125805
Coverage
  Original: 0.045490506329113924
  Achieved: 0.10535337552742616
----------
Original label: suit
  Count: 22804
  Average distance: 0.0032037964556366205
----------
Label: stole
  Count: 307
  Average distance: 0.015298482961952686
----------
Label: jean
  Count: 7
  Average distance: 0.008511511608958244
----------
----------
Total inputs: 24782
  Average distance: 0.011327696032822132
Total adversarials: 1296
  Average distance: 0.1028788760304451
Coverage
  Original: 0.045490506329113924
  Achieved: 0.16126054852320676
----------
Original label: dingo
  Count: 23486
  Average distance: 0.006275736726820469
----------
Label: Ibizan_hound
  Count: 399
  Average distance: 0.04686149209737778
----------
Label: basenji
  Count: 175
  Average distance: 0.0776563361287117
----------
Label: shoe_shop
  Count: 74
  Average distance: 0.09037216752767563
----------
Label: hyena
  Count: 140
  Average distance: 0.1051555797457695
----------
Label: English_setter
  Count: 97
  Average distance: 0.11352179199457169
----------
Label: dalmatian
  Count: 2
  Average distance: 0.11158636957406998
----------
Label: golden_retriever
  Count: 51
  Average distance: 0.13159075379371643
----------
Label: puffer
  Count: 1
  Average distance: 0.12310857325792313
----------
Label: jellyfish
  Count: 95
  Average distance: 0.14943350851535797
----------
Label: pillow
  Count: 141
  Average distance: 0.1812417060136795
----------
Label: cheetah
  Count: 40
  Average distance: 0.15034721791744232
----------
Label: jackfruit
  Count: 76
  Average distance: 0.19320860505104065
----------
Label: peacock
  Count: 5
  Average distance: 0.2232123613357544
----------
----------
Total inputs: 23571
  Average distance: 0.0013311812654137611
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04562236286919831
  Achieved: 0.053731540084388185
----------
Original label: grasshopper
  Count: 23571
  Average distance: 0.0013311812654137611
----------
----------
Total inputs: 22886
  Average distance: 0.0026822793297469616
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04588607594936709
  Achieved: 0.08333333333333333
----------
Original label: prayer_rug
  Count: 22886
  Average distance: 0.0026822793297469616
----------
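
Comparing many summaries by eye is error-prone, so it can help to turn each block into a small record first. The sketch below assumes the text layout shown in this thread; `parse_summary` is a made-up helper, not part of Adapt, and capturing the printed text of `summary()` is left to the reader. The sample is the 60-minute grasshopper block from above.

```python
import re


def parse_summary(text):
    """Extract the headline numbers from one printed summary block."""
    def grab(pattern, cast):
        match = re.search(pattern, text)
        return cast(match.group(1)) if match else None

    return {
        'original_label': grab(r'Original label:\s*(\S+)', str),
        'total_inputs': grab(r'Total inputs:\s*(\d+)', int),
        'total_adversarials': grab(r'Total adversarials:\s*(\d+)', int),
        'original_coverage': grab(r'Original:\s*([0-9.]+)', float),
        'achieved_coverage': grab(r'Achieved:\s*([0-9.]+)', float),
    }


sample = """\
Total inputs: 23571
  Average distance: 0.0013311812654137611
Total adversarials: 0
  Average distance: -
Coverage
  Original: 0.04562236286919831
  Achieved: 0.053731540084388185
----------
Original label: grasshopper
  Count: 23571
  Average distance: 0.0013311812654137611
"""

record = parse_summary(sample)
print(record['original_label'], record['total_inputs'], record['total_adversarials'])
```

Collecting one record per seed and per run makes it straightforward to list the seeds with zero adversarials, or to compare the 20-minute and 60-minute budgets side by side.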
henrylee97 commented 1 year ago

The detailed results from 2019 are as follows (the table only includes the images that failed in your report):

| label | original coverage | achieved coverage | total count | adversarial count | adversarial labels |
| --- | --- | --- | --- | --- | --- |
| freight_car | 0.045 | 0.170 | 11860 | 1367 | 14 |
| prayer_rug | 0.045 | 0.220 | 11906 | 1075 | 8 |
| grasshopper | 0.045 | 0.199 | 11847 | 1830 | 11 |
| Gordon_setter | 0.045 | 0.235 | 11905 | 1751 | 22 |

Unfortunately, none of the generated images remain.

In addition, I ran the VGG tutorial in my local environment (with the koreaunivpl/adapt Docker image), and I got results similar to yours.

Kolt1911 commented 1 year ago

Regarding randomness, Adapt is indeed random in how it initializes the neuron selection strategies and in how it mutates them. How many times did you run the VGG-19 experiment to obtain the 2019 results? If I repeat the experiment several times, could I get results like those from 2019?

henrylee97 commented 1 year ago

The experimental results in the paper are averages over the images, with a single run for each image. However, I also ran the experiment multiple times (those runs are not included in the paper), and I got consistent results across the trials at that time.

I am not sure whether you can get similar results in terms of the labels found, but I can get similar results in terms of coverage.

henrylee97 commented 1 year ago

The issue is closed because there have been no comments for a month. Feel free to re-open the issue if you have any further questions.