Which different privacy settings do we want to analyze?

n-aggarwal commented 1 year ago

When we collect the network data for the app, which privacy settings do we want to change and study the effects of? Right now, the following five scenarios come to mind:

AdId
AdId + GPC
Delete AdID
Delete AdID + Force Quit
Delete AdID + GPC

If anyone has any thoughts on this, feel free to comment!

n-aggarwal commented 1 year ago

I have uploaded the mitm-captures for The Weather Channel app to the issue-73 branch. I have included three types of captures:

Sleep: The script sleeps for 30 seconds while the app runs. The AdId for this run is 26e354ef-640e-4331-86e2-1210ee32943a
```
sleep 30
```

Monkey: The script runs the Monkey exerciser while the app runs. The AdId for this run is 9409c9d0-51ce-4425-a985-1e7a0197949b

```
sleep 10
# Five rounds of 100 pseudo-random Monkey events (80% touches, 20% app switches,
# no system keys, 30 ms between events), with a 2-second pause after each round
for i in 1 2 3 4 5; do
  adb shell monkey -p "$TARGET_PACKAGE_NAME" --throttle 30 --pct-syskeys 0 --pct-touch 80 --pct-appswitch 20 -v 100
  sleep 2
done
```

Manual: The script again sleeps for 30 seconds, but this time I manually interact with the app. The AdId for this run is 13ad3bab-88bf-47e9-81da-c41caf2c3018
```
sleep 30
```
The AdIds listed above can also be found in a file named APP_ADID.txt in the scripts folder.
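With the AdIds on record, one quick check for the No AdId conditions is whether a capture still contains an earlier AdId. This is a minimal sketch, assuming all capture files for an app sit in one directory; `count_adid` is a hypothetical helper, not part of the repository's scripts. It does a literal, case-insensitive text search, so an AdId inside a compressed (e.g. gzip) or otherwise encoded body would be missed and would have to be decoded first, for example by reading the flows with mitmproxy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how often a given AdId appears verbatim in each
# capture file of a directory. Only plaintext matches are found; compressed or
# encoded request bodies are not decoded here.
count_adid() {
  local adid="$1" dir="$2" file n
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    # -a: treat binary flow files as text; -o: one line per match;
    # -i: ignore case; -F: match the AdId literally, not as a regex
    n=$(grep -aoiF -- "$adid" "$file" | wc -l)
    # $((n)) strips the padding spaces some wc implementations emit
    printf '%s\t%s\n' "$(basename "$file")" "$((n))"
  done
}
```

For example, `count_adid 26e354ef-640e-4331-86e2-1210ee32943a path/to/captures` (the directory path is illustrative) prints one `file<TAB>count` line per capture.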

Each of the three capture types was tested under six settings:

AdId
AdId + Permissions (Permissions are granted in all future captures)
AdId + GPC
No AdId
No AdId + Force Quit
No AdId + GPC

This follows the outline of single-app-automation-script.sh.

I included the three types of captures to see whether there is a significant difference in the network traffic captured by the different approaches. At a high level, the manual capture is (unsurprisingly) the most complete: its capture size is about 10 times that of the other two. Nevertheless, the real question is whether this makes any difference in the analysis or whether the simpler approaches are good enough.
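To put a number on the size difference, here is a minimal sketch that lists each capture file's size in bytes, largest first. It assumes the captures for one app are in a single directory; `capture_sizes` is a hypothetical helper. File size is only a rough proxy for how much traffic was captured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<bytes><TAB><file>" for every capture file in a
# directory, sorted from largest to smallest.
capture_sizes() {
  local dir="$1" file
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    # wc -c < file prints only the byte count; $((...)) strips padding spaces
    printf '%s\t%s\n' "$(( $(wc -c < "$file") ))" "$(basename "$file")"
  done | sort -rn
}
```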

The problem with Monkey is that it gets stuck on screens like an "Agree to terms and conditions" checkbox, where there is only one small target to hit. Monkey sends random touches and swipes, which have no effect in this case until it gets lucky and actually taps the checkbox. This is exactly what happened with the weather app.

kasnder commented 1 year ago

It's interesting to see that some of the captures have very different sizes.

I would probably opt for the "Sleep" variant in later analyses. Otherwise, it's super difficult to compare the different network captures.

SebastianZimmeck commented 1 year ago

> I would probably opt for the "Sleep" variant in later analyses.

That seems to me a good choice as well. The Sleep captures seem pretty similar to the Monkey captures (so we may not need an Exerciser Monkey after all).

The Manual captures look much more extensive, so there seems to be a substantial difference from Sleep and Monkey. So, we mainly have to decide whether we want to go deep or broad.

I have two questions:

1. What do we expect the outcome of our analysis to be?

Basically, no app is respecting GPC but they are respecting opting out via the AdID (i.e., they are zeroing out the AdID)?
Apps are circumventing opting out via the AdID by using alternative identifiers?
Apps are collecting data they should not collect when people opted out from tracking via the AdID?

Obviously, we do not know yet because we have not done the analysis. However, if we can put our finger on it, it would be good to have a hypothesis and/or motivation, as this relates to the story of our paper. That said, have you seen anything interesting, @n-aggarwal and @wesley-tan?

2. Some of the conditions are unclear to me. Could you clarify, @wesley-tan or @n-aggarwal?

I get:

AdId
AdId + GPC
No AdId
No AdId + GPC

I do not get:

2. AdId + Permissions (Permissions are granted in all future captures)
5. No AdId + Force Quit

Wouldn't we need to turn on permissions in the other conditions (AdId + GPC, No AdId, and No AdId + GPC) as well, if we think that would make a difference?

On No AdId + Force Quit, what is the reason for the Force Quit?

n-aggarwal commented 1 year ago

> Wouldn't we need to turn on permissions in the other conditions Adid + GPC, No Adid, and No Adid + GPC as well if we think that would make a difference?

Yes! That's exactly what's happening here. I turn on permissions in the second capture of the app and don't revoke them for the rest of the testing. Only in the first capture do I not grant permissions, to see what an app does under that scenario.

> On No AdId + Force Quit, what is the reason for the Force Quit?

The reason for the force quit is that some apps may cache the AdId for faster performance. Although this is against the Google developer policy, it is somewhat better than outright saving it, so it may be important to make this distinction. This can be seen in the initial exploration I did earlier in issue #56.
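A minimal sketch of the force-quit step, assuming the AdId has already been deleted manually in the device settings (that part is not scripted here). `force_quit_and_relaunch` and the `ADB` override are hypothetical; `am force-stop` and launching an app through Monkey's launcher category are standard adb usage. A force stop kills the app's processes, which discards anything held only in memory but leaves the app's stored data intact, so an AdId that still shows up after the force quit would suggest it was saved rather than merely cached in memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "No AdId + Force Quit" condition.
# ADB can be overridden, e.g. ADB=echo for a dry run that only prints commands.
ADB="${ADB:-adb}"

force_quit_and_relaunch() {
  local package="$1"
  # Kill all of the package's processes, dropping any in-memory AdId cache
  "$ADB" shell am force-stop "$package"
  # Relaunch the app's launcher activity with a single Monkey event
  "$ADB" shell monkey -p "$package" -c android.intent.category.LAUNCHER 1
}
```

For example, `force_quit_and_relaunch "$TARGET_PACKAGE_NAME"` before starting the next capture.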

SebastianZimmeck commented 1 year ago

> I turn on the Permissions in the second capture of the app, and don't revoke them for the rest of the testing.

OK, that is good.

> The reason for force quit is that some apps may be caching the AdId for faster performance.

Good point! It will be interesting to see whether we still see AdIds in condition 4, No AdId + Permissions.

I wonder, should it be like this?

AdId
AdId + Permissions
AdId + Permissions + GPC
No AdId + Permissions + Force Quit
No AdId + Permissions
No AdId + Permissions + GPC

In other words, should the Force Quit come at the end of the fourth condition instead of the fifth? Otherwise, wouldn't the fourth and fifth conditions be identical (up until the Force Quit at the end of the fifth condition)?

SebastianZimmeck commented 1 year ago

How does the Facebook SDK interact with the device and other apps? That may also be interesting to look at; @kasnder provides info here. This is more about the data analysis than about the collection (which the rest of this issue covers).

SebastianZimmeck commented 1 year ago

We have settled on the privacy settings, so feel free to close this issue, @n-aggarwal.

privacy-tech-lab / gpc-android

Which different privacy settings do we want to analyze? #73