Closed Yash-Vekaria closed 3 years ago
This is because the input file of domain_list does not support a URL format (https://foo.bar.com/) but only an FQDN format (foo.bar.com) of the domain. Removing the scheme or path in the domain list would show the correct cohort ID.
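The scheme and path removal described above could be done before writing the domain list. The following is only a minimal sketch: `toFQDN` is a hypothetical helper, not part of floc_simulator, and it only strips the scheme, port, and path. It does not reduce a host to its eTLD+1.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toFQDN is a hypothetical helper (not part of floc_simulator) that reduces
// a URL such as "https://foo.bar.com/" to its host name "foo.bar.com".
// Inputs that are already bare host names are returned unchanged.
func toFQDN(raw string) (string, error) {
	s := strings.TrimSpace(raw)
	// url.Parse only fills in Host when a scheme or "//" prefix is present,
	// so add "//" to bare host names before parsing.
	if !strings.Contains(s, "://") {
		s = "//" + s
	}
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	host := u.Hostname() // drops any port
	if host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	return strings.ToLower(host), nil
}

func main() {
	inputs := []string{"https://foo.bar.com/", "foo.bar.com", "https://www.yahoo.com/news?x=1"}
	for _, in := range inputs {
		host, err := toFQDN(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(host)
	}
}
```

Note that "https://www.yahoo.com/" becomes "www.yahoo.com", not "yahoo.com", so any "www." prefix still has to be handled separately.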
@shigeki If you observe the above two screenshots (in my original comment), I have done the same.
While running your simulator, I entered the websites in FQDN format in the domain_list input file. They are as follows: ["yahoo.com", "bbc.com", "cnn.com", "youtube.com", "foxnews.com", "techcrunch.com", "msn.com"]. For this input, the simulator returned the cohort ID "24904".
However, when the same websites are visited in URL format and checked for the FLoC ID in Google's implementation, it returns "16619" as the cohort ID. So there is some issue with the simulator.
Note: When actually visiting websites to verify the FLoC ID with Google's implementation, we need to visit them in URL format; the FQDN format doesn't work.
I checked it, found a bug in CityHashv103, and fixed it in 292fe1beb8a827f27a96dc951345a5c1cfb8da56. Thanks. However, the resulting cohort ID is 14933, not 16619.
$ ./floc_sample test_domain.json
domain_list: [yahoo.com bbc.com cnn.com techcrunch.com youtube.com foxnews.com msn.com]
sim_hash: 568239095611142
cohortId: 14933
It is the same value as in my Chromium below, where the annotation flags in its history are removed and the update interval is changed to 5 min for this test output. Please check that your history is the same.
@shigeki I set the update interval to 300s (i.e. 5min.) and minimum_history_domain_size_required to 1 and finch_config_version (i.e. floc version) to 1 (i.e. it will consider Chrome 1.1 floc). If you see the history and the FLoC ID that Google returns, it is "16619".
The exact flags set are as follows:
--enable-blink-features=InterestCohortAPI
--enable-features=FederatedLearningOfCohorts:update_interval/300s/minimum_history_domain_size_required/1/finch_config_version/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy
Additionally, as described on the main page of the simulator, it is said to be tested and working for FLoC version Chrome 2.1. The screenshot that you shared is for Chrome 1.1. For the above history, the Chrome 2.1 FLoC version also returns "16619". It would be very helpful if you could fix the simulator to return "16619".
Try adding the feature option FlocPagesWithAdResourcesDefaultIncludedInFlocComputation. Clear the history and check that an exception is output, which confirms that the cohort ID is not being fetched from the previous preference log.
Wait for more than 300 sec, then you can get a new cohort ID, which is 14933 for my case. I believe that it is the right answer.
The whole option is
--enable-blink-features=InterestCohortAPI --enable-features=FederatedLearningOfCohorts:update_interval/300s/minimum_history_domain_size_required/1/finch_config_version/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation
@shigeki I visited the websites under discussion and then fetched the cohort ID using the API; it returned "17745". Then, as you directed, I waited for 300+ seconds and called the API again, and this time I got the cohort ID "14925" and not "14933" (as you got). Also, the simulator should return "17745", right? Why does it return the FLoC ID that is refreshed after 300+ seconds?
Note: This time I tried a slightly different combination for the websites visited, as listed here: ["https://www.yahoo.com/", "https://www.bbc.com/", "https://www.edition.cnn.com/", "https://www.youtube.com/", "https://www.foxnews.com/", "https://www.techcrunch.com/", "https://www.msn.com/"]
Following is the simulator output, if that helps:
17745 is the wrong cohort ID for the list of domains; it was calculated and saved in your preferences in the past.
You should clear all your history or settings in order to clear it; after that, you get an exception from document.interestCohort().
I'm not sure why your chrome returns 14925, not 14933.
Looking at the screenshot, the differences are a data scheme entry at the start of your history and a language setting. Neither affects FLoC.
I think that it is a new issue in Chrome, not in floc_simulator, and I cannot solve it because I cannot reproduce it in my Chrome. I believe that 14933 is the right cohort ID because both my Chrome and floc_simulator return the same number.
@shigeki - I am using Chrome version: Version 92.0.4515.159 (Official Build) (x86_64) and get the same result as @Yash-Vekaria
Any further ideas on what might be causing this?
We get a different cohort ID from the 6 URLs. Comparing the cohort ID for one URL at a time would give us some hints. The cohort IDs that floc_simulator returns for each single URL are as follows.
diff --git a/packages/floc/setup.go b/packages/floc/setup.go
index b60470d..59eab31 100644
--- a/packages/floc/setup.go
+++ b/packages/floc/setup.go
@@ -8,7 +8,7 @@ import (
"os"
)
-var kFlocIdMinimumHistoryDomainSizeRequired int = 7
+var kFlocIdMinimumHistoryDomainSizeRequired int = 1
// cluster data comes from ~/Library/Application\ Support/Google/Chrome\ Canary/Floc/1.0.6/ in MacOS
var cluster_file = "../../Floc/1.0.6/SortingLshClusters"
$ ./floc_sample yahoo_com.json
domain_list: [yahoo.com]
sim_hash: 340880272222230
cohortId: 8802
$ ./floc_sample bbc_com.json
domain_list: [bbc.com]
sim_hash: 525203699512276
cohortId: 13856
$ ./floc_sample cnn_com.json
domain_list: [cnn.com]
sim_hash: 826482894952494
cohortId: 23217
$ ./floc_sample youtube_com.json
domain_list: [youtube.com]
sim_hash: 986137333389667
cohortId: 28033
$ ./floc_sample techcrunch_com.json
domain_list: [techcrunch.com]
sim_hash: 760961668495438
cohortId: 20842
$ ./floc_sample msn_com.json
domain_list: [msn.com]
sim_hash: 6139457615798
cohortId: 158
Try to run Chrome with the following options with update_interval: 10 sec and minimum history domain size required: 1. I'm using Chromium 95.0.4620.0 (Developer Build) (x86_64). I think Chrome Canary would show the same results.
--enable-blink-features=InterestCohortAPI --enable-features="FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation"
My Chrome shows the same cohort IDs as floc_simulator, as shown below.
Check your output for each URL. Do not forget to clear the browser history in your Chrome before accessing each URL, and wait for more than 10 seconds before showing the cohort ID.
@shigeki - I was able to reproduce the exact same IDs as you shared. I am also able to get cohort ID: 14933. The difference seems to be with the flags used to start up Chrome.
Using: --enable-blink-features=InterestCohortAPI --enable-features="FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation"
Results in: 14933
Using (suggested by Google): --enable-blink-features=InterestCohortAPI --enable-features="FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy"
Results in: 16619
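To make the difference between the two flag sets explicit, the feature names in each --enable-features value can be compared. This is only an illustrative sketch: `featureNames` and `onlyIn` are hypothetical helpers, and the parsing assumes the "Name:param/value,Name2" layout seen in the strings above rather than any documented Chrome grammar.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// featureNames extracts the feature names from an --enable-features value,
// dropping any ":param/value/..." suffix attached to a feature.
func featureNames(value string) map[string]bool {
	names := make(map[string]bool)
	for _, item := range strings.Split(value, ",") {
		name, _, _ := strings.Cut(strings.TrimSpace(item), ":")
		if name != "" {
			names[name] = true
		}
	}
	return names
}

// onlyIn returns the sorted names present in a but missing from b.
func onlyIn(a, b map[string]bool) []string {
	var out []string
	for name := range a {
		if !b[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// The two values quoted in this thread.
	matching := "FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation"
	suggested := "FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy"

	a, b := featureNames(matching), featureNames(suggested)
	fmt.Println("only in the 14933 run:", onlyIn(a, b))
	fmt.Println("only in the 16619 run:", onlyIn(b, a))
}
```

For the two strings above, the first list contains only FlocPagesWithAdResourcesDefaultIncludedInFlocComputation and the second only FlocIdSortingLshBasedComputation.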
Thank you for diving in, I am happy that your simulator now matches my Chrome!
It is great.
The FlocPagesWithAdResourcesDefaultIncludedInFlocComputation flag is needed after the OT (ver: chrome/2.1) is finished, in order to apply visited pages with ad resources in your history to the FLoC computation.
Close this.
While experimenting with the floc_simulator, I observed that the simulator is dependent on the input characters like "www." or even "/"; whereas the actual floc by google is independent of these variations and always considers eTLD+1.
For example: for websites "https://www.yahoo.com", "https://www.yahoo.com/", "https://yahoo.com/" and "https://yahoo.com" google's FLoC Algorithm returns the same FLoC ID when any variation mentioned above is checked for FLoC ID. This is shown by the output screenshot below:
However, when a set of the same websites are checked with the above variations in Shigeki's floc_simulator, it returns a different FLoC ID for each change in variation. The change in FLoC ID with these variations is shown by the following output screenshot for floc_simulator.
The FLoC ID is not matching even for the sample inputs that the simulator has shown on its homepage.
Note: Since Shigeki's floc_simulator works only for FLoC version 2.1, this version was explicitly set using the browser flag snippet below (i.e., by setting finch_config_version) and used while carrying out all of the above-mentioned experiments/tests. Also, all the websites used in the experiments have opted in to the FLoC OT.
FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1/finch_config_version/2,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy