yohhaan / topics_api_analysis

This is the code artifact of the paper "A Public and Reproducible Assessment of the Topics API on Real Data"
https://arxiv.org/abs/2403.19577
GNU General Public License v3.0

topics_api_analysis

This is the code artifact of the paper "A Public and Reproducible Assessment of the Topics API on Real Data":

@inproceedings{topics_secweb24_beugin,
      title={A Public and Reproducible Assessment of the Topics API on Real Data},
      author={Yohan Beugin and Patrick McDaniel},
      booktitle={2024 IEEE Security and Privacy Workshops (SPW)},
      year={2024},
      month={may},
}

Also check out our other repository, topics_analysis.


Getting Started

  1. Clone this topics_api_analysis repository together with its topics_classifier submodule, using either:
    • git clone --recurse-submodules git@github.com:yohhaan/topics_api_analysis.git (SSH)
    • git clone --recurse-submodules https://github.com/yohhaan/topics_api_analysis.git (HTTPS)

A Dockerfile is provided under .devcontainer/. To integrate it directly with VS Code, or to build the image and deploy the Docker container manually, follow the instructions in this guide.

Reproduction Steps

Topics classification: refer to and execute the bash scripts in the corresponding folder under ./data to classify the different datasets with the Topics API.

Topics evaluation: use the topics_simulator.py script to evaluate the Topics API (simulation of the API for users, denoising, and re-identification across epochs).
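To make the simulation step more concrete, here is a minimal sketch of how a single user's observed topics could be generated across epochs. It is not the code of topics_simulator.py. It only follows the public Topics API proposal, in which a caller sees one topic per epoch, drawn from the user's top 5 topics and replaced by a random taxonomy topic with 5% probability. The function names, the placeholder taxonomy, and the seed handling are assumptions made for illustration.

```python
import random

NOISE_PROBABILITY = 0.05  # per the Topics API proposal: 5% chance of a random topic
TOP_K = 5                 # per the Topics API proposal: top 5 topics kept per epoch


def observe_topic(top_topics, taxonomy, rng):
    """Return the single topic a caller would observe for one epoch."""
    if rng.random() < NOISE_PROBABILITY:
        # Noisy topic: drawn uniformly from the whole taxonomy.
        return rng.choice(taxonomy)
    # Genuine topic: drawn uniformly from the user's top topics.
    return rng.choice(top_topics)


def simulate_user(top_topics_per_epoch, taxonomy, seed=0):
    """Simulate one observed topic per epoch for a single user (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        observe_topic(top_topics[:TOP_K], taxonomy, rng)
        for top_topics in top_topics_per_epoch
    ]


# Placeholder taxonomy of integer topic IDs (not the real taxonomy).
taxonomy = list(range(1, 101))
# Hypothetical user whose top topics are stable over 10 epochs.
stable_profile = [[3, 14, 15, 92, 65]] * 10
observed = simulate_user(stable_profile, taxonomy, seed=42)
```

Because noisy topics are drawn independently in each epoch, a topic that keeps reappearing across epochs is more likely to be genuine, which is the intuition behind the denoising step.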

usage: python3 topics_simulator.py [-h]
                                   users_topics_tsv nb_epochs config_model_json top_list_tsv
                                   unobserved_topics_threshold repeat_each_user_n_times output_prefix

Simulate the Topics API and evaluate its privacy guarantees

positional arguments:
  users_topics_tsv
  nb_epochs
  config_model_json
  top_list_tsv
  unobserved_topics_threshold
  repeat_each_user_n_times
  output_prefix
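As an illustration of the interface above, the sketch below rebuilds an equivalent command-line parser with argparse and parses a sample invocation. The argument names and order come from the usage message. The integer types, the choice to leave unobserved_topics_threshold as a string, and every value in the sample invocation are assumptions rather than details confirmed by the repository.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage message of topics_simulator.py (sketch)."""
    parser = argparse.ArgumentParser(
        prog="topics_simulator.py",
        description="Simulate the Topics API and evaluate its privacy guarantees",
    )
    parser.add_argument("users_topics_tsv")
    parser.add_argument("nb_epochs", type=int)  # assumed to be an integer
    parser.add_argument("config_model_json")
    parser.add_argument("top_list_tsv")
    # Type not stated in the usage message, so it is kept as a string here.
    parser.add_argument("unobserved_topics_threshold")
    parser.add_argument("repeat_each_user_n_times", type=int)  # assumed integer
    parser.add_argument("output_prefix")
    return parser


# Hypothetical invocation: all paths and numbers are placeholders.
example_args = build_parser().parse_args(
    ["users_topics.tsv", "30", "config.json", "top_list.tsv", "10", "1", "output/run"]
)
```

All seven arguments are positional, so they must be supplied in exactly this order.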

Examples:

Analysis: to extract statistics and plot the figures, refer to the analysis.py script.