secretflow / psi

The repo of Private Set Intersection (PSI) and Private Information Retrieval (PIR) from SecretFlow.
https://www.secretflow.org.cn/docs/psi
Apache License 2.0

[Bug]: Sender error with tens of millions of data entries: Get data timeout, key=root:P2P-1:1->0 #200

Open MLikeWater opened 2 weeks ago

MLikeWater commented 2 weeks ago

Describe the bug

Sender Setup Stage

./main --config config/apsi_sender_setup_bucket.json

sender terminal

./main --config config/apsi_sender_online_bucket.json

log:

[2024-11-08 01:42:06.058] [info] [main.cc:44] SecretFlow PSI Library v0.5.0.dev241016 Copyright 2023 Ant Group Co., Ltd.
I1108 01:42:06.075622 2058268     0 external/com_github_brpc_brpc/src/brpc/server.cpp:1204] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=5300.
W1108 01:42:06.075647 2058268     0 external/com_github_brpc_brpc/src/brpc/server.cpp:1210] Builtin services are disabled according to ServerOptions.has_builtin_services
[2024-11-08 01:42:06.075] [info] [entry.cc:455] Setting thread count to 152
INFO  01:42:06:077.216: ::apsi::PSIParams have false-positive probability 2^(-53.0384) per receiver item
[2024-11-08 01:42:06.077] [info] [group_db.cc:234] DB file /home/admin/dev/demo/data/apsi_sender_bucket//0_group.db already exists, load_meta /home/admin/dev/demo/data/apsi_sender_bucket//0_group.db.meta directly
DEBUG 01:42:06:077.517: Start loading SenderDB
DEBUG 01:42:06:078.270: Loaded SenderDB properties: item_count: 4977; label_byte_count: 1; nonce_byte_count: 16; compressed: false; stripped: true
DEBUG 01:42:06:085.592: Loaded BinBundle at bundle index 0 (511304 bytes)
DEBUG 01:42:06:086.217: Loaded BinBundle at bundle index 0 (956480 bytes)
DEBUG 01:42:06:086.277: Loaded SenderDB with 4977 items (1468248 bytes)
INFO  01:42:06:086.291: Start generating bin bundle caches
INFO  01:42:06:086.298: Finished generating bin bundle caches
DEBUG 01:42:06:086.308: Finished loading SenderDB
INFO  01:42:06:108.252: Loaded SenderDB (1468248 bytes)
INFO  01:42:06:108.354: Loaded OPRF key (32 bytes)
I1108 01:42:06.209764 2058271 4295006720 external/com_github_brpc_brpc/src/brpc/socket.cpp:2566] Checking Socket{id=0 addr=127.0.0.1:5400} (0x55a1daa1bc80)

receiver terminal

./main --config config/apsi_receiver_bucket.json

log:

[2024-11-08 01:42:17.487] [info] [main.cc:44] SecretFlow PSI Library v0.5.0.dev241016 Copyright 2023 Ant Group Co., Ltd.
I1108 01:42:17.504922 2058577     0 external/com_github_brpc_brpc/src/brpc/server.cpp:1204] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=5400.
W1108 01:42:17.504944 2058577     0 external/com_github_brpc_brpc/src/brpc/server.cpp:1210] Builtin services are disabled according to ServerOptions.has_builtin_services
INFO  01:42:20:129.698: ::apsi::PSIParams have false-positive probability 2^(-53.0384) per receiver item
[2024-11-08 01:42:20.129] [info] [entry.cc:162] Setting thread count to 152
DEBUG 01:42:20:129.873: PSI parameters set to: item_params.felts_per_item: 5; table_params.table_size: 409; table_params.max_items_per_bin: 42; table_params.hash_func_count: 1; query_params.ps_low_degree: 0; query_params.query_powers: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42}; seal_params.poly_modulus_degree: 2048; seal_params.coeff_modulus: [48]; seal_params.plain_modulus: 65537
DEBUG 01:42:20:129.899: Derived parameters: item_bit_count_per_felt: 16; item_bit_count: 80; bins_per_bundle: 2045; bundle_idx_count: 1
DEBUG 01:42:20:131.354: Configured PowersDag with depth 0
[2024-11-08 01:42:20.132] [info] [csv_reader.cc:73] read file /home/admin/dev/demo/data/meituan_data_2500w.csv with header key, column_names: key
I1108 01:42:50.389800 2058649 4295006720 external/com_github_brpc_brpc/src/brpc/socket.cpp:2566] Checking Socket{id=0 addr=127.0.0.1:5300} (0x55dab4b99c40)
[2024-11-08 01:42:52.040] [info] [csv_reader.cc:162] Read csv file /home/admin/dev/demo/data/meituan_data_2500w.csv, row cnt is 25000000
[2024-11-08 01:43:04.444] [info] [entry.cc:205] Start deal with bucket 5068
[2024-11-08 01:43:04.444] [info] [entry.cc:210] Sending OPRF request for 2430  items
INFO  01:43:04:551.902: Created OPRFReceiver for 2430 items
INFO  01:43:04:551.963: Created OPRF request for 2430 items
DEBUG 01:43:04:551.968: Sending operation of type sop_oprf
[2024-11-08 01:43:04.582] [info] [channel.cc:362] send request failed and retry, retry_count=1, max_retry=3, interval_ms=1000, message=[external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'
[2024-11-08 01:43:05.583] [info] [channel.cc:362] send request failed and retry, retry_count=2, max_retry=3, interval_ms=3000, message=[external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'
[2024-11-08 01:43:08.583] [info] [channel.cc:362] send request failed and retry, retry_count=3, max_retry=3, interval_ms=5000, message=[external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'
[2024-11-08 01:43:13.583] [error] [channel.cc:104] SendImpl error [external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 127.0.0.1:5300 yet, server_id=0'

result

Once the receiver starts, the sender fails, and the log is as follows:

I1108 01:42:18.211352 2058313 4295006724 external/com_github_brpc_brpc/src/brpc/socket.cpp:2626] Revived Socket{id=0 addr=127.0.0.1:5400} (0x55a1daa1bc80) (Connectable)
terminate called after throwing an instance of 'yacl::IoError'
  what():  [external/yacl/yacl/link/transport/channel.cc:430] Get data timeout, key=root:P2P-1:1->0
Aborted (core dumped)

Steps To Reproduce

config/apsi_sender_setup_bucket.json

{
    "apsi_sender_config": {
        "threads": 1,
        "log_level": "info",
        "source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
        "params_file": "/home/admin/dev/demo/data/100K-1-16.json",
        "save_db_only": true,
        "experimental_enable_bucketize": true,
        "experimental_bucket_cnt": 10000,
        "experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
        "experimental_db_generating_process_num": 16,
        "experimental_bucket_group_cnt": 512
    }
}

config/apsi_sender_online_bucket.json

{
    "apsi_sender_config": {
        "source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
        "params_file": "/home/admin/dev/demo/data/100K-1-16.json",
        "experimental_enable_bucketize": true,
        "experimental_bucket_cnt": 10000,
        "experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
        "experimental_db_generating_process_num": 16,
        "experimental_bucket_group_cnt": 512
    },
    "link_config": {
        "parties": [
            { "id": "sender", "host": "127.0.0.1:5300" },
            { "id": "receiver", "host": "127.0.0.1:5400" }
        ]
    },
    "self_link_party": "sender"
}

config/apsi_receiver_bucket.json

{
    "apsi_receiver_config": {
        "query_file": "/home/admin/dev/demo/data/meituan_data_2500w.csv",
        "output_file": "/home/admin/dev/demo/data/batch_result.csv",
        "params_file": "/home/admin/dev/demo/data/100K-1-16.json",
        "experimental_enable_bucketize": true,
        "experimental_bucket_cnt": 10000
    },
    "link_config": {
        "parties": [
            { "id": "sender", "host": "127.0.0.1:5300" },
            { "id": "receiver", "host": "127.0.0.1:5400" }
        ]
    },
    "self_link_party": "receiver"
}

Expected behavior

The sender holds 50 million records, each consisting of a key and a value; the key is the hash of a phone number and starts with a letter from A to K. The receiver holds 25 million records containing only keys. The expected result is that the receiver obtains the intersection for its 25 million keys, along with the corresponding values.

Version

v0.4.2b0

Operating system

Ubuntu 20.04

Hardware Resources

48C96G

huocun-ant commented 2 weeks ago

The current engineering implementation of APSI is not yet mature: its performance is only adequate for a small number of queries, as used in algorithm testing. We will define PIR-related interfaces and add related optimizations in the future.

tongke6 commented 2 weeks ago

@MLikeWater

You can try adding the parameter recv_timeout_ms in the link_config and increasing its value. Reference: https://github.com/secretflow/psi/blob/c2f460e20efbe74c3a80b26c98e8bb89295d717f/docs/reference/launch_config.md?plain=1#L85
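
A minimal sketch of that suggestion, applied to the link_config section of config/apsi_sender_online_bucket.json from the report above. The value 600000 (10 minutes) is an illustrative assumption, not a figure recommended anywhere in this thread, and adding the same field to the receiver's link_config is likewise an assumption. Only the link-related part of the file is shown; the apsi_sender_config section stays as it was:

```json
{
    "link_config": {
        "parties": [
            { "id": "sender", "host": "127.0.0.1:5300" },
            { "id": "receiver", "host": "127.0.0.1:5400" }
        ],
        "recv_timeout_ms": 600000
    },
    "self_link_party": "sender"
}
```

In the logs above, the receiver needs roughly 45 seconds to read and bucket its 25 million rows before it sends the first OPRF request, so the timeout would presumably have to cover at least that loading time.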

MLikeWater commented 2 weeks ago

> The current engineering implementation of APSI is not yet mature: its performance is only adequate for a small number of queries, as used in algorithm testing. We will define PIR-related interfaces and add related optimizations in the future.

Understood, thank you for the information. For a test/demo setup that has to handle large data volumes, what other configuration is needed besides the recv_timeout_ms parameter suggested by @tongke6?

Looking forward to your response, thanks.

MLikeWater commented 2 weeks ago

@huocun-ant Using the parameters from https://github.com/secretflow/psi/blob/main/examples/pir/apsi/parameters/256M-4096.json for validation, it is now possible to run with tens of millions of data entries, although it is relatively slow. The log shows the progress.


There is another issue: the sender's original data is only 3.2G, but the directory generated during the sender setup phase (apsi_sender_bucket) is very large, reaching 153G.

{
    "table_params": {
        "hash_func_count": 3,
        "table_size": 6144,
        "max_items_per_bin": 4000
    },
    "item_params": {
        "felts_per_item": 4
    },
    "query_params": {
        "ps_low_degree": 310,
        "query_powers": [ 1, 4, 10, 11, 28, 33, 78, 118, 143, 311, 1555]
    },
    "seal_params": {
        "plain_modulus_bits": 26,
        "poly_modulus_degree": 8192,
        "coeff_modulus_bits": [ 50, 50, 50, 38, 30 ]
    }
}

huocun-ant commented 2 weeks ago

  1. Add the compress option:
    • config/apsi_sender_setup_bucket.json
      {
          "apsi_sender_config": {
              "threads": 1,
              "log_level": "info",
              "compress": true,
              "source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
              "params_file": "/home/admin/dev/demo/data/100K-1-16.json",
              "save_db_only": true,
              "experimental_enable_bucketize": true,
              "experimental_bucket_cnt": 10000,
              "experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
              "experimental_db_generating_process_num": 16,
              "experimental_bucket_group_cnt": 512
          }
      }
    • config/apsi_sender_online_bucket.json
      {
          "apsi_sender_config": {
              "source_file": "/home/admin/dev/demo/data/bank_data_5000w.csv",
              "params_file": "/home/admin/dev/demo/data/100K-1-16.json",
              "experimental_enable_bucketize": true,
              "compress": true,
              "experimental_bucket_cnt": 10000,
              "experimental_bucket_folder": "/home/admin/dev/demo/data/apsi_sender_bucket/",
              "experimental_db_generating_process_num": 16,
              "experimental_bucket_group_cnt": 512
          },
          "link_config": {
              "parties": [
                  {
                      "id": "sender",
                      "host": "127.0.0.1:5300"
                  },
                  {
                      "id": "receiver",
                      "host": "127.0.0.1:5400"
                  }
              ]
          },
          "self_link_party": "sender"
      }
  2. You can reduce the number of buckets, although this may increase query time. "params_file": "/home/admin/dev/demo/data/100K-1-16.json" means your bucket size is 100K rows and your query size is 1 row. So experimental_bucket_cnt should be 5000w / 100K = 500 (50 million rows / 100K rows per bucket); you can set experimental_bucket_cnt to 500. One caveat: your query size is large, but 100K-1-16.json is optimized for single-row queries, so the parameters still leave room for optimization.
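
As a rough sketch of point 2: with 50,000,000 sender rows and a parameter file sized for 100,000 rows per bucket, 50,000,000 / 100,000 = 500 buckets. The fragment below shows only the fields involved, for config/apsi_sender_setup_bucket.json; all other fields stay as in the original file. It is an assumption here that the same experimental_bucket_cnt must also be set in the sender online config and in the receiver config (both currently use 10000), and that the bucket DB has to be regenerated by rerunning the setup stage after the change:

```json
{
    "apsi_sender_config": {
        "params_file": "/home/admin/dev/demo/data/100K-1-16.json",
        "experimental_enable_bucketize": true,
        "experimental_bucket_cnt": 500
    }
}
```

Whether experimental_bucket_group_cnt (512 in the original config) should also be lowered once there are only 500 buckets is not addressed in this thread.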