Closed · wangsu502 closed this issue 3 years ago
I set the number of rounds to 10 for both XGBoost implementations. The results: SGX 44.7969234672623 vs. normal 46.5267243112152.
Hi Su,
I tried to reproduce this but I'm getting the same results with both: 46.5267243112152
Would you please double check?
I built XGBoost release_1.2.0 from source and ran the following script:
```python
import xgboost as xgb
import os
import csv
import random

random.seed(0)

DIR = os.path.dirname(os.path.realpath(__file__))
HOME_DIR = DIR + "/../../../"
RAW_DATA_FILE_PATH = HOME_DIR + 'demo/data/energydata_complete.csv'
TRAIN_FILE_PATH = HOME_DIR + 'demo/data/train.txt'
TEST_FILE_PATH = HOME_DIR + 'demo/data/test.txt'

# Data pre-processing
with open(RAW_DATA_FILE_PATH, 'r') as fin, open(TRAIN_FILE_PATH, 'w') as train_fout, open(TEST_FILE_PATH, 'w') as test_fout:
    reader = csv.reader(fin)
    _ = next(reader)  # skip the header row
    for row in reader:
        label = row[1]
        line = str(float(label)) + ' ' + ','.join(['{}:{}'.format(no, float(f)) for no, f in enumerate(row[2:])])
        if random.random() < 0.1:
            fout = test_fout
        else:
            fout = train_fout
        fout.write(line + '\n')

dtrain = xgb.DMatrix(TRAIN_FILE_PATH)
dtest = xgb.DMatrix(TEST_FILE_PATH)
param = {'max_depth': 5, 'eta': 0.3, 'objective': 'reg:squarederror', 'n_estimators': 200, 'alpha': 0, 'lambda': 100, 'sketch_eps': 0.03}
bst = xgb.train(param, dtrain, 10)

mae, n = 0, 0
with open(TEST_FILE_PATH, 'r') as fin:
    for line, y_pred in zip(fin, bst.predict(dtest)):
        y = float(line.strip().split()[0])
        y_pred = float(y_pred)
        mae += abs(y - y_pred)
        n += 1
mae = mae / n
print(mae)
```
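Since both scripts rely on `random.seed(0)` to produce the same 90/10 train/test split, it may be worth first confirming that the split logic itself is deterministic across runs. A minimal sketch (the integer rows are a stand-in for the real CSV rows, not the actual dataset):

```python
import random

def split(rows, seed=0):
    # Reproduce the 90/10 split logic from the pre-processing step above
    random.seed(seed)
    train, test = [], []
    for row in rows:
        (test if random.random() < 0.1 else train).append(row)
    return train, test

rows = list(range(100))  # hypothetical stand-in for the CSV rows
a = split(rows)
b = split(rows)
assert a == b  # same seed -> identical split on every run
print(len(a[0]), len(a[1]))
```

If this invariant holds, both the XGBoost and Secure XGBoost runs are guaranteed to train and test on byte-identical files, so any difference must come from the library itself.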
And for Secure XGBoost, I built the latest code on the master branch and ran the following:
```python
import securexgboost as xgb
import os
import csv
import random

random.seed(0)

user_name = "user1"
DIR = os.path.dirname(os.path.realpath(__file__))
HOME_DIR = DIR + "/../../../"
RAW_DATA_FILE_PATH = HOME_DIR + 'demo/data/energydata_complete.csv'
TRAIN_FILE_PATH = HOME_DIR + 'demo/data/train.txt'
TEST_FILE_PATH = HOME_DIR + 'demo/data/test.txt'

key_file = "../../data/key_zeros.txt"
xgb.generate_client_key(key_file)
xgb.encrypt_file(TRAIN_FILE_PATH, TRAIN_FILE_PATH + ".enc", key_file)
xgb.encrypt_file(TEST_FILE_PATH, TEST_FILE_PATH + ".enc", key_file)

print("Init user and enclave parameters")
xgb.init_client(config="config.ini")
xgb.init_server(enclave_image=HOME_DIR + "build/enclave/xgboost_enclave.signed", client_list=["user1"], log_verbosity=0)

# Remote Attestation
print("Remote attestation")
# Note: Simulation mode does not support attestation,
# so pass `verify=False` to attest()
xgb.attest(verify=False)

print("Creating training matrix from encrypted file")
dtrain = xgb.DMatrix({user_name: TRAIN_FILE_PATH + ".enc"})
print("Creating test matrix from encrypted file")
dtest = xgb.DMatrix({user_name: TEST_FILE_PATH + ".enc"})

param = {'max_depth': 5, 'eta': 0.3, 'objective': 'reg:squarederror', 'n_estimators': 200, 'alpha': 0, 'lambda': 100, 'sketch_eps': 0.03}

# booster = xgb.train(param, dtrain, 10)
booster = xgb.train(param, dtrain, 10, evals=[(dtrain, "train"), (dtest, "test")])

# Get encrypted predictions
print("\nModel Predictions: ")
predictions, num_preds = booster.predict(dtest, decrypt=False)

# Decrypt predictions
print(booster.decrypt_predictions(predictions, num_preds))

mae, n = 0, 0
with open(TEST_FILE_PATH, 'r') as fin:
    for line, y_pred in zip(fin, booster.decrypt_predictions(predictions, num_preds)):
        y = float(line.strip().split()[0])
        y_pred = float(y_pred)
        mae += abs(y - y_pred)
        n += 1
mae = mae / n
print(mae)
```
In both cases, I get the same result: 46.5267243112152
Emmmm.... Did you modify the cmake options? All by default?
Could you please provide your running environment information? Python version, NumPy version, etc.
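For reference, a minimal sketch that gathers the requested environment information (the `numpy` and `xgboost` imports are wrapped in `try`/`except` since either may be absent in a given environment):

```python
import platform
import sys

print("Python:", sys.version.split()[0])
print("Platform:", platform.platform())

# Optional packages: report "not installed" instead of crashing
for mod in ("numpy", "xgboost"):
    try:
        m = __import__(mod)
        print(mod + ":", m.__version__)
    except ImportError:
        print(mod + ": not installed")
```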
> Emmmm.... Did you modify the cmake options? All by default?
Default options, except that I ran it in simulation mode (OE_DEBUG=1 and SIMULATE=ON), but I don't believe that should matter. Did you build regular XGBoost from source? If not, could you please try that? https://github.com/dmlc/xgboost/tree/v1.2.0
hi, I've fixed the issue by rebuilding everything. XD
Great, closing this issue in that case!
Hi, I found that the prediction results from the latest Secure XGBoost are always different from XGBoost 1.2.0. Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv
My python code:

```python
import csv
import xgboost as xgb
import random

random.seed(0)

RAW_DATA_FILE_PATH = '/home/test/energydata_complete.csv'
TRAIN_FILE_PATH = '/home/test/regression/train.txt'
TEST_FILE_PATH = '/home/test/regression/test.txt'

def main():
    # Data pre-processing
    ...

if __name__ == '__main__':
    main()
```
The result is:

```
[05:31:05] WARNING: xgboost/src/learner.cc:516:
Parameters: { n_estimators } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.

46.5267243112152
```
And below is my python code for Secure XGBoost:

```python
xgb.generate_client_key(key_file)
xgb.encrypt_file(inputfile_train, inputfile_train + ".enc", key_file)
xgb.encrypt_file(inputfile_test, inputfile_test + ".enc", key_file)
```
And the result is:

```
Beginning Training
Set training parameters: {'max_depth': 5, 'eta': 0.3, 'objective': 'reg:squarederror', 'n_estimators': 200, 'alpha': 0, 'lambda': 100, 'sketch_eps': 0.03}

Model Predictions:
[ 74.07482   47.515785  47.590363 ... 117.18135  157.50322  114.312454]
25.92078459969717
```
The MAE of Secure XGBoost is lower than that of normal XGBoost. Is there any optimization applied to the implementation? Could you help look into this issue? I think the results should be the same when both implementations get the same parameters.
thanks, Su
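To narrow down where the two implementations diverge, it may help to compare the decrypted Secure XGBoost predictions against the plain XGBoost predictions elementwise, rather than only through the aggregate MAE. A minimal sketch, assuming the prediction arrays from the two scripts have been saved to disk (the `.npy` file names and the `compare_predictions` helper are hypothetical, not part of either library):

```python
import numpy as np

def compare_predictions(a, b, atol=1e-5):
    """Report whether two prediction vectors match, and where they first differ."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        return "shape mismatch: {} vs {}".format(a.shape, b.shape)
    diff = np.abs(a - b)
    if np.all(diff <= atol):
        return "predictions match (max abs diff {:.2e})".format(diff.max())
    i = int(np.argmax(diff > atol))  # index of the first elementwise divergence
    return "first divergence at index {}: {} vs {}".format(i, a[i], b[i])

# Hypothetical usage: load arrays dumped by the two scripts via np.save
# normal = np.load("xgb_preds.npy"); secure = np.load("sgx_preds.npy")
# print(compare_predictions(normal, secure))
print(compare_predictions([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(compare_predictions([1.0, 2.0, 3.0], [1.0, 2.5, 3.0]))
```

If the very first prediction already differs, the divergence is in training (e.g. split finding or sketching) rather than in prediction or decryption, which would help localize the bug.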