Open dc0953 opened 3 years ago
python dlrm_s_pytorch.py --arch-sparse-feature-size=128 --arch-mlp-bot="13-512-256-128" \
--arch-mlp-top="1024-1024-512-256-1" --max-ind-range=40000000 --data-generation=dataset \
--data-set=terabyte --raw-data-file=/data/day --processed-data-file=/data/day --loss-function=bce \
--round-targets=True --learning-rate=1.0 --mini-batch-size=2048 --print-freq=2048 --print-time \
--test-freq=102400 --test-mini-batch-size=16384 --test-num-workers=16 --memory-map --mlperf-logging \
--mlperf-auc-threshold=0.8025 --mlperf-bin-loader --mlperf-bin-shuffle \
--use-gpu
Error message:
"ERROR: Criteo Terabyte Dataset path is invalid; please download from https://labs.criteo.com/2013/12/download-terabyte-click-logs"
The files are expected to have a .gz extension, but this error occurs when preprocessing is run on already-decompressed files.
# WARNING: The raw data consist of day_0.gz,... ,day_23.gz text files
# Each line in the file is a sample, consisting of 13 continuous and
# 26 categorical features (an extra space indicates that feature is
# missing and will be interpreted as 0).
for i in range(days):
    datafile_i = datafile + "_" + str(i)  # + ".gz"
    if path.exists(str(datafile_i)):
        print("Reading data from path=%s" % (str(datafile_i)))
        # file day_<number>
        total_per_file_count = 0
        with open(str(datafile_i)) as f:
            for _ in f:
                total_per_file_count += 1
        total_per_file.append(total_per_file_count)
        total_count += total_per_file_count
    else:
        sys.exit("ERROR: Criteo Terabyte Dataset path is invalid; please download from https://labs.criteo.com/2013/12/download-terabyte-click-logs")
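Note that the quoted loop looks for files named day_<i> with the ".gz" suffix commented out, which is why decompressed files with a different name (or still-compressed files) trigger the error. A minimal sketch of a more tolerant line-counting step, accepting either day_<i> or day_<i>.gz (the function name and error text here are placeholders, not from the DLRM repository):

```python
import gzip
import os
import sys


def count_samples(datafile, days):
    """Count lines per day file, accepting day_<i> or day_<i>.gz."""
    total_per_file = []
    total_count = 0
    for i in range(days):
        plain = "%s_%d" % (datafile, i)
        gzipped = plain + ".gz"
        if os.path.exists(plain):
            opener, path = open, plain
        elif os.path.exists(gzipped):
            opener, path = gzip.open, gzipped
        else:
            sys.exit("ERROR: neither %s nor %s exists" % (plain, gzipped))
        print("Reading data from path=%s" % path)
        # "rt" gives text-mode lines for both plain and gzip files
        with opener(path, "rt") as f:
            per_file_count = sum(1 for _ in f)
        total_per_file.append(per_file_count)
        total_count += per_file_count
    return total_per_file, total_count
```

With this fallback, either decompressing the downloads to day_0 ... day_23 or leaving them as day_0.gz ... day_23.gz would pass the existence check.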
Preprocessing in progress, following the MLPerf reference instructions:
1.2 Clone the reference implementation repository.
1.3 Build and run the reference docker image.
1.4 Run the training script to obtain the preprocessed data. This process can take up to several days and needs a few TB of fast storage space. As a result, files named "day_train.bin" and "day_test.bin" will be created.
After creating the preprocessed dataset, the script will start training using the reference implementation. This will be clearly visible in the logs, e.g. by the script printing "Finished training it" etc. It can be safely interrupted with "Ctrl+C", since we only need this script to produce the preprocessed data, not to complete the full training run.
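Before interrupting with Ctrl+C, it is worth confirming that both binaries were actually written. A small sketch of such a check (the function name and data directory are illustrative assumptions):

```python
import os


def preprocessed_ready(data_dir):
    """Return True if both MLPerf DLRM binary files exist and are non-empty."""
    for name in ("day_train.bin", "day_test.bin"):
        path = os.path.join(data_dir, name)
        if not (os.path.exists(path) and os.path.getsize(path) > 0):
            return False
    return True
```

If this returns True, the training run that follows is no longer needed for preprocessing purposes.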