mlcommons / inference_results_v4.0

This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark.
https://mlcommons.org/benchmarks/inference-datacenter/
Apache License 2.0

Segmentation Fault (core dumped) : for both baremetal and docker setup in RESNET50 #6

Open PriyaBSavithiri opened 5 months ago

PriyaBSavithiri commented 5 months ago

Hi,

I am trying to run the RESNET50 code with the official command, without any modification, but I hit a segmentation fault (core dumped) on both bare metal and Docker.

COMMAND: bash run_offline.sh 1

LOG: Docker:

[root@a29df81b85ee pytorch-cpu]# bash run_offline.sh 1
user_default.conf
section default
default resnet50..performance_sample_count_override = 1024
custom resnet50..performance_sample_count_override = 1024
default .Offline.target_qps = 25150
custom .Offline.target_qps = 37725.0
default .Server.target_qps = 19810
custom .Server.target_qps = 29715.0
default .Server.min_duration = 600000
custom .Server.min_duration = 600000
[SUT] Creating instance 0
run_offline.sh: line 85: 2859 Segmentation fault (core dumped) $numactl ${APP} --scenario Offline --mode Performance --mlperf_conf ${CUR_DIR}/src/mlperf.conf --user_conf ${USER_CONF} --model_name resnet50 --rn50-part1 ${RN50_START} --rn50-part3 ${RN50_END} --rn50-full-model ${RN50_FULL} --data_path ${DATA_DIR} --num_instance $number_cores --warmup_iters 20 --cpus_per_instance $CPUS_PER_INSTANCE --total_sample_count 50000 --batch_size $1

Baremetal:

(rn50-mlperf)/home/user:~/inference_results_v4.0/closed/Intel/code/resnet50/pytorch-cpu$ bash run_offline.sh 1
user_default.conf
section default
default resnet50..performance_sample_count_override = 1024
custom resnet50..performance_sample_count_override = 1024
default .Offline.target_qps = 25150
custom .Offline.target_qps = 37725.0
default .Server.target_qps = 19810
custom .Server.target_qps = 29715.0
default .Server.min_duration = 600000
custom .Server.min_duration = 600000
[SUT] Creating instance 0
run_offline.sh: line 85: 3835024 Segmentation fault (core dumped) $numactl ${APP} --scenario Offline --mode Performance --mlperf_conf ${CUR_DIR}/src/mlperf.conf --user_conf ${USER_CONF} --model_name resnet50 --rn50-part1 ${RN50_START} --rn50-part3 ${RN50_END} --rn50-full-model ${RN50_FULL} --data_path ${DATA_DIR} --num_instance $number_cores --warmup_iters 20 --cpus_per_instance $CPUS_PER_INSTANCE --total_sample_count 50000 --batch_size $1

Is anyone else facing the same problem?

Thanks in advance.