mlcommons / training_results_v1.0

This repository contains the results and code for the MLPerf™ Training v1.0 benchmark.
https://mlcommons.org/en/training-normal-10/
Apache License 2.0

Issue with DGX config file (what changes are required in the config file?) #6

Status: Open. Suraj6198 opened this issue 3 years ago.


We are trying to run the RNN-T benchmark on our DGX Station (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/dgx-station/368040-DGX-Station-DS-R11.pdf). Please help us set the right config parameters. We ran the following command:

    CONT=mlperf-nvidia:rnn_speech_recognition-pytorch DATADIR=<path/to/data/dir> LOGDIR=<path/to/output/dir> METADATA_DIR=<path/to/metadata/dir> SENTENCEPIECES_DIR=<path/to/sentencepieces/dir> bash ./run_with_docker.sh

and got the following log:

    <0-5,48-53> is invalid
    usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
                   [--physcpubind= | -C <cpus>] [--cpunodebind= | -N <nodes>]
                   [--membind= | -m <nodes>] [--localalloc | -l] command args ...
           numactl [--show | -s]
           numactl [--hardware | -H]
           numactl [--length | -l <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
                   [--strict | -t]
                   [--shmid | -I <id>] --shm | -S <shmkeyfile>
                   [--shmid | -I <id>] --file | -f <tmpfsfile>
                   [--huge | -u] [--touch | -T]
                   memory policy | --dump | -d | --dump-nodes | -D

    memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
    <nodes> is a comma delimited list of node numbers or A-B ranges or all.
    Instead of a number a node can also be:
      netdev:DEV the node connected to network device DEV
      file:PATH  the node the block device of path is connected to
      ip:HOST    the node of the network device host routes through
      block:PATH the node of block device path
      pci:[seg:]bus:dev[:func] The node of a PCI device
    <cpus> is a comma delimited list of cpu numbers or A-B ranges or all
    all ranges can be inverted with !
    all numbers and ranges can be made cpuset-relative with +
    the old --cpubind argument is deprecated.
    use --cpunodebind or --physcpubind instead
    <length> can have g (GB), m (MB) or k (KB) suffixes
    libnuma: Warning: cpu argument 54-59 is out of range
    <6-11,54-59> is invalid
    (same numactl usage message as above)
    libnuma: Warning: cpu argument 60-65 is out of range
    <12-17,60-65> is invalid
    (same numactl usage message as above)
    libnuma: Warning: cpu argument 66-71 is out of range
    <18-23,66-71> is invalid
    (same numactl usage message as above)
    libnuma: Warning: cpu argument 72-77 is out of range
    <24-29,72-77> is invalid
    (same numactl usage message as above)
    libnuma: Warning: cpu argument 78-83 is out of range
    <30-35,78-83> is invalid
    (same numactl usage message as above)
    libnuma: Warning: cpu argument 41,84-89 out of range
    <36-41,84-89> is invalid
    (same numactl usage message as above)
    libnuma: Warning: cpu argument 42-47,90-95 is out of range
    <42-47,90-95> is invalid
    (same numactl usage message as above)
    ENDING TIMING RUN AT 2021-10-08 11:16:02 AM
    RESULT,RNN_SPEECH_RECOGNITION,,4,nvidia,2021-10-08 11:15:58 AM
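The CPU lists in this log follow a regular pattern. There are eight ranks, and each one is bound to six consecutive CPUs plus a second block 48 CPUs higher. That layout matches the Hyper-Threading siblings on a host with 48 physical cores. The warnings are consistent with a machine that exposes only about 40 logical CPUs, such as the single 20-core Xeon with Hyper-Threading listed in the linked DGX Station datasheet. Every range that reaches CPU 41 or higher is reported as out of range, while 0-5 through 30-35 pass.

The following is a minimal Python sketch under stated assumptions, not the repository's launcher code. It reproduces that arithmetic so a binding can be checked before launching. The splitting rule is inferred from the log: GPUs are spread evenly over sockets, each GPU gets an equal share of its socket's physical cores, and sibling threads are numbered after all physical cores. The helper names `bind_ranges`, `parse_cpu_list` and `out_of_range` are hypothetical.

```python
"""Sketch: reproduce per-GPU numactl CPU lists and check them against a host.

Assumed splitting rule (inferred from the log, not taken from the repository):
GPUs are spread evenly over sockets, each GPU gets an equal share of its
socket's physical cores, and the Hyper-Threading sibling of physical CPU c is
numbered c + (total physical cores).
"""
from __future__ import annotations


def bind_ranges(ngpus: int, nsockets: int, cores_per_socket: int,
                hyperthreads: bool = True) -> list[str]:
    """Return one --physcpubind style CPU list per GPU rank (hypothetical helper)."""
    gpus_per_socket = -(-ngpus // nsockets)  # ceiling division
    cores_per_gpu = cores_per_socket // gpus_per_socket
    if cores_per_gpu == 0:
        raise ValueError("fewer cores per socket than GPUs per socket")
    total_cores = nsockets * cores_per_socket
    ranges = []
    for rank in range(ngpus):
        sock, local = divmod(rank, gpus_per_socket)
        first = sock * cores_per_socket + local * cores_per_gpu
        last = first + cores_per_gpu - 1
        spec = f"{first}-{last}"
        if hyperthreads:
            # Sibling threads assumed to be numbered after all physical cores.
            spec += f",{first + total_cores}-{last + total_cores}"
        ranges.append(spec)
    return ranges


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a list such as '0-5,48-53' into a set of CPU ids.

    Handles only plain numbers and A-B ranges, not numactl's '!', '+' or 'all'.
    """
    cpus: set[int] = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def out_of_range(spec: str, ncpus: int) -> list[int]:
    """Return the CPU ids in spec that a host with ncpus logical CPUs lacks."""
    return sorted(cpu for cpu in parse_cpu_list(spec) if cpu >= ncpus)
```

Under these assumptions, `bind_ranges(8, 2, 24)` returns exactly the eight lists in the log. By contrast, `bind_ranges(4, 1, 20)` uses the DGX Station figures from the datasheet: four GPUs and one 20-core socket. It returns `0-4,20-24`, `5-9,25-29`, `10-14,30-34` and `15-19,35-39`, all within CPUs 0-39. Passing `os.cpu_count()` as `ncpus` to `out_of_range` gives a rough check against the current machine.

This suggests the config was written for a larger machine, and its GPU, socket and core counts need to describe the DGX Station instead. Before editing anything, confirm the real topology with `lscpu` and `nvidia-smi`, and check how your copy of the launch scripts computes the bindings.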