I'm running CUDA 10.1 with the latest versions of TensorFlow and PyTorch, on a Tesla K80 and a 1080 Ti.
I'm running the stable version (0.1.1 -- I was unable to get the ESPnet version running) with a patched train.py implementing data_parallel_workaround() from master.
The model seems to be training, but very inefficiently. Watching GPU usage with nvidia-smi, I see only intermittent GPU-Util spikes, with CPU utilization at about 25% (8 cores @ 4.8 GHz).
hparams that may be relevant:
# Data loader
pin_memory=True,
num_workers=12,
# Training:
batch_size=12,
Do I just need to dramatically increase num_workers to keep the GPUs fed? GPU temps look fine and the data is on a very fast SSD, so I'm not sure what I'm doing wrong.
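For context, here's a minimal sketch of how I understand those hparams feed a PyTorch DataLoader -- the dataset and tensor shapes below are placeholders, not the repo's actual classes, just to illustrate what bumping num_workers changes:

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Toy dataset standing in for the project's own; the point is only how
# num_workers / pin_memory / batch_size shape the input pipeline.
class ToyAudioDataset(Dataset):
    def __len__(self):
        return 1024

    def __getitem__(self, idx):
        # Stand-in for per-item CPU-side feature loading/extraction.
        return torch.randn(80, 200), torch.randn(8000)

loader = DataLoader(
    ToyAudioDataset(),
    batch_size=12,       # matches the hparam above
    shuffle=True,
    num_workers=12,      # worker processes preparing batches in parallel
    pin_memory=True,     # page-locked host memory speeds up CPU->GPU copies
    drop_last=True,
)

if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for mels, wavs in loader:
        # non_blocking=True lets the host-to-device copy overlap with compute
        mels = mels.to(device, non_blocking=True)
        wavs = wavs.to(device, non_blocking=True)
```

(On Windows -- which I'm on, given python.exe -- the loop has to live under the `if __name__ == "__main__":` guard or the worker processes won't spawn properly.)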
FWIW, here's what I see in the python.exe stack: ![image](https://user-images.githubusercontent.com/949444/84379293-2d8a0a00-abab-11ea-931f-79700b330cc3.png)