torch / cutorch

A CUDA backend for Torch7

Build process calls "make -j" and causes fork bomb during install #525

Open ryanfb opened 8 years ago

ryanfb commented 8 years ago

As part of the Torch install process on my machine (OS X 10.11.6), when the cutorch install is started, I see:

Found CUDA on your machine. Installing CUDA packages
Warning: unmatched variable LUALIB
Warning: unmatched variable jopts

isTegra=$(uname -a   | grep -E '(tegra|aarch)' | wc | awk '{print $1'})
if [ "1" -eq "$isTegra"  ]
  then
    jopts=3
  else
    jopts=$(getconf _NPROCESSORS_ONLN)
fi

echo "Building on $jopts cores"
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/Users/ryan/source/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/Users/ryan/source/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j install

If I'm lucky, this causes:

clang: error: unable to execute command: posix_spawn failed: Resource temporarily unavailable

And a make failure. If I'm unlucky, this causes my machine to lock up completely due to a fork bomb.
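The log above suggests that $(jopts) was substituted with nothing (note the "unmatched variable jopts" warning), so the final command became a bare make -j. GNU make treats -j with no number as "no limit on parallel jobs", which would explain the flood of compiler processes. A minimal sketch of a guard follows; the variable name njobs and the fallback of 1 are assumptions for illustration, not part of the cutorch rockspec:

```shell
# Simulate the case from the log: jopts was never filled in.
jopts=""

# Hypothetical fallback: use the online CPU count, or 1 if getconf is unavailable.
default_jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# ${var:-default} substitutes the default when jopts is unset or empty,
# so the job count passed to make is never blank.
njobs=${jopts:-$default_jobs}

# Print the command rather than running it, so the sketch is safe to try.
echo "make -j${njobs} install"
```

With a guard like this, an empty jopts degrades to a bounded build instead of an unbounded one.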

The workaround I'm using for this is to modify extra/cutorch/rocks/cutorch-scm-1.rockspec and replace $(MAKE) -j$(jopts) install and $(MAKE) install with a plain make install (maybe I could have gotten away with just stripping -j$(jopts), but I'm a little cautious after fork bombing myself so many times in a row). This may be a regression introduced by 37373ebea1cce61c29b5da6a34af0303b4b4976f because I've installed torch/cutorch on this machine many times in the past without this issue.
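The narrower variant of this workaround, stripping only the -j$(jopts) flag, might be sketched as below. It operates on a sample string rather than on the real rockspec, since the exact contents of that file are not shown here; the variable names line and fixed are illustrative only:

```shell
# Sample build command as quoted in the report (not read from the real file).
line='$(MAKE) -j$(jopts) install'

# Remove the " -j$(jopts)" fragment. The dollar sign is escaped so that sed
# treats it as a literal character rather than an end-of-line anchor.
fixed=$(printf '%s\n' "$line" | sed 's/ -j\$(jopts)//')

echo "$fixed"
```

Applying the same substitution to the rockspec itself would need an in-place edit, and the sed -i syntax differs between macOS and GNU sed, so check it against your platform first.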

pakozm commented 8 years ago

Same problem here. Just removing -j$(jopts) is enough to avoid the fork bomb ;-)

EDIT: After Soumith's comment, I think this problem may be related to memory rather than being a fork bomb at all. That is consistent with my observations.

soumith commented 8 years ago

hmmm, that's so weird that that introduces a fork-bomb.

I wonder if it's a fork-bomb or just an out-of-memory slowdown. The nvcc processes that compile cutorch do take quite a bit of memory.

pakozm commented 8 years ago

Hi @soumith, you are probably right and it is an out-of-memory slowdown. Watching memory usage during compilation, I see it increase until the computer becomes unresponsive. It may be worth the effort to set the -j$(jopts) parameter depending on the available memory... What do you think?
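One way the memory-aware idea could be sketched is to cap the job count at whichever is smaller: the core count, or available memory divided by an assumed per-job footprint. The function name pick_jobs and the 2048 MB per-job figure are hypothetical; the real memory use of an nvcc job has not been measured here:

```shell
# pick_jobs CORES MEM_MB PER_JOB_MB
# Prints min(CORES, MEM_MB / PER_JOB_MB), but never less than 1.
pick_jobs() {
  cores=$1
  mem_mb=$2
  per_job_mb=$3

  # Integer division: how many jobs fit in memory at the assumed footprint.
  by_mem=$(( mem_mb / per_job_mb ))

  n=$cores
  if [ "$by_mem" -lt "$n" ]; then
    n=$by_mem
  fi
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

# Example: 8 cores, 4096 MB available, 2048 MB assumed per job.
pick_jobs 8 4096 2048
```

The example call prints 2, because only two jobs fit in memory at the assumed footprint even though eight cores are available. How to read the available memory is left out, since it is platform specific.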

danplotnick commented 8 years ago

This is definitely a fork-bomb. I run Bash for Windows (Ubuntu 14.04) and have been watching system processes and resources as this runs. As with others, it hangs on building one of the NVCC objects: e.g. lib/THC/CMakeFiles/THC.dir/generated/THC_generated_THCTensorMathCompareTDouble.cu.o

Memory is sitting at a cool 15% (64GB) but CPU rams into 99% and I can hear my fan chugging. Task Manager shows a forking series of processes all associated with the Torch install (Dash, cudafe, nvcc, cmake, etc.) until I hit that CPU ceiling and start to hang. I have modified the rockspec file as suggested, and will see if that worked.