tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0
1.69k stars 613 forks source link

Implement GPU kernels #118

Closed seanpmorgan closed 5 years ago

seanpmorgan commented 5 years ago

Currently we've omitted the gpu kernels from addons while we got everything setup. Now that CI testing is setup for GPU we should begin adding these to addons. There's a lot of things required so I wanted to start this issue so we can track / discuss.

Some of the TODO:

seanpmorgan commented 5 years ago

@gunan @martinwicke Is there anyway we can see how https://github.com/tensorflow/community/pull/2 is progressing?

We've been discussing how we would like to package an Addons that includes GPU kernels and I believe we'll be depending on this in order to have a single pip package that loads either gpu or non gpu kernels. Is that correct? Publishing an addons-gpu does not seem like a good idea if this is on the horizon

gunan commented 5 years ago

@yifeif and I are working towards this We are hoping to deliver this in Q2

seanpmorgan commented 5 years ago

Small update: dynamic kernels are now implemented on the tensorflow master branch so we can begin work on this.

Squadrick commented 5 years ago

@seanpmorgan Finally got myself an nvidia GPU. I can work on this.

Squadrick commented 5 years ago

@gunan Hey, do you have some sample code on how this is supposed to be used?

In the RFC, you mention a new API: tf.load_kernel_library, but I can't seem to find in the master branch of TensorFlow, I did find tf.load_library though.

gunan commented 5 years ago

During implementation, we decided to make it more general and name it "load_library" instead. Please use that function instead.

facaiy commented 5 years ago

@av8ramit Hi, Amit. Can you check the GPU test failures in #294? Which docker image do we use for gpu test? Is it tensorflow/tensorflow:custom-op-gpu image required by https://github.com/tensorflow/custom-op? Thank you.

av8ramit commented 5 years ago

Hi @facaiy we use an internal GCP image, but it's similar to the Docker image you listed.

seanpmorgan commented 5 years ago

@yifeif @gunan We're in the process of setting up our GPU env, but it looks as though tf-nightly-2.0-preview returns False when running tf.test.is_gpu_available(). If we run with tf-nightly-gpu-2.0-preview it correctly finds the device. As I understood it dynamic kernels have been merged into master so is this just a bug in device_lib or is the standard nightly package not able to run GPU kernels?

More information I'm running custom-op-gpu container and running:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

I can file an issue on tensorflow/tensorflow if it's just something that needs to be updated in that library.

seanpmorgan commented 5 years ago

@karmel @yifeif @gunan Friendly bump on the above question.

We're currently dealing with a failing test case that is only occuring on python2 tf-nightly-2.0-gpu. python3 gpu is passing because there hasn't been a released pip package for py34 gpu in 10 days. This makes is very difficult for us to work around these new errors when tf core nightly builds are failing frequently and we're not sure how to build our tests around them. See #373

Why is there still a tf-nightly-2.0-gpu package and is it going to stay around.

av8ramit commented 5 years ago

We have dropped support for Python3.4 packages as it is fairly old. @karmel could this issue be solved with upgrading the python3 version to Python3.5?

seanpmorgan commented 5 years ago

Good to know thanks. So we will upgrade our CI scripts to install py36, but for local testing the custom-op docker image we use only has py34 installed (still waiting on a new custom-op which looks like its soon)

Is there a reason for the 2.0-nightly-gpu pip package though? We were under the impression there would be a unified pip package now that dynamic kernels are in. We need to know this because we have a fail on GPU pip but not CPU pip package. It's difficult to troubleshoot without knowing the difference between these 2.

seanpmorgan commented 5 years ago

Closing this as all kernels are now implemented. I'll create a new issue for discussing how we will package GPU kernels as there is some discussion needed.

For record keeping... here are the notes from SIG Build meeitng regarding the 2 tf-core packages:

* ETA for tensorflow and tensorflow-gpu package to consolidate into a single pip package with dynamic kernels?
    * Ubuntu pip package available as tf-nightly-gpu, fixing windows gpu pip package now. If not 2.0, will be in 2.1 (will update this after confirming).
    * Dynamic kernels may take some more time to land
        * Packages are already available as tf-nightly-gpu but Windows is being fixed
        * Uncertain if this will happen in 2.0, because we’re not sure about breaking it
    * Jonathan: Will dynamic kernel support MKL/DNN?
        * Gunhan: Seems like no, haven’t figured this out yet