torch / torch7

http://torch.ch
Other
9k stars 2.38k forks source link

Merge AMD-HIP port #1031

Open VincentSC opened 7 years ago

VincentSC commented 7 years ago

AMD has just released their port for Torch7 to AMD GPUs, to be found here. Please merge that code, so Torch7 works on both NVidia and AMD hardware.

pavanky commented 7 years ago

This will need to be sent in as a PR by AMD. I am not sure what opening an issue for this accomplishes.

VincentSC commented 7 years ago

For discussion purposes mostly, before AMD comes with that big PR.

pavanky commented 7 years ago

Then perhaps change the title and description to say that?

soumith commented 7 years ago

thanks a lot @VincentSC We have to think through how we abstract devices. Right now I think having hiptorch as a separate package, similar to how we have cutorch as a similar package makes sense. We are working on refactoring internals to figure out the right device abstractions.

wickedfoo commented 7 years ago

Having a HIP version be separate from cutorch I think would only encourage code rot (on the HIP side of things), as cutorch changes fairly frequently. I wonder to what extent the HIP port has kept up with changes in cutorch?

As a standalone package, it certainly would be much easier, but who on our end would be responsible for keeping the HIP port up to data with functionality additions and changes?

wickedfoo commented 7 years ago

unless we can factor out the kernel side of things from the glue/boilerplate, in which case CUDA and HIP packages make sense. The ideal solution would have an easy way to compare the CUDA and HIP implementations of a particular kernel, so one doesn't have to go chasing for differences.

soumith commented 7 years ago

yea, we are figuring out the refactor, and introducing a proper device-abstraction.

VincentSC commented 7 years ago

@soumith Some background. HIP was designed to avoid code rot (as @wickedfoo mentions) - OpenCL has had this problem when added to CUDA, I've seen in several cases. As HIP is a subset of CUDA, it could be handled as there is a Kepler-optimizes and Pascal-optimized branch - large parts can be shared code while some parts are ifdef'ed. Do feel free to ask me anything on HIP, hcc and AMD's other new libraries and compilers.