iame6162013 opened this issue 7 years ago
I think the question is this: why is tf-coriander reimplementing tensorflow, instead of cuda? On Mon, 28 Aug 2017 at 17:12, Hugh Perkins notifications@github.com wrote:
> I'm not sure I understand really
> - what do you mean by "cuda compliant library"?
> - in what sense do you feel that Coriander is not what you call a "cuda compliant library"?
> - what would you do differently, if you were to start from scratch?
Do you mean, why do we have to modify tensorflow at all, in order to run it on Coriander?
> what do you mean by "cuda compliant library"?
> in what sense do you feel that Coriander is not what you call a "cuda compliant library"?
> what would you do differently, if you were to start from scratch?
> Do you mean, why do we have to modify tensorflow at all, in order to run it on Coriander?
Yes, why do we have to modify tensorflow? Won't this be, or isn't it already, a huge pita?
(I do apologize if I'm misunderstanding the project.)
Ah. Yes. Not having to modify Tensorflow at all would be better. Harder though :) . What are your thoughts on how we can move more towards this direction?
I think the question here is: why make a transpiler from CUDA to OpenCL, rather than an OpenCL library that implements the CUDA/cuDNN API, to be linked against by programs expecting CUDA?
I haven't wondered this about Coriander, but I have wondered why nobody seems to have implemented something like this yet.
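For concreteness, the host side of such a library is easy to picture. Here is a minimal sketch of my own (not Coriander code; the cudaError_t typedef and error value are made-up stand-ins, and context creation is elided), reimplementing one CUDA runtime entry point over real OpenCL 1.2 calls:

/* Hypothetical sketch: one CUDA runtime call implemented on top of OpenCL.
 * clCreateBuffer and CL_MEM_READ_WRITE are real OpenCL; the CUDA-side
 * typedefs and values here are illustrative, not NVIDIA's headers. */
#include <CL/cl.h>
#include <stddef.h>

typedef int cudaError_t;              /* stand-in for the real typedef */
#define cudaSuccess               0
#define cudaErrorMemoryAllocation 2   /* illustrative value */

static cl_context g_ctx;              /* would be created via clCreateContext at init */

cudaError_t cudaMalloc(void **devPtr, size_t size) {
    cl_int err;
    /* A CUDA "device pointer" becomes an OpenCL buffer handle under the hood. */
    cl_mem buf = clCreateBuffer(g_ctx, CL_MEM_READ_WRITE, size, NULL, &err);
    if (err != CL_SUCCESS) return cudaErrorMemoryAllocation;
    *devPtr = (void *)buf;
    return cudaSuccess;
}

The hard part would be everything behind entry points like that, the kernels especially.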
HIP kinda does this, but the API function names have been changed to hip_whatever rather than cu_whatever. Possibly this was due to the atrocity that was the ruling in Oracle v. Google, that APIs can be copyrighted. Or possibly it was just pride.
It's possible that a shim between CUDA and HIP would be a small project, for someone familiar with both? That was my first thought when looking at HIP: https://rocm.github.io/documentation.html
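For the easy cases, such a shim could be little more than a header of aliases. A hypothetical sketch (the hip* names are real HIP symbols; the header name and shim idea are my own illustration, and this is roughly what the hipify tools do at the source level):

/* cu_shim.h -- hypothetical CUDA-to-HIP alias layer, illustration only.
 * The hip* names on the right are real HIP symbols. */
#pragma once
#include <hip/hip_runtime.h>

#define cudaError_t             hipError_t
#define cudaSuccess             hipSuccess
#define cudaMalloc              hipMalloc
#define cudaFree                hipFree
#define cudaMemcpy              hipMemcpy
#define cudaMemcpyHostToDevice  hipMemcpyHostToDevice
#define cudaMemcpyDeviceToHost  hipMemcpyDeviceToHost

Of course that only gives source-level compatibility for the runtime API; a drop-in binary replacement would also have to match NVIDIA's ABI, which is where the enum-value discussion further down comes in.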
> I think the question here is: why make a transpiler from CUDA to OpenCL, rather than an OpenCL library that implements the CUDA/cuDNN API, to be linked against by programs expecting CUDA?
Ok. You're saying "why make something that can compile NVIDIA® CUDA™ code into OpenCL, rather than just implementing an API, so we can run the NVIDIA® CUDA™ code directly, without recompiling it?"
Taking a smallish step back, there are two parts to an NVIDIA® CUDA™ program:
- the host-side code, which calls into the NVIDIA® CUDA™ API: cudaMalloc, cudaMemcpy, kernel launches, and so on
- the GPU kernels themselves, which NVIDIA's toolchain compiles into PTX/SASS
For these kernels, the question is, how to get them to run on a non-CUDA, non-PTX, non-SASS GPU?
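To illustrate the two parts side by side, a toy CUDA program (illustrative names, error handling omitted). The host half could in principle be satisfied by an API reimplementation; the __global__ half is the part that has to be translated somehow:

#include <cuda_runtime.h>

// Part 2: the GPU kernel. nvcc compiles this to PTX/SASS, which is exactly
// the piece that has no direct equivalent on an OpenCL-only GPU.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Part 1: the host-side code, which only ever talks to the CUDA API.
int main() {
    const int n = 1024;
    float *d = NULL;
    cudaMalloc((void **)&d, n * sizeof(float));  // host-side API call
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n); // kernel launch
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}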
No, that would probably be extremely difficult, and as far as I know the host program doesn't care about what the kernels look like, just that they do what they're supposed to.
Fortunately AMD seems to be pushing for HIP support in tensorflow, but it doesn't seem to be going anywhere. Thus a shim between CUDA and HIP would be worthless for the time being.
> No, that would probably be extremely difficult
It is difficult to get NVIDIA® CUDA™ GPU kernels to run on OpenCL, but that is what Coriander does. Coriander can run NVIDIA® CUDA™ GPU kernels on OpenCL 1.2 GPU hardware, with no modifications to the NVIDIA® CUDA™ source-code.
I think we had a little misunderstanding there: I thought you meant after the default compilation, which Coriander doesn't do, I think.
I think there are at least 4 misunderstandings here :) . But since I'm one of the people misunderstanding the other people's potential misunderstandings, it is hard for me to clear up, or attempt to clear up, any potential misunderstandings :) . Or perhaps it is only me misunderstanding. Anyway :)
Let's go through:
For Carthal's statement, I think that the host-side API is only half the solution: you still need to somehow do something with the NVIDIA® CUDA™ GPU kernels, so they can run on OpenCL GPUs. Somehow, they need to be converted into something, eg SPIR, or OpenCL, or some proprietary language/bytecode, so they can run on non-CUDA GPUs. Coriander handles this :)
To iame6162013's original question: why do we need to modify Tensorflow? Can't Coriander run without modifying Tensorflow?
To what extent do these three assertions clear up some of the questions above?
As for what some obstacles to binary-level compatibility are: a huge, obvious one is enum values. For example, let's say that in the client source-code there's a constant enum usage like:
if (cudaMalloc(&devPtr, size) == CUDA_OUT_OF_MEMORY) {
    ...
}
This means that Coriander needs to make sure that cudaMalloc returns the CUDA_OUT_OF_MEMORY value. But what is the value of CUDA_OUT_OF_MEMORY? The client source-code doesn't say. This leaves a couple of options that I can see:
- look at the NVIDIA® CUDA™ Toolkit source-code. This seems undesirable, to avoid copyright issues etc
- make up our own values, eg:
https://github.com/hughperkins/coriander/blob/master/include/cocl/cocl_memory.h#L33-L38
enum cudaMemcpyKind {  // name used by thrust, in trivial_copy.inl, line 55 ish
    cudaMemcpyDeviceToHost = 111,
    cudaMemcpyHostToDevice = 222,
    cudaMemcpyDeviceToDevice = 333,
    cudaMemcpyDefault = 444  // from thrust, trivial_copy.inl
};
This means we need to at least recompile the client source-code, even if we don't need to actually modify it.
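To make that concrete, here is a self-contained toy; neither namespace below is a real header. The 222 is from the Coriander snippet above, and small values like 1 are what NVIDIA's own headers use:

// Toy illustration of the binary-level (ABI) problem.
#include <cstdio>

namespace nvidia    { enum cudaMemcpyKind { cudaMemcpyHostToDevice = 1 }; }   // NVIDIA-style value
namespace coriander { enum cudaMemcpyKind { cudaMemcpyHostToDevice = 222 }; } // Coriander's value

int main() {
    // Enum values are baked into the binary at compile time. A binary built
    // against NVIDIA's headers passes 1, which a Coriander runtime expecting
    // 222 cannot interpret -- hence the need to recompile, even for
    // unmodified source.
    std::printf("NVIDIA: %d, Coriander: %d\n",
                (int)nvidia::cudaMemcpyHostToDevice,
                (int)coriander::cudaMemcpyHostToDevice);
    return 0;
}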
So, if I were to create this host-side API, and compile the kernels with coriander, would I then be capable of ~~running~~ compiling & running tensorflow without any changes to their source code?
> look at the NVIDIA® CUDA™ Toolkit source-code. This seems undesirable, to avoid copyright issues etc
Yes, a clean-room design is preferable.
> This means we need to at least recompile the client source-code, even if we don't need to actually modify it.
That shouldn't be a problem for the use cases I care about. :)
> So, if I were to create this host-side API, and compile the kernels with coriander, would I then be capable of compiling & running tensorflow without any changes to their source code?
In fact, Coriander already implements much of the NVIDIA® CUDA™ API.
So, you may say, "what was actually changed in Tensorflow, in the tf-coriander repository?". Good question! Mostly it comes down to #define GOOGLE_CUDA. I left this undefined, but then gradually uncommented out stuff, as I implemented it in Coriander. You have two options really:
- define GOOGLE_CUDA, or
- leave GOOGLE_CUDA undefined, and then just uncomment things bit by bit
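For anyone unfamiliar with the mechanism: GOOGLE_CUDA is a preprocessor flag that Tensorflow uses to fence off its GPU-specific code paths. A simplified, self-contained illustration (not actual Tensorflow source; build with e.g. g++ -DGOOGLE_CUDA=1):

// Simplified sketch of the GOOGLE_CUDA guard convention (not TF source).
#include <cstdio>

#if GOOGLE_CUDA
static const char *kBuild = "GPU code paths compiled in";
#else
static const char *kBuild = "CPU-only build";
#endif

int main() {
    // tf-coriander left GOOGLE_CUDA undefined at first, then re-enabled the
    // guarded code piece by piece as Coriander grew support for it.
    std::printf("%s\n", kBuild);
    return 0;
}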
"But!", you say, "Whats all that nonsense in tensorflow/stream_executor/cl
?". Well, yes, arguably you could remove that, and simply use the CUDA stream_executor directly. I've no idea how hard/easy that is. Both options seem to have their good/bad points. The current option, of creating a new cl
stream_executor has the rather bad downside of making upgrading to new tensorflow versions look scary. On the other hand, since those files wont be modified by tensorflow upstream, they should be invariant across upgrade, approximately? Unclear.
Firstly, I want to say that all the work you've done is awesome, and that I'm dying to get this to work on Arch. :)
But why did you choose to port tensorflow to use Coriander, instead of trying to make Coriander into a CUDA-compliant library?