Accessing viennacl::scheduler::statement from statement_wrapper

ptillet commented 10 years ago

Hey!

So I've become familiar with PyViennaCL's code. It's very clean, so it was not very difficult. Anyway, here is how the generator works (in my development branch), in C++. There is actually much more flexibility if we want to pack multiple operations, but it is not useful to wrap that into Python for now (since ViennaCL's OpenCL API has not been wrapped yet).

  viennacl::scheduler::statement statement(C, ASSIGN_TYPE, prod(A,B) );
  viennacl::device_specific::matrix_product profile("float",'N','N',/*some meta-parameters*/);
  viennacl::device_specific::generate_enqueue_statement(profile, statement);

Ideally, I'd wrap device_specific::matrix_product inside a clean python class, and use pyviennacl.Statement:

  # If I have read the code correctly, it is currently impossible to construct "full" statements taking the RHS into account. Is that correct? If so, it would be cool to integrate this within GSoC 2014, as it might become useful later.
  statement = pyviennacl.Statement(A*B)
  matrix_product = pyviennacl.MatrixProductProfile("float", 'N', 'T', ...)  # some parameters
  pyviennacl.GenerateEnqueueStatement(matrix_product, statement)

However, it seems like scheduler::statement is never wrapped (statement_wrapper is wrapped instead), so it won't be easy to do this. I propose the following solution:

Instantiate a scheduler::statement within _viennacl::statement_wrapper
Add a read-write accessor for statement.array() inside the ViennaCL code, so that Python can build the statement

Does it sound like a good solution to you? I don't have enough experience with pyviennacl to be entirely sure about the side effects it could have, if any.
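
The two-step proposal above can be pictured with a small pure-Python model. This is only a sketch under loud assumptions: `StatementWrapper`, `add_node`, `build` and the tuple layout of a node are hypothetical names chosen for illustration, and `array()` here merely imitates the idea of a read-write accessor. None of it is the actual ViennaCL or PyViennaCL code.

```python
# Hypothetical sketch: none of these names are PyViennaCL's real API.
from dataclasses import dataclass, field
from typing import List, Tuple

Node = Tuple[str, str, str]  # (lhs, op, rhs), an assumed toy node layout


@dataclass
class StatementWrapper:
    """Collects nodes on the Python side and builds the 'native' statement on demand."""
    nodes: List[Node] = field(default_factory=list)

    def add_node(self, lhs: str, op: str, rhs: str) -> int:
        """Append a node and return its index so later nodes can refer to it."""
        self.nodes.append((lhs, op, rhs))
        return len(self.nodes) - 1

    def array(self) -> List[Node]:
        """Read-write accessor: returns the live node list, not a copy."""
        return self.nodes

    def build(self) -> Tuple[Node, ...]:
        """Stand-in for instantiating the C++ statement: freeze the nodes."""
        return tuple(self.nodes)


wrapper = StatementWrapper()
prod = wrapper.add_node("A", "prod", "B")
wrapper.add_node("C", "assign", f"node[{prod}]")
wrapper.array()[0] = ("A", "prod", "trans(B)")  # Python edits a node in place
statement = wrapper.build()
```

The point of the sketch is that the wrapper stays mutable from Python until `build()` is called, which is the moment a real wrapper would hand the nodes over to the C++ side.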

PS: This kind of discussion should rather take place on a pyviennacl-dev mailing list, I think, but I couldn't find one. Have I missed something? :P

tsmithe commented 10 years ago

Looks great. How about an API less like the C++ and more like this? I hope I've grasped the C++ semantics sufficiently. Basically, it should be possible for PyViennaCL to do the scheduler / generator dispatching intelligently, without much modification of current code; remember also, PyViennaCL doesn't execute statements until the user requests it, or the result of the computation is required by some non-PyViennaCL code.

statement = A * B # this doesn't execute anything yet, just represents the expression A * B
statement.profile = p.MatrixProductProfile(...)
statement.execute() # calls generate_enqueue_statement intelligently

If that makes sense, then great. Also, if you have a look again at the statement_wrapper class (line 135), then a viennacl::scheduler::statement object is actually created. I don't expose that object to Python because it's usually only needed once (at dispatch), and its structure might change until then. If we continue with PyViennaCL building C++ objects only when necessary, then we should only need to add the right (mostly hidden) logic for calling generate_enqueue_statement -- right?
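
As a rough illustration of the deferred-execution idea described in this comment, here is a toy model in plain Python. The class names (`Node`, `Leaf`, `Mul`) and the `profile` and `value` attributes are assumptions made for the sketch; they only mimic the behaviour discussed here and are not PyViennaCL's classes.

```python
# Toy model of deferred execution; class and attribute names are hypothetical.
class Node:
    profile = None   # optional, user-supplied execution hint
    _result = None   # cached value once the node has been executed

    def __mul__(self, other):
        return Mul(self, other)  # build a node, compute nothing

    def execute(self):
        if self._result is None:
            self._result = self._compute()
        return self._result

    @property
    def value(self):
        # Asking for the result is what triggers execution.
        return self.execute()


class Leaf(Node):
    def __init__(self, data):
        self.data = data

    def _compute(self):
        return self.data


class Mul(Node):
    calls = 0  # counts how often a product is really computed

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def _compute(self):
        Mul.calls += 1
        # A real backend would pick a kernel here, using self.profile if set.
        return self.lhs.execute() * self.rhs.execute()


A, B = Leaf(3.0), Leaf(4.0)
statement = A * B                   # nothing is computed yet
calls_before = Mul.calls
statement.profile = "some-profile"  # placeholder for a profile object
result = statement.value            # first access triggers the computation
_ = statement.value                 # second access reuses the cached result
```

Building the expression costs nothing, and the product is computed exactly once, when the value is first requested.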

Also, you're right, we probably should have a mailing list!

tsmithe commented 10 years ago

Also, I suspect that we could populate some of the profile parameters on the basis of PyViennaCL introspecting the objects (so things like 'float' could be automatically determined, perhaps?).
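
A minimal sketch of what such introspection could look like, using the standard library's `array` module as a stand-in for device vectors. The helper name `infer_scalar_type` and the type-code table are hypothetical; how PyViennaCL objects actually expose their scalar type is not shown in this thread.

```python
# Hypothetical helper: infers a profile's scalar-type string from the operands.
import array

_TYPECODE_TO_NAME = {"f": "float", "d": "double"}


def infer_scalar_type(*operands):
    """Return the common scalar type name, or fail if the operands disagree."""
    names = {_TYPECODE_TO_NAME[op.typecode] for op in operands}
    if len(names) != 1:
        raise TypeError(f"operands do not share one scalar type: {sorted(names)}")
    return names.pop()


x = array.array("f", [1.0, 2.0])
y = array.array("f", [3.0, 4.0])
z = array.array("d", [5.0, 6.0])
scalar = infer_scalar_type(x, y)  # both operands are single precision
```

In this sketch, operands of different precision are rejected outright, which is one simple policy among several possible ones.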

ptillet commented 10 years ago

Hey Toby,

Yes, it is a nice shortcut, nice hint! I think we could improve it further, since the generator supports multiple statements. Once pyviennacl's statements support assignments, we should be able to do something like this:

  statements = [Statement(z = 2*x + y), Statement(y = 3*x + z), Statement(x = 2*z + 5)]
  pyviennacl.execute(statements)  # Using the optimal profile as determined by an autotuner
  pyviennacl.execute(statements, p.VectorSaxpyProfile(...))  # Overriding the profile

This will generate a custom OpenCL kernel that makes better use of memory bandwidth:

  for(unsigned int gid = get_global_id(0); gid < size; gid += get_global_size(0)){
    float vx = x[gid];
    float vy = y[gid];
    float vz = z[gid];
    // perform all the operations in registers
    x[gid] = vx;
    y[gid] = vy;
    z[gid] = vz;
  }
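
To make the bandwidth argument concrete, here is a plain-Python model of the fused loop next to the unfused version. The arithmetic inside the loop body is an assumption filled in from the three example statements (z = 2*x + y, y = 3*x + z, x = 2*z + 5); it is not output of the generator, and the function names are illustrative.

```python
# Plain-Python model of the fused kernel; the loop body is an assumption
# derived from the three statements, not actual generator output.
def fused(x, y, z):
    for gid in range(len(x)):
        vx, vy, vz = x[gid], y[gid], z[gid]  # one load per vector element
        vz = 2 * vx + vy                     # z = 2*x + y
        vy = 3 * vx + vz                     # y = 3*x + z
        vx = 2 * vz + 5                      # x = 2*z + 5
        x[gid], y[gid], z[gid] = vx, vy, vz  # one store per vector element


def unfused(x, y, z):
    # Three separate passes, each re-reading its operands from memory.
    z[:] = [2 * a + b for a, b in zip(x, y)]
    y[:] = [3 * a + c for a, c in zip(x, z)]
    x[:] = [2 * c + 5 for c in z]


x1, y1, z1 = [1.0, 2.0], [10.0, 20.0], [0.0, 0.0]
x2, y2, z2 = list(x1), list(y1), list(z1)
fused(x1, y1, z1)
unfused(x2, y2, z2)
```

Both versions produce the same vectors. The fused one touches each element of x, y and z once for reading and once for writing, while the unfused one makes three full passes over memory.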

Of course, you can assume that there will soon be a proper overload of viennacl::execute(std::vector<viennacl::statement>) for that purpose.

For now, overloading viennacl::execute(statement, profile) and providing the proper wrapper should be enough: pyviennacl.execute(statement, p.SomeProfile(...))

As for introspecting the profile parameters, this is what I used to do (until today). I used to parse the statement entirely and deduce the kind of operation needed (saxpy, dgemm, etc.). Making it implicit did have some drawbacks:

It doesn't make it clear how it handles mixed precision (it doesn't).
As the set of operations grows larger, parsing the statement entirely will become error-prone and may cause a pretty big overhead (potentially larger than the kernel launch time).
Since the generator can handle multiple statements, it's safer to force the user to be explicit about what he's doing ("I know what I'm doing: all these statements will use the 'GEMV Transpose Float' kernel"), rather than examining all the statements and checking that they can indeed be packed together. ;)

2014-05-03 21:19 GMT+02:00 tsmithe <notifications@github.com>:

> Also, I suspect that we could populate some of the profile parameters on the basis of PyViennaCL introspecting the objects (so things like 'float' could be automatically determined, perhaps?).


ptillet commented 10 years ago

Hi,

I've got a simple Python auto-tuner working as of now.

In C++, there will be two interface functions:

viennacl::device_specific::execute(profile, std::pair<statement, statement_node> const & statement);
viennacl::device_specific::execute(profile, std::list<std::pair<statement, statement_node>> const & statements);

Where each statement_node is the root node from which the expression is executed.

I've decided to wrap that into python:

template.execute(statements);

In there, I can check whether it's a list or a single tuple. For now, lists cannot be supported, though, since we have no control over the LHS of a statement (we only have x + y, not z = x + y), so we cannot pass things such as:

template.execute([Statement(z=x+y), Statement(x=z+y)]);

Am I correct?
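
The list-or-single-tuple check mentioned above could look roughly like this. It is a sketch only: `normalize_statements` is a hypothetical helper, and modelling a statement as a `(statement, root_node)` tuple is an assumption that mirrors the `std::pair` in the C++ signatures.

```python
# Hypothetical sketch of the argument check; a statement is modelled as a
# (statement, root_node) tuple, mirroring the std::pair in the C++ signatures.
def _is_pair(obj):
    return isinstance(obj, tuple) and len(obj) == 2


def normalize_statements(statements):
    """Accept one (statement, root_node) tuple or a list of them; return a list."""
    if _is_pair(statements):
        return [statements]
    if isinstance(statements, list) and all(_is_pair(s) for s in statements):
        return list(statements)
    raise TypeError("expected a (statement, root_node) tuple or a list of them")


single = normalize_statements(("x + y", 0))
several = normalize_statements([("z = x + y", 0), ("x = z + y", 0)])
```

Normalising to a list up front means the rest of an `execute` implementation only has to handle one shape of input.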

tsmithe commented 10 years ago

Hi Philippe -- I'm sorry I didn't respond to your previous message yet! First of all, when you write template.execute(...), what is the template object?

ptillet commented 10 years ago

Sorry. I should have specified that I have renamed Profile to Template, but perhaps GenerationTemplate would be more appropriate.

tsmithe commented 10 years ago

Also note that we can do assignment, but because of the semantics of the = operator in Python, you have to write Assign(LHS, RHS). As I'm sure you noticed, when you execute any statement in PyViennaCL, it silently constructs a holder for the result and an assign node to tell ViennaCL to store that result in the holder object.
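
The reason an explicit assignment node is needed can be shown with a toy example. `Vector`, `Add` and `Assign` below are stand-ins written for this sketch and are not the PyViennaCL classes; only the `Assign(LHS, RHS)` argument order is taken from the comment above.

```python
# Toy illustration; Vector, Add and Assign here are stand-ins, not PyViennaCL classes.
class Vector:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        return Add(self, other)  # build an expression node, compute nothing

    def evaluate(self):
        return self.data


class Add:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def evaluate(self):
        return [a + b for a, b in zip(self.lhs.evaluate(), self.rhs.evaluate())]


class Assign:
    """Explicit assignment node: store the value of `rhs` into the existing `lhs`."""
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def execute(self):
        self.lhs.data[:] = self.rhs.evaluate()
        return self.lhs


x, y, z = Vector([1, 2]), Vector([3, 4]), Vector([0, 0])
original_z = z
z = x + y                            # rebinds the name `z`; the old vector is untouched
rebound_is_expression = isinstance(z, Add)
untouched = list(original_z.data)    # still [0, 0]
Assign(original_z, x + y).execute()  # writes into the original vector
```

Python's `=` only rebinds a name and cannot be overloaded, so the left-hand side has to be passed to a node explicitly if the result is meant to land in an existing object.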

tsmithe commented 10 years ago

Ah, OK.

ptillet commented 10 years ago

Ah, indeed, I had not seen Assign :O Perfect then, the question is closed. I think that you can also remove the generator from the "nice-to-have" of your GSoC project. I think that I can handle anything which is generation-related now that I'm more familiar with the pyviennacl codebase. :)

tsmithe commented 10 years ago

Great!

viennacl / pyviennacl-dev

Accessing viennacl::scheduler::statement from statement_wrapper #15