Open sherry-yuan opened 2 years ago
Short Answers
Currently Drafted Solution
clCreateProgramWithBinaryAndProgramDeviceIntelFPGA
(there should be no issue) https://github.com/intel/fpga-runtime-for-opencl/blob/950f21dd079dfd55a473ba4122a4a9dca450e36f/src/acl_program.cpp#L544-L597
Considerations
Resources
Summary
This is a summary of the comment above; see that comment for more detail. Feel free to comment if anything is missing.
Short Answers
Launch kernel arguments: l_enqueue_kernel_with_type(command_queue, kernel, ?work_dim, ?global_work_offset, ?global_work_size, ?local_work_size, num_events_in_wait_list, event_wait_list, event, ?CL_COMMAND_MIGRATE_MEM_OBJECTS);
Always pass in constants for the sizes and offsets?
Main Questions
Edit: Discard this comment as there is a better solution below provided by Artem.
Regarding the question of how work_size (both local and global) should be decided:
The current proposed solution for determining workgroup sizes:
Given the board
Then the formula for determining number of work groups constrained by size of global memory is:
Thanks Artem @artemrad for the info! There is no need for clEnqueueNDRangeKernel. clEnqueueTask will do what we want, without needing to know the workgroup size.
Precise answer below: """ In general you do not need WG size. Launch this kernel as a task, rather than an NDRange. So default to {1, 1, 1} for WG size and WI sizes. More specifically, do what you would do if a kernel was launched with clEnqueueTask() instead of clEnqueueNDRangeKernel(). """
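To make the "launch as a task" suggestion concrete, here is a minimal sketch of the launch dimensions it implies. Per the OpenCL specification, clEnqueueTask() is equivalent to clEnqueueNDRangeKernel() with work_dim = 1, a NULL global_work_offset, global_work_size[0] = 1 and local_work_size[0] = 1. The `LaunchDims` struct and both helper functions are hypothetical names used only for illustration; they are not part of the runtime, and how the runtime would actually pass these values into l_enqueue_kernel_with_type is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper type for illustration only (not a runtime type).
struct LaunchDims {
  unsigned work_dim;                             // number of dimensions in use
  std::array<std::size_t, 3> global_work_offset; // all zero: same as passing NULL
  std::array<std::size_t, 3> global_work_size;   // work-items per dimension
  std::array<std::size_t, 3> local_work_size;    // work-group size per dimension
};

// Dimensions for a task-style launch: a single work-group holding a single
// work-item, matching what clEnqueueTask() does. The {1, 1, 1} defaults follow
// the quoted answer; only the first work_dim entries are meaningful.
inline LaunchDims task_launch_dims() {
  return LaunchDims{1, {0, 0, 0}, {1, 1, 1}, {1, 1, 1}};
}

// Total number of work-items described by the launch dimensions.
inline std::size_t total_work_items(const LaunchDims &d) {
  assert(d.work_dim >= 1 && d.work_dim <= 3);
  std::size_t n = 1;
  for (unsigned i = 0; i < d.work_dim; ++i) {
    n *= d.global_work_size[i];
  }
  return n;
}

// With a real command queue and kernel, these values would map onto the
// standard call roughly as follows (not compiled here, shown as a comment):
//   clEnqueueNDRangeKernel(queue, kernel, d.work_dim, nullptr,
//                          d.global_work_size.data(),
//                          d.local_work_size.data(),
//                          num_events_in_wait_list, event_wait_list, event);
```

Because every size is a constant, nothing about the board or its global memory needs to be known to pick these values, which is why the earlier workgroup-size formula is no longer required.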
Pushing this to 2022.4, given that is the new overall target. Next steps: run the SYCL L3 set on the device global change together with the autodiscovery change, then merge in the autodiscovery + runtime change. This may depend on the specs being pushed in.
Theoretically it should be (given that the lazy programming feature is available in the runtime), but we need to double-check, or at least lay out how things should be called.
The precise questions are:
CC: @zibaiwan @pcolberg @aditikum