taijusti / sleep_apnea

0 stars 0 forks source link

Test global manager on hardware #21

Closed taijusti closed 9 years ago

taijusti commented 9 years ago

it's currently hanging on an if statement -.- gonna try to restructure the host code a little to reduce the control flow. maybe hls wont get confused then

taijusti commented 9 years ago

after a day and a half of randomly moving stuff around and making minor structural changes, HLS still does not interpret the code properly. i'm gonna do a major restructuring on the loops of host(). fingers cross that with a different (simpler?) implementation of host, HLS will actually interpret it right

taijusti commented 9 years ago

not too sure what is going on... the solution to SMO should be unique since its a convex optimization problem...

i tried to change the algorithm a little so that itll only stop when there are no kkt_violators. even then, the hw solution has about 10% error rate whereas the sw solution has a 5% error rate.

going to go back and try moving other things around without changing the algorithm itself

taijusti commented 9 years ago

ok.. i think i have a solution that works based on what im seeing in the analysis tab. i forgot to check in my test that im running on zynq, so ill go in tomorrow to see if my solution works. fingers crossed

taijusti commented 9 years ago

sigh... ok, ive transformed the code to this:

send(COMMAND_INIT_DATA, out);
for (i = 0; i < ELEMENTS; i++) {
    send(data[i], out);
    send(y[i], out);
}
callDevice(in, out);

send(COMMAND_GET_KKT, out);
callDevice(in, out);
recv(kkt_violators, in);

and you never see send(COMMAND_GET_KKT, out); execute in hw. my random guess is that the blocking receive is happening before the write, causing a deadlock

ghost commented 9 years ago

do you mean that recv(kkt_violators, in); is happening before the send(COMMAND_GET..) ? Did you try co-sim for the host ?

On Sun, May 31, 2015 at 4:04 PM, Justin notifications@github.com wrote:

sigh... ok, ive transformed the code to this:

send(COMMAND_INIT_DATA, out); for (i = 0; i < ELEMENTS; i++) { send(data[i], out); send(y[i], out); } callDevice(in, out);

send(COMMAND_GET_KKT, out); callDevice(in, out); recv(kkt_violators, in);

and you never see send(COMMAND_GET_KKT, out); execute in hw. my random guess is that the blocking receive is happening before the write, causing a deadlock

— Reply to this email directly or view it on GitHub https://github.com/taijusti/sleep_apnea/issues/21#issuecomment-107240529 .

taijusti commented 9 years ago

do you mean that recv(kkt_violators, in); is happening before the send(COMMAND_GET..) ?

that's my current theory right now

Did you try co-sim for the host ?

no, i'll consider doing it.

taijusti commented 9 years ago

lifting the kkt command above the init as an experiment... absolutely nothing runs now!

send(COMMAND_GET_KKT, out); callDevice(in, out); recv(kkt_violators, in);

send(COMMAND_INIT_DATA, out); for (i = 0; i < ELEMENTS; i++) { send(data[i], out); send(y[i], out); } callDevice(in, out);

taijusti commented 9 years ago

confirmed that it is lifting the recv() above the send(). i did this by sending an arbitrary value to the host to get it past the recv. i then saw the send() come out of host. going to ask on xilinx forums how to get around this since i have no idea myself. i will work on cleanup while im waiting. after that, ill take a look at ethernet if there is nothing else.

taijusti commented 9 years ago

known solution found/confirmed. applying fix to master branch.

ghost commented 9 years ago

wow great news, what is the fix then ?

On Mon, Jun 1, 2015 at 2:02 PM, Justin notifications@github.com wrote:

known solution found/confirmed. applying fix to master branch.

— Reply to this email directly or view it on GitHub https://github.com/taijusti/sleep_apnea/issues/21#issuecomment-107656056 .

taijusti commented 9 years ago

http://forums.xilinx.com/t5/High-Level-Synthesis-HLS/Vivado-HLS-incorrectly-lifting-blocking-read-over-blocking-write/td-p/631408

ghost commented 9 years ago

I just saw the response on the forum. its annoying that you spent that much time on it, there should be proper documentation on such stuff.

taijusti commented 9 years ago

it turns out its sort of documented under the protocol directive. it just wasn't obvious to me that's what i needed to do. they still need to add more documentation.

taijusti commented 9 years ago

i should mention even though i know what the solution is, its still super painful -.-. have to define the order of operations cycle by cycle. also, vivado seems to move ap_waits() around... not too sure whats going on.

ghost commented 9 years ago

What do you mean by defining the order of operation cycle by cycle? I thought you will just add a protocol directive to blocks that perform write then read and insert a wait between them, right ? so you would only do minimal changes. Vivado should not move ap_wait if it is in a protocol block right ? that is what the user guide says.

taijusti commented 9 years ago

unfortunately it does seem to move the waits around. maybe im doing something wrong. ive already tried putting only one wait between all writes and reads and it does nothing.

ghost commented 9 years ago

I am not sure if it makes any difference but are using ap_utils.h or systemc.h. In the user guide they say the following. • In C by using an ap_wait() statement (include ap_utils.h) • In C++ and SystemC designs by using the wait() statement (include systemc.h).

taijusti commented 9 years ago

i was using ap_utils. i just tried systemc.h and it gives many compile time libraries within the h files. The first of which is:

@E [HLS-70] Compilation errors found: In file included from host/host.cpp:12: In file included from /opt/Xilinx/Vivado_HLS/2014.4/lnx64/tools/systemc/include/systemc.h:244: In file included from /opt/Xilinx/Vivado_HLS/2014.4/lnx64/tools/systemc/include/systemc:72: In file included from /opt/Xilinx/Vivado_HLS/2014.4/lnx64/tools/systemc/include/sysc/kernel/sc_module.h:37: In file included from /opt/Xilinx/Vivado_HLS/2014.4/lnx64/tools/systemc/include/sysc/kernel/sc_wait.h:32: In file included from /opt/Xilinx/Vivado_HLS/2014.4/lnx64/tools/systemc/include/sysc/kernel/sc_simcontext.h:34: /opt/Xilinx/Vivado_HLS/2014.4/lnx64/tools/systemc/include/sysc/utils/sc_hash.h:40:19: error: declaration of anonymous class must be a definition template<class K, class C> //template class

ap_fifo is inded injecting waits, its just that vivado doesn't seem to respect them and moves them around. i think its because it only works for primitive accesses, not functions.

taijusti commented 9 years ago

ok... out of desperation, i've asked the resident hls expert in our group. she has a solution i think will work that i'm going to try to implement now. it's completely different from this protocol/ap_wait thing.

taijusti commented 9 years ago

ok... i have a solution which looks good on the analysis tab (which i could not say about the other solutions). i will try it first thing tomorrow. fingers crossed guys...

taijusti commented 9 years ago

ok, this time the solution works for sure! just going through the normal process of making sure i implemented it right everywhere. hopefully there are no other bugs.

ghost commented 9 years ago

great news:), I am curious to know what was the "insider" fix you got ?

On Tue, Jun 2, 2015 at 11:43 AM, Justin notifications@github.com wrote:

ok, this time the solution works for sure! just going through the normal process of making sure i implemented it right everywhere. hopefully there are no other bugs.

— Reply to this email directly or view it on GitHub https://github.com/taijusti/sleep_apnea/issues/21#issuecomment-107995308 .

taijusti commented 9 years ago

she told me that vivado hls isn't as thoroughly tested as you might expect. directives do not necessarily work with each other, or at all for that matter. one of the things she told me to do is manually inline everything, which is what resolved the problem. by manually inline, i mean take the contents of all send/recv functions and put them into host. so the code is sort of a mess right now.

i wanna merge to master right now. may i do this? the host/device are definitely communicating right now. i can see it on the arm core. weird thing is it is either taking a long time or not converging (its been running for 7 minutes now). it still passes host_tb.cpp so we're no worse off than before if i do merge. its just that the code is kinda messy. it would be helpful if you could skim through if to see if you think its acceptable to merge to master.

taijusti commented 9 years ago

ok.. so i think there's something wrong with the device after all. either that or im reading from it wrong or something. after initialize it, it should be e[i] = y[i] ? -1 : 1. but right now im seeing e[i] = 1. HLS is once again synthesizing something wrong cause host_tb still passes -.-. gah!!!!

this bug sounds easier to debug now though.

taijusti commented 9 years ago

oh wait.. i think i know the problem. it's that bug that ibrahim found in common.cpp. i forgot to re-synth device since it uses that.

ghost commented 9 years ago

I think the device is correct, it passes co-sim.

On Tue, Jun 2, 2015 at 9:36 PM, Justin notifications@github.com wrote:

oh wait.. i think i know the problem. it's that bug that ibrahim found in common.cpp. i forgot to re-synth device since it uses that.

— Reply to this email directly or view it on GitHub https://github.com/taijusti/sleep_apnea/issues/21#issuecomment-108153640 .

taijusti commented 9 years ago

only with your change correct changing the uint32_t to int32_t in send() for floats in common.cpp?

ghost commented 9 years ago

yes

On Tue, Jun 2, 2015 at 9:51 PM, Justin notifications@github.com wrote:

only your change correct changing the uint32_t to int32_t in send() for floats in common.cpp?

— Reply to this email directly or view it on GitHub https://github.com/taijusti/sleep_apnea/issues/21#issuecomment-108158515 .

taijusti commented 9 years ago

i decided to re-implement my stuff. something i did on justin_branch makes it not pass c-sim with random vectors (note that vanilla master does). the fix you mentioned isnt on master so i just reapplied it locally and am compiling right now. i will verify tomorrow morning.

ghost commented 9 years ago

okay, should I add my fix to the master branch ? I guess I should right.

On Tue, Jun 2, 2015 at 10:04 PM, Justin notifications@github.com wrote:

i decided to re-implement my stuff. something i did on justin_branch makes it not pass c-sim with random vectors (note that vanilla master does). the fix you mentioned isnt on master so i just reapplied it locally and am compiling right now. i will verify tomorrow morning.

— Reply to this email directly or view it on GitHub https://github.com/taijusti/sleep_apnea/issues/21#issuecomment-108159875 .

taijusti commented 9 years ago

ive added it to my local thing to try and get the hw working. if you'd like i can commit what i have. it's no worse than what is on github right now (i.e. host_tb passes)

taijusti commented 9 years ago

going to port over host.cpp and all related files to ARM and snoop the FIFOs to see if they match up with what HW is doing.

taijusti commented 9 years ago

this is confirmed to work on hardware, along with the device. full solution works for single device case!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!