monome / druid

terminal interface for crow
GNU General Public License v3.0
35 stars 16 forks source link

upload of some scripts cause crash #44

Closed tehn closed 4 years ago

tehn commented 4 years ago

note, this could be a crossover issue between the crow firmware and druid.

reports of this on the forum, but my own case is this:

i can run the attached script on linux.

on macos, druid freezes immediately on pushing enter after u new.lua or r new.lua

the only fix is to pull the USB cable.

it seems that the crow is possibly crashing. as when i pull the cable i see a flood of "disconnected" messages in druid.

here's the script.

local stage = 1
local expectations =
{ { in_chan = 1, out_chan = 1, expect = -5.0 } ,
{ in_chan = 1, out_chan = 2, expect = -1.0 } ,
{ in_chan = 1, out_chan = 3, expect = 3.0 } ,
{ in_chan = 1, out_chan = 4, expect = 9.0 } ,
{ in_chan = 2, out_chan = 4, expect = 7.0 } }

function test_case( e )
  print('in[' .. e.in_chan .. '] -> out[' .. e.out_chan .. ']')
  for i=1,2 do
    if i == e.in_chan then
      input[i].mode('stream',0.1)
    else
      input[i].mode('none')
    end
  end
  for i=1,4 do
    if i == e.out_chan then
      output[i].volts = e.expect
    else
      output[i].volts = 0.0
    end
  end
end

local threshold = 0.1 -- should be much tighter tolerance
function is_in_range( value, expectation )
  if value > (expectation - threshold)
   and value < (expectation + threshold) then
    return true
  end
end

input[1].stream = function(value)
  if is_in_range(value, expectations[stage].expect) then
    print('\tok!')
    stage = stage + 1
    test_case(expectations[stage])
  end
end

input[2].stream = function(value)
  if is_in_range(value, expectations[stage].expect) then
    print('TEST SUCCEEDED')
  end
end

function init()
  test_case(expectations[stage])
end
tehn commented 4 years ago

additionally:

on linux i frequently (but not always) get the echo bleed-back bug.

ie, syntax error near: bootstrappedsdfkljadlf

simonvanderveldt commented 4 years ago

i can run the attached script on linux.

Do you mean that you don't get the freeze on crow when using Linux whereas you do get the freeze on Mac OS?

tehn commented 4 years ago

i have occasionally succeeded in freezing druid on linux, but a huge majority of the time the first attempt to r new.lua fails (with syntax error) and then i re-run it with success. very rarely does it work first try. very rarely do i get a full freeze on linux.

csboling commented 4 years ago

@tehn @trentgill Here is what I have been able to figure out about this. I'm on Windows 10 with Python 3.7.4. To go much further I think I need to dig around and see if I still have some pin headers to solder on for the UART pins on crow.

For testing this I have made these changes:

My test procedure is as follows. Attempt the following 100 times in a row:

What winds up happening is one of a few things:

It seems like transmission errors or receive buffer corruption could account for this "dropped data" behavior, but crow remains functional for normal REPL operation after this occurs. However, if I rebuild without skipping event handling, crashes start occurring, typically within 3 or 4 attempts. What I mean by this is druid locks up, after killing druid the serial port remains unavailable, after replugging USB druid "can't find crow device", and after replugging in this state I often get a warning about an unknown USB device. After power cycling, crow is responsive again and the druid REPL works as expected. This is also the case on the 1.0.0 release firmware.

trentgill commented 4 years ago

@csboling @simonvanderveldt I forgot about this great set of bug reports. I think it makes sense that the event-handling is causing crashes. Basically there is interrupts continuing to fire on metro ticks, input streams or changes, and ASLs completing. I'm unsure why it crashes (is there a bug in the event lib, or is it the lualink?).

I think #234 should fix this though I've only tested on linux so far.

csboling commented 4 years ago

Unfortunately after updating to crow master, I see the following behavior on both Windows and OSX, using either today's or yesterdruid:

$ druid upload sketch.lua
 uploading sketch.lua

^^stream(1,0.002718449)
^^stream(1,0.002021551)
^^stream(1,0.002950668)

File uploaded
Traceback (most recent call last):
<snip>
  File "~\druid\src\druid\cli.py", line 55, in upload
    click.echo(crow.read(1000000).decode())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 31: invalid continuation byte

This is consistently 0xe1 in position 31 on both platforms, on multiple runs. Some variation in the number of ^^streams, from 1 to about 50. Length of the file does not really seem to matter? Eventually, druid will hang and be unable to reconnect to the REPL or to download until crow is rebooted -- eventually this will result in the same "device scan failed" message on Windows. Upon reboot the file does usually seem to have been uploaded successfully. Behavior is pretty similar in the REPL, a couple "File uploaded" messages that do seem to have worked, eventually a crash.

Edit: Seems I am only encountering problems when building from source (at f98b555, same as the v1.0.1b0 tag), when using the bin file from the prerelease zip it works very consistently! Could this patch have affected something with debug prints that are conditionally compiled or something? I tried running make R=1 but got some I2C code generation error in Docker that I haven't puzzled out yet.

szvsw commented 4 years ago

Just saw this issue, I originally filed an issue on crow, posting here instead since clearly you guys are dealing with this already!


the super short summary:


this video details how to reliably reproduce. I confirmed a bit more after making the video (see other comment below).


  1. erase_userscript.sh, flash v1.0.1b0
  2. make sure druid is updated 2a. nb: I was using pip3 install --upgrade monome-druid 2b. pip3 show monome-druid returns Version 0.1.1 2c. apologies if I should have been directly installing from the git repo instead somehow...
  3. enter druid
  4. upload scripts without an active stream as many times as you want.
  5. upload a script with an active stream (time<1.1seconds - above that, it seemingly becomes less likely but still possible to produce the bug which begins at step 6). the script will upload and run successfully.
    
    --- stream script

input[1].stream = function(cv) end

function init() input[1].mode('stream',1) end

6. upload any other script - it will upload and run fine, but I believe this is the point where the "bug occurs" from a code perspective (see comment below).
7. upload any other script - it will print `uploading <script>.lua` but will not make it to the `Running:` line. druid is not frozen (yet)
8. if you attempt to execute lua code (e.g. `print.lua`), druid will accept it but receive no response from crow. 
9. druid's current state:
9a. `q` out of druid here; display returns to powershell, but I am not prompting for a new command until after I disconnect or power cycle crow - it just shows a blank line.  after power cycling crow, I receive a command prompt and can then re-enter druid.  the last script uploaded in step 6 (not the attempted script in 7) will run; if that script did not have a stream, I can upload as many scripts as I want.  If it did have a stream, we are essentially about to be at step 6 again - one upload will work, the next will fail.
9b. if I attempt instead of `q`-ing out of druid, I attempt to upload another script in the while crow is in the "hung" state, druid totally freezes until I power cycle crow.

---

confirming that streams are the issue:

steps 1-5. are identical

6. instead of uploading another script, turn off the `stream`: `input[1].mode('none')`
7. you can now upload as many (non-streaming) scripts as you want. the issue does not occur.

---

confirming that 'change' seems fine, unlike 'stream':

uploading this script and triggering it at audio rate does not prevent subsequent uploads. 

--- change script

input[1].change= function(cv) print('caw') end

function init() input[1].mode('change',1,0.1,'rising') end


---

confirming that 'metro' seems fine, unlike 'stream'

uploading this script does not cause any issues with subsequent uploads. 

--- metro script

metro[1].event = function(c) print(c) end metro[1].time = .001

function init() metro[1]:start() end

trentgill commented 4 years ago

@csboling can you post the script.lua you're using? i think it might be the way i'm parsing the file to print a 'name' (which expects the first line to be a comment of the name). I can add protections, but I just checked on linux and it doesn't have the behaviour you note.

csboling commented 4 years ago

I've been just using the one from the first post on this issue, occasionally alternating with like

function init()
  print('caw?')
end
trentgill commented 4 years ago

Did you save it into a file on windows? I'm just wondering if it's a line-ending thing still.

csboling commented 4 years ago

Copied it off GitHub and saved that, on both Windows and OSX separately. I believe all the druid instances I was using have the fix to force \r\n line endings.

tehn commented 4 years ago

just tested on macos and confirmed the my test script now works.

trentgill commented 4 years ago

I think this was a crow bug, and hopefully fixed here: https://github.com/monome/crow/pull/237

Tested ok on linux. Testing on mac & windows now...

edit: still seeing some crashes on osx.

trentgill commented 4 years ago

I believe #54 fixes this.

trentgill commented 4 years ago

fixed by #54