loathingKernel / ariadne-bootloader

A little less unfinished TFTP bootloader for Arduino Ethernet or Arduino with Ethernet Shield
45 stars 18 forks source link

change never timeOut logic #26

Closed hagaigold closed 2 years ago

hagaigold commented 7 years ago

if tftp started (write request) and then the connection been broken, before the first byte was sent (data packet) and wrote to the flash, Ariadne-Bootloader was hang in "never" timeout state.

this is very easy to replicate: send something with firewall on.

loathingKernel commented 7 years ago

I can't comment on this as I can't test it for the time being. I reviewed it a bit though and I have some questions about it. Will post them later.

hagaigold commented 7 years ago

this code is the reason that this fix is needed.

honestly , I don't see a good reason to the "Never timeout if there is no code in Flash" logic. but, again, I am new to this code, so i just "fix" it to work.

in re-thinnking this issue, it might be better just remove the "Never timeout.." logic.

loathingKernel commented 7 years ago

The reason this code exists is to prevent the bootloader from booting corrupt code. As it works right now, we don't cache anything as we receive it, we just dump it into the PROGMEM. If the upload is interrupted, we don't wipe the existing dumped binary because it is costly. That means that the bootloader can boot into corrupt code and get stuck which will require manual reset. To stop that from happening if we detect that the upload was interrupted, we zero-out the first word of the PROGMEM and this is there to check that condition.

hagaigold commented 7 years ago

To stop that from happening if we detect that the upload was interrupted, we zero-out the first word of the PROGMEM and this is there to check that condition.

there is only two place in code (relevant to tftp logic) when flash is zero-out: when Data Packet arrived and after timeout.

The reason for this fix, is exact this "protection" that you are describing, consider the following:

  1. clean state (all flash is zero-out)
  2. write request => port change from 69 to the data port .
  3. communication broken before a single byte was programmed.
  4. boot-loader "stuck" in never timeout state because of this code.
  5. new connection on port 69 is not possible because Ardaine switched to the data port.
  6. zero-out after timeout will never happen, and this is the fix: if flasing start, we should timedout to re-initialize the socket back to port 69, so new connection is now again possible.

is to prevent the bootloader from booting corrupt code

It might work as you describe, if after errors, the first word was zero-out (and also re-initialize the socket). but this is not how the code working, maybe it was the intention.. i don't know ?

what I do see in the code, is that EEPROM_IMG_STAT is acting as protection form booting corrupted code.

byw: without a proper crc for the image, there will always have a chance to boot a bad code.

loathingKernel commented 7 years ago

Just to clear this up, I never said that your fix isn't needed, I merely argued against the removal of the timeout logic. As I said before, I can't comment on the fix itself as I don't have a working setup to test it, and neither I remember how it used to work. I am hoping to get my ISP back soon, in order to test it.

EEPROM_IMG_STAT is not a protection, it is a way to force the bootloader to not boot into the code after a watchdog reset, but rather wait for a new program to be uploaded.

Yes, without CRC we can't be sure but that is not specified by the protocol.

hagaigold commented 7 years ago

after this discussion I came up with different fix, that keep the timeout logic intact, and address the issue of not having a procedure for "easily" socket resetting. this is lack in the code right now, and probably needed in at least 2 places where FIXME comment left behind. and, maybe will be needed in other places in the future.

basically, I extracted the "reset" logic out of the "if (timedOut())" block in main.c , and introduce another new block - "IsResetSocketRequired".

I wonder if this different approach, separation timedout from resetting, which replace this suggested fix (and also more straightforward i think) should be checked into this pull request, or better start a fresh thread?

hagaigold commented 2 years ago

too old.. need to resolve conflicts