urjaman / fast-usbserial

ATMega16U2 firmare - faster than arduino-usbserial
22 stars 6 forks source link

Update to latest LUFA #1

Closed NicoHood closed 9 years ago

NicoHood commented 9 years ago

Hi, I've seen this code and it really seems to be an awesome solution. I haven't had time to test it though.

It looks like you are using an old version of LUFA. Which isnt bad at all, but it would be interesting if you could update this to the latest LUFA and still get it working. Then we could also see it in the LUFA USB-Serial example and not only for Arduino.

And I guess you used Windows to compile the firmware (since I didnt get that working under linux again with those old sources). You could also update avr-gcc to 4.8.1 or 4.9.2 to maybe get more speed or more flash improvements: https://nicohood.wordpress.com/2015/02/20/compiling-with-avr-gcc-4-9-2-under-windows/ https://nicohood.wordpress.com/2015/01/24/installing-avr-gcc-4-8-1-and-arduino-ide-1-6-on-raspberry-pi/

I am developing a 16u2 Bootloader that gives the opportunity to upload sketches to the 328 as usual and also to the 16u2 itself. If I have time I'd like to integrate this, if you'd like to. Is this okay for you? If you feel like playing with the bootloader and apply those patches before I've got time to, feel free to help. https://github.com/NicoHood/HoodLoader2 https://github.com/NicoHood/HoodLoader2/issues/13

It would be nice if you could do a little writeup about how this works and what youve' changed? Hope I have a chance to test this soon. (I need a 32u2 .hex file so I need to recompile it myself before using it).

NicoHood commented 9 years ago

Okay I found a 16u2 Mega and uploaded your .hex file.

Tested with this sketch:

void setup() {
  // put your setup code here, to run once:
  //Serial.begin(230400);
  Serial.begin(250000);
}

void loop() {
  // put your main code here, to run repeatedly:
  Serial.println("Hello World");
  delay(1000);
}

230400 works but 250000 doesnt work. Using one of the nightly IDE that supports the serial monitor up to 250M. Doesnt work means I see crap.

Edit: Using an USB HUB (Sparkfun cerberus in between). Also tried with direct connection and didnt work too. Using USB3 Host.

Edit2: Weird thing: I can use the 250M IDE setting with a 230400 uploaded sketch? Will try to update the IDE now.

Here is more information about what I am using: lsusb gives Bus 003 Device 021: ID 03eb:204b Atmel Corp. LUFA USB to Serial Adapter Project for the new usb firmware. The firmware is uploaded via my HoodLoader2, but shouldnt make any difference to the DFU bootloader.

I used lsusb to ensure the firmware is running. Its a 16u2 and the main MCU is an atmega2560. The host is Ubuntu x64 14.04 with Arduino IDE 1.6.5 (updated now to the release).

I also tried screen /dev/ttyACM0 250000 to ensure the IDE is not messing the thing up.

Oh and I have to admit its not an offical Mega. So it might be an hardware problem? I could try my official uno but since its the only Arduino where I did not touch the 16u2 firmware I'd like to keep it like that.

urjaman commented 9 years ago

I guess your real issue is that on linux you cant set 250k baud (noticed when my test app said "cant set baud 250000"), because it isnt one of the supported baudrates by linux termios. Btw at 16Mhz 230400 is also really impossible (off by 3.5% or more if not U2X ... actually without U2X its 250k....), it ends up being something close but not really, but if both parties are still at the same wrong speed it'll work. See /usr/include/bits/termios.h : ...

define B230400 0010003

define B460800 0010004

define B500000 0010005

define B576000 0010006

define B921600 0010007

define B1000000 0010010

define B1152000 0010011

define B1500000 0010012

define B2000000 0010013

...

Try going higher ;)

EDIT: Sorry I didnt read your first post... "It looks like you are using an old version of LUFA. " More like i took bits from an old version of LUFA and tore it to pieces and am only using a little bit of it that i've read over some crazy amount of times. There's no point in using the entire bloaty LUFA for this just in my opinion.

I'm using Arch Linux' avr-gcc 4.9.2 to compile this. Also tried a prelease gcc 5.1.1 based toolchain that i tested (self-built), it also makes a working binary (though slightly bigger).

But really it isnt a much of a gcc issue when the code is peppered with inline asm to get rid of sometimes single instructions to make it faster ... faster.... ;P Then i got it fast enough to blast 2Mbaud both directions, and decided that there's no point on doing more.

You're free to use it in whatever you want, just be aware that it has pretty much single focus, and I'd dislike using it in an application where you dont like reading through the entire objdump :P Eg. it uses registers r2 - r5 reserved for the UART ISRs to globber and even though i have them mapped to some register variables in a header i still like to check that gcc isnt touching them in any code...

EDIT2: Reopened in case you still want to comment (the closing was a bit reflexive...)

urjaman commented 9 years ago

Oh and regarding 8u2/16u2/32u2 ... the hex is built for 16u2, but actually... I tested, the build for 8u2 results in an identical hex file (binary identical), so it'll work in a 8u2 too. Then i built for a 32u2 ... there is 1 byte of difference, and thats the loading of the stack pointer since the 32u2 has more ram, but as 512 bytes is enough for this app, the 16u2 hex will work for 32u2 too (just not touching the extra ram) :P

NicoHood commented 9 years ago

Hm and how can I fix that Linux uses a non standard baud? I mean you got it working with 1M you said. How did you do that? What serial monitor/command did you use? By the way: my Arduino Nano with FTDI works with 250k.

And I am interested on how you got avr-gcc 4.9.2 working under linux. Any reference?

urjaman commented 9 years ago

for pulling the entire avr toolchain in arch linux :P

pacman -S avr-libc

(I said that i use the toolchain provided by arch linux... Of course i'm not writing with arduino stuff so there's pretty much nothing more to it, just a Makefile and avr-gcc in path and "make".)

Re: speeds, I listed the relevant piece of the header, linux has 500k and 1M and 2M, not 250k. Atleast in that bit of termios, I'm not sure if there's some alternative but i havent heard of it.

I usually use minicom for a serial terminal, though mostly (when using stuff with high baudrates) I just have applications that use the serial port.. starting from like serialtest.c there: https://github.com/urjaman/m328-uart-bandwidth-test and ending up with the serprog driver in flashrom.

NicoHood commented 9 years ago

Now I got it working. The funny thing is it also works with my HoodLoader bootloader which is simply the old firmware + CDC bootloader.

The only difference was, that I now use minicom which can open the Serial port properly. screen and the Arduino IDE had problems to do that. I dont know the reason for that, but it shouldnt be so hard to fix this? It worked even at 2M with both.

I have to admit that I only tested Serial TX from the MCU to USB Host with not much traffic. So on high loads this might fail. But the thing I am wondering about now is why cant the Arduino IDE open the Serial port properly. Same goes for the program screen. Any Idea?

NicoHood commented 9 years ago

More progress: As you know I can upload custom sketches to my 16u2 which makes it a bit easier for me to test things easily with the IDE. I got an USB-Serial like program. So I tested that with 250000. And I printed to the Serial monitor what the PC reported me as baud rate. And this was the last selected baud rate, and not 250k. And that is the reason why nothing works. Because if I ignore the USB setting and force the Serial1 (to the 328) to 250k it just works fine. So this is maybe an OS bug or IDE bug. And thatswhy it works with your program, minicom and not with the IDE. Dont know how it would work with Windows.

Screen has the same problem. It uses 9600 if you try to enter 250k 500k 1M or 2M. All of them seem to be non standard. So this could be an OS problem.

By the way this has nothing to do with your firmware, since it is an OS problem. I am doing very slow printing of a few chars. The USB Host sets the USART wrong. Your firmware only helps on fast data transmissing (which is probably worth it) but couldnt even test it this far.

Edit: This sketch should also work with a Leonardo as USB-Serial bridge if you define a reset pin.

/*
 Copyright (c) 2014 NicoHood
 See the readme for credit to other people.

 USB-Serial

 Transferes from USB to HW Serial and vice versa.
 It also resets the main MCU on a DTR rise.
 */

// define the reset pin to reset the destination MCU.
// this definition is made for HoodLoader2 (pin 20)
// but you still can use it with any other USB MCU or pin
const int resetPin = MAIN_MCU_RESET_PIN;

void setup() {
  // set main MCU by default active
  pinMode(resetPin, OUTPUT);
  digitalWrite(resetPin, HIGH);

  // Start USB Serial
  Serial.begin(115200);
  while (!Serial);
  Serial.println("USB");
}

void loop() {
  // USB -> Serial
  uint8_t i;
  for (i = 0; i < USB_EP_SIZE; i++) {
    // read maximum one EP_SIZE to not block
    int b = Serial.read();
    if (b < 0)
      break;
    Serial1.write(b);
  }

  // Serial -> USB
  uint8_t buff[USB_EP_SIZE];
  for (i = 0; i < sizeof(buff); i++) {
    // read maximum one EP_SIZE to not block
    int b = Serial1.read();
    if (b < 0)
      break;
    buff[i] = b;
  }
  // send maximum one EP_SIZE to give the usb some time to flush the buffer
  Serial.write(buff, i);

  static uint32_t m = 0;
  if (millis() - m > 1000) {
    Serial.print("Baud: ");
    Serial.println(Serial.baud());
    m = millis();
  }
}

void CDC_LineEncodingEvent(void) {
  // start HW Serial with new baud rate
  Serial1.end();
  // if I add 250000 here it works properly
  Serial1.begin(Serial.baud());
}

void CDC_LineStateEvent(void) {
  // reset the main mcu if DTR goes HIGH
  if (Serial.dtr())
    digitalWrite(resetPin, LOW);
  else
    digitalWrite(resetPin, HIGH);
}
urjaman commented 9 years ago

this went a bit off-topic (and now inactive), so closing. If there's real issues with the fw open a new one, and if you just want to ask questions about it otherwise there's a thread on the arduino developers group: https://groups.google.com/a/arduino.cc/forum/#!topic/developers/-AQ_7Fs0lBc

NicoHood commented 9 years ago

May you please give me a hint, where you reserve the registers?

~~I think this lines reserves the ram for 256 + 128 bytes buffers: https://github.com/urjaman/fast-usbserial/blob/master/Makefile#L350~~

But I didnt find where you reserve those registers. Also a bit better documentation and better readable variable names would be nice. Whats wrp and rdp?

Edit: I think I found it: https://github.com/urjaman/fast-usbserial/commit/73b93bcb08d8bd0c434be2472874ac17f14a8c55#diff-84c9047131e8c974611352bd4267f2d5

Have you thought of this solution? http://www.avrfreaks.net/comment/573668#comment-573668

NicoHood commented 9 years ago

Also dont you miss here to set the higher Z pointer (r31) back to 0x02? Not sure if it needs to be backed up fully since there is no much more option than 0x02 for 500 bytes of ram. (for 32u2 compatibility it would be better to save it though). Or am I wrong?

ISR(USART1_RX_vect, ISR_NAKED)
{
    /* This ISR doesnt change SREG. Whoa. */
    asm volatile (
    "lds r3, %0\n\t" // load UDR1
    "movw r4, r30\n\t" // backup lower Z pointer
    "in r30, %1\n\t" // load USARTtoUSB_wrp pointer
    "ldi r31, 0x01\n\t" // set higher Z pointer to 0x01
    "st Z+, r3\n\t" // save UDR1 in Z pointer (USART to USB buffer) and increment
    "out %1, r30\n\t" // save back new USART to USB buffer pointer location
    "movw r30, r4\n\t" // restore backuped Z pointer
    "reti\n\t"
    :: "m" (UDR1), "I" (_SFR_IO_ADDR(USARTtoUSB_wrp))
    );
}

~~Edit: Also I cant see that r5 is used anywhere in the source or am I wrong?~~

Edit: My question is answered, movw copies 2 bytes and then it makes sense and r5 is needed.

You clear USBtoUSART pointers but not the USARTtoUSB pointers? https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L280

NicoHood commented 9 years ago

So after a bit of time playing with the code I found a few things I want to finally mention here in an ordered way. I might edit this post over and over.

The rest is just looking and coding style. Using better names and commenting more would have saved me some time, but I am happy that you published it anyways. So the most i am wondering about is r0 and r1. Might you want to tell me whats the magic behind that?

urjaman commented 9 years ago

I see -ffixed-r* and I'll look into it. Really doesnt matter for the fixed function usage where I check the compiler doesnt anyways touch my registers, but it could be prettier.

You use r0 and r1 in the ISR to acces the CBI, IN and OUT instructions. Isnt this totally unsave?

I think you mean %0 and %1, they are placeholders for the parameters in gcc inline assembler, eg: :: "m" (UDR1), "I" (_SFR_IO_ADDR(USBtoUSART_rdp)) These things. It'll replace them with memory or I/O addresses, which, well, are not registers (well, the I/O addresses are I/O regs but i meant registers registers :P).

The gotos make for beautiful assembler, and i think their control flow is simple enough (jump a few lines forward) that they dont make the code hard to understand either and save an unnecessary test in the asm code or duplication of led enable in the C code. It's your issue if you're that anal about using a perfectly fine language feature just because somebody has misused it.

https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L134 is necessary, the asm only clears the low byte - but yeah oh 0xFF might be better you're right. I'll look into it.

https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L145 answer: http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_intbits

https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L153 This essentially saves us from running through the LED enable code on the next loop around the loop since we already do it with the goto at the end of the block, but only if we havent received a byte since the calculation of cnt since if nothing received cnt will be txcnt lower than before on the next run.

You clear USBtoUSART pointers but not the USARTtoUSB pointers? https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L280

USARTtoUSB would contain data already received at the previous baudrate but not yet sent to PC, vs USBtoUSART would contain data wanted to be transmitted at the previous baudrate that would be sent at the new baudrate and i thought this would be worse, doing something not asked for... This was just some paranoia, maybe you could just let them be and not clear either, I'm not sure if USB CDC has some rules about how this should work. My alternative thought was to wait for the system to TX the data in USBtoUSART before changing baudrate but that could be bug-prone (and waiting to return from the USB event would be not nice i think) so I took the deletionist approach... basically, this could be wrong.

NicoHood commented 9 years ago

Thx for the explanation. I think I am understanding the whole thing better now.

What I missed is: GPIOR0 is not r0, it is not a ALU register, its something in RAM. Which is actually VERY clever and I havent heard of before. I thought you were using r0 and r1 which are widely used registers.

Is there a reason why you choose r2-r5? Otherwise I'd play with the lower registers since GCC generates very different code if your reserve them, depending on the register number. For example maybe r3-r6 could be more efficient. I'd also make a definition to set the reserved register completely flexible in the makefile.

I am doing this stuff currently now on my HoodLoader2. I will also add more comments. For example here are my reworked ISRs (after I understood the whole process). You may want to check if I counted the cycles correctly.

ISR(USART1_RX_vect, ISR_NAKED)
{
    // This ISR doesnt change SREG. Whoa.
    asm volatile (
    "lds r3, %[UDR1]\n\t"       // (1) Load new Serial byte (UDR1) into r3
    "movw r4, r30\n\t"          // (1) Backup Z pointer (r30 -> r4, r31 -> r5)
    "in r30, %[pointer]\n\t"    // (1) Load USARTtoUSB write buffer 8 bit pointer to lower Z pointer
    "ldi r31, 0x01\n\t"         // (1) Set higher Z pointer to 0x01
    "st Z+, r3\n\t"             // (2) Save UDR1 in Z pointer (USARTtoUSB write buffer) and increment
    "out %[pointer], r30\n\t"   // (1) Save back new USARTtoUSB buffer pointer location
    "movw r30, r4\n\t"          // (1) Restore backuped Z pointer
    "reti\n\t"                  // (4) Exit ISR

    // Inputs:
    :: [UDR1] "m" (UDR1),       // Memory location of UDR1
    [pointer] "I" (_SFR_IO_ADDR(USARTtoUSB_wrp)) // 8 bit pointer to USARTtoUSB write buffer
    );
}

ISR(USART1_UDRE_vect, ISR_NAKED)
{
  // Another SREG-less ISR.
  asm volatile (
    "movw r4, r30\n\t"              // (1) Backup Z pointer (r30 -> r4, r31 -> r5)
    "in r30, %[readPointer]\n\t"    // (1) Load USBtoUSART read buffer 8 bit pointer to lower Z pointer
    "ldi r31, 0x02\n\t"             // (1) Set higher Z pointer to 0x02
    "ld r3, Z+\n\t"                 // (2) Load next byte from USBtoUSART buffer into r3
    "sts %[UDR1], r3\n\t"           // (2) Save r3 (next byte) in UDR1
    "out %[readPointer], r30\n\t"   // (1) Save back new USBtoUSART read buffer pointer location
    "cbi %[readPointer], 7\n\t"     // (2) Wrap around for 128 bytes
                                    // smart after-the-fact andi 0x7F without using SREG :P
    "movw r30, r4\n\t"              // (1) Restore backuped Z pointer
    "in r2, %[readPointer]\n\t"     // (1) Load USBtoUSART read buffer 8 bit pointer to r2
    "lds r3, %[writePointer]\n\t"   // (1) Load USBtoUSART write buffer to r3
    "cpse r2, r3\n\t"               // (1/2) Check if USBtoUSART read buffer == USBtoUSART write buffer
    "reti\n\t"                      // (4) They are not equal, more bytes coming soon!
    "ldi r30, ((1<<RXCIE1) |        // (1) Set r30 temporary to new UCSR1B setting
    (1 << RXEN1) | (1 << TXEN1))\n\t"   // ldi needs an upper register, restore Z once more afterwards
    "sts %[UCSR1B], r30\n\t"        // (2) Turn off this interrupt (UDRIE1), all bytes sent
    "movw r30, r4\n\t"              // (1) Restore backuped Z pointer again (was overwritten again above)
    "reti\n\t"                      // (4) Exit ISR

    // Inputs:
    :: [UDR1] "m" (UDR1),
    [readPointer] "I" (_SFR_IO_ADDR(USBtoUSART_rdp)),   // 7 bit pointer to USBtoUSART read buffer
    [writePointer] "m" (USBtoUSART_wrp),                // 7 bit pointer to USBtoUSART write buffer
    [UCSR1B] "m" (UCSR1B)           // Memory location of UDR1
  );
}

I'll report more progress soon.

Edit: like this: https://github.com/NicoHood/HoodLoader2/commit/05a9df65eae9841eb6f6b30cdb185192efcefb4c https://github.com/NicoHood/HoodLoader2/commit/c56b14163856cbf0c9a3c4d1e147f146903b5a09

urjaman commented 9 years ago

Yeah the AVRs can have both "memory" (well it is like I/O reg but not plugged into anything) in I/O register space (GPIOR0..2 for the *U2 and M328s i think) and I/O registers in the extended I/O space (basically memory mapped I/O...) that can only be accessed by memory instructions (sts/lds) so it can be a bit confusing if you dont check what is being written/read.

urjaman commented 9 years ago

Oh on the registers... Basically, it should be the better the less you use of them and the lower they are (but r0,r1 reserved), so 4 regs from r2 onwards - the lower they are being that gcc seems to allocate from higher to lower (with exceptions based on the ABI on the ones used for function calling stuff), but i havent investigated much because this simple serial application doesnt need that many registers anyways. Also you need 2 consecutive regs* for movw, but otherwise the asm could be made to use any 4 regs, i just see no benefit from using something else, atleast for my usage.

*EDIT: An even register and the one after it. (so you cant have eg r3:r4, only r2:3 or r4:5)

NicoHood commented 9 years ago

I read that too (that gcc uses from higher to lower regs).

https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L95 Is a cli() and sei() really needed? Disabling the ISR is a single instruction. The TX ISR will still trigger though. I think it would make sense that if you unplug the USB device that the TX ISR will flush the data to the end (because a sudden USB disconnect should not lose 128 possible bytes of data). If you dont want to flush the data USBtoUSART_WritePtr = USBtoUSART_ReadPtr would make more sense (maybe) if that can be compiled as single instruction instead of two (not sure because one var is a GPIOR). You could reset it after a USB reconnect, how about that?

Here are my notes, please read the TODOs as well.

    uint8_t last_count = 0; //TODO move down?
    uint8_t USARTtoUSB_ReadPtr = 0; //TODO make global?

    // First disable USART receiving
    //cli();
    UCSR1B &= ~_BV(RXCIE1);
    USARTtoUSB_WritePtr = 0; // TODO how about not to init the Read ptr and copy the value here?
    //sei();

    // Wait for the USB Device to connect.
    // Till then give the USBtoUSART buffer time to flush.
    Endpoint_SelectEndpoint(ENDPOINT_CONTROLEP);
    do {
      if (Endpoint_IsSETUPReceived())
      USB_Device_ProcessControlRequest();
    } while (USB_DeviceState != DEVICE_STATE_Configured);

    // Reset USBtoUSART buffer on reconnect
    // set the GPIO to the preinitialized WritePtr
    // (to avoid HIGH MSB bits for 128 byte pointer)
    USBtoUSART_ReadPtr = USBtoUSART_WritePtr;
    // TODO dont we need to disbale the TX ISR flag bit if it has not flushed yet?

    // Enable USART and start USB flush timer
    UCSR1B |= _BV(RXCIE1); // TODO or remove the |= to reset any TX ISR flag
    TIFR0 = _BV(TOV0);
    TIFR1 = _BV(OCF1A);
urjaman commented 9 years ago

Single instruction? Really?

9f2:   f8 94           cli
9f4:   80 91 c9 00     lds     r24, 0x00C9
9f8:   8f 77           andi    r24, 0x7F       ; 127
9fa:   80 93 c9 00     sts     0x00C9, r24
9fe:   1a bc           out     0x2a, r1        ; 42
a00:   10 92 91 02     sts     0x0291, r1
a04:   1e ba           out     0x1e, r1        ; 30
a06:   78 94           sei

Remember the extended I/O space... Honestly, I havent thought about that sequence so hard, just being rather safe than sorry there.

urjaman commented 9 years ago

Re your TODOs: USARTtoUSB read "pointer" doesnt need to be global since it is read only from that function (in my code atleast), which makes it very efficiently allocated to a register by gcc when declared inside the function.

The last_cnt initialization location doesnt really matter (since gcc can see its only used after that not configured loop and clear it later), but its in my code semantically in the same place as that clearing of USARTtoUSB stuff that it is used to track (for leds).

Writing USARTtoUSB read pointer to the write pointer would either be no change (we just set it to 0 and it is a local register variable so gcc can infer it is also 0 and use r1 instead which for C is basically always 0) or slower if you move it global and you need to read it (2 cycles) and write that register to the I/O location. Also if the read pointer was global it would still be cleared to 0 by the bss clearing loop and waste some cycles on boot (propably more than the single ldi or such for local variables).

Basically reading anything at all (to set something to that value) is usually slower than setting something to 0 or the same if it happens to be in an ALU register.

EDIT: And yeah we have 2 GPIORs, one SRAM location (bss) and one local variable (ALU register) for our buffer management stuff so it is a bit confusing what is optimal. I got an idea for one thing that would actually be faster for this stuff, see that in later commit...

disbale

Yes. Btw, I took the another direction with this one... let TX run, even across configurations.

NicoHood commented 9 years ago

Looking some time at the code it turns out that your code is (for this first part) pretty much perfect. I still dont know if its worth to flush the USBtoUSART data on a disconnect (also its not likely this will happen).

Lets pretend we leave your code like this. Will resetting the USBtoUSART pointers to an equal value be enough to disable the TX interrupt from no longer triggering? You do not disable TXCIE1 or TXEN1, only RXCIE1. If the TX interrupt triggers again it will send 128 more bytes and then self disables. Maybe you should disable TXCIE1 along with RXCIE1.

Some thoughts while coding (with USBtoUSART flush till a reconnect):

    uint8_t last_count = 0; //TODO move down?

    // Disable USART receiving (but keep transmitting)
    UCSR1B &= ~_BV(RXCIE1); //TODO just leave it on an ignore received bytes?

    // Wait for the USB Device to (re)connect.
    // Till then give the USBtoUSART buffer time to flush (from last loop).
    // TODO USARTtoUSB data will be recorded but discarded afterwards
    Endpoint_SelectEndpoint(ENDPOINT_CONTROLEP);
    do {
      if (Endpoint_IsSETUPReceived())
      USB_Device_ProcessControlRequest();
    } while (USB_DeviceState != DEVICE_STATE_Configured);

    //TODO what if CDC event triggers now and USART config gets destroyed below?
    //wouldnt it be better to move the stuff back up again and forget about flushing USBtoUSART?

    // Safety reset pointers to an equal number.
    // cli() is needed since the EVENT_CDC_Device_LineEncodingChanged
    // can enable the interrupts again while restting
    cli();
    USARTtoUSB_WritePtr = 0;
    USBtoUSART_ReadPtr = 0;
    USBtoUSART_WritePtr = 0;
    // Disable all USART interrupts
    // Setting this to zero is needed since
    // if the TX ISR finished it sets the RX bits again.
    UCSR1B = 0;
    sei();

    // USARTtoUSB pointer is interrupt save
    // Add this to the RAM now to let gcc optimize the ram/register location better
    uint8_t USARTtoUSB_ReadPtr = 0; //TODO still move to global to save ram?

    // Enable USART RX interrupts
    // TODO why do we need this is the CDC event sets those bits after USART reconfiguration anyways?
    // TODO if we leave it, should we also enable _BV(TXEN1) | _BV(RXEN1)? 
    UCSR1B |= _BV(RXCIE1);

    // start USB flush timer
    TIFR0 = _BV(TOV0); //TODO needed?
    TIFR1 = _BV(OCF1A);
urjaman commented 9 years ago
// start USB flush timer

nope, just clear them if they happened during not configured. setuphardware starts them. not really needed tbh, nobody gonna get really hurt from a single trigger of either timer...

i dont have more time to look at this today, so bye, i'll read more later.

NicoHood commented 9 years ago

Sure. I'll post some thoughts here.

You should have a look at DEVICE_STATE_AS_GPIOR on this page: http://www.fourwalledcubicle.com/files/LUFA/Doc/130303/html/_page__token_summary.html

Since you are using a very small version of LUFA I do now know if this is related to you. But in my setting it is set to 0 which would conflict with the pointers. May we also use GPIOR 1 and 2? Since 0 is the default value I'll avoid to use 0 in the application. (if someone tries to copy the code it wont be broken with default configs)

urjaman commented 9 years ago

I set it to GPIOR2 since I wanted the cbi-addressable GPIOR0 for the uart tx isr (other GPIORs arent in the I/O address range usable by sbi/cbi). You'll need to rework the ISR if you want to use GPIOR0 for something else.

NicoHood commented 9 years ago

I am sorry but your new fix wont work correct. We have to completely forget about the USBtoUSART buffer. Flushing it will cause more problems than expected. Better add a while(UCSR1B & _BV(TXCIE1)); to flush the data if really needed.

If you leave it like this, USB_Device_ProcessControlRequest can call the CDC Line event. This will disable the TXCIE1 flag but leaves the buffers in a not synced state, if they havent flushed yet. And you do not reset this again, meaning it will keep the data on a reconnect. The next time it loads new USB bytes the buffer might be already full. The event will clear them to zero.

Or we could modify the CDC Linevent to not turn of TX, only RX via &=~. And enabling with |=. What about that?

Maybe we should also clear the CDC baud rate to zero on a reconnect? Add this before the Leds got cleared:

    // Reset CDC Serial setting
    LineEncoding.BaudRateBPS = 0;

I suggest:

So this one is faster/better than setting both to zero? https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L113

NicoHood commented 9 years ago

I was wrong for some things in my last post. Not sure if you can follow my ideas, its sometimes really complex.

What happens if we disable RXCIE1 when RXEN1 is still set? What happens if we receive a new byte? Will this byte be triggered after the ISR is enabled again or lost? (losing it would be better since the USART is expected to be off). Maybe you also need to disable RXEN1?

https://github.com/urjaman/fast-usbserial/blob/6ade97a926a71b97d038923e1559a7873bcc5b8c/fast-usbserial.c#L113 This needs to be above the USB reinitialization since it could trigger the CDC event and enable RX. We dont want to miss those bytes. Not sure if it is really that time critical or if the CDC can show up so fast after a reconnect?

I've added this to my version This forces the USB driver to reopen the device before sending new data. This helps to reduce a full buffer if TX hasnt flushed yet and the USB want to add new data (which I dont expect).

    // Reset CDC Serial setting
    LineEncoding.BaudRateBPS = 0;

In your current version USBtoUASRT_rdp is not initialized when it reaches this line https://github.com/urjaman/fast-usbserial/blob/6ade97a926a71b97d038923e1559a7873bcc5b8c/fast-usbserial.c#L115 It should not cause problems because CDC_Device_BytesReceived() will return false if the baud is set to 0 (and the baud is initialized). So it should be okay. Just want to mention it.

https://github.com/urjaman/fast-usbserial/blob/6ade97a926a71b97d038923e1559a7873bcc5b8c/fast-usbserial.c#L102 Is not needed since the CDC Event will enable it anyways.

NicoHood commented 9 years ago

~~https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L157 This needs to be a < if you ask me. Otherwise I dont get the Serial data with this sketch:~~

void setup() {
  Serial.begin(115200);
  Serial.println(F("Startupdfgdrhrtjhrruj"));
}
void loop() {
  if (Serial.available()) 
        Serial.write(Serial.read());
}

Also I heard that sending a number CDC_RX_EPSIZE bytes is not good, only send CDC_RX_EPSIZE -1 bytes. But I am not sure about that. See https://github.com/abcminiuser/lufa/blob/master/Projects/USBtoSerial/USBtoSerial.c#L116-L118

Edit: I will stop posting now, I am making too many mistakes. sorry for that.

Edit: I think those two lines were/are wrong. They need to be OUT, not IN. Still wondering why it worked in your program but not in my bootloader. (I am working with some newer syntax that confuses a bit). https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L156 https://github.com/urjaman/fast-usbserial/blob/master/fast-usbserial.c#L152

urjaman commented 9 years ago

Replying in roughly reverse order.

You're propably confused about OUT/IN in somewhere else (see the Endpoint_ClearIN() afterwards? this place uses the correct term) or OUT/IN vs RX/TX (i cant even remember which way the RX and TX (leds mostly) are supposed to be :P). OUT/IN are from the point of view of the USB Host, so IN is out of device (IN to host) and OUT is into device (OUT from host).

I'll benchmark that zero-length-packet avoidance later, that might be a good idea.

Re the above bits, please no more walls of text... lets see if i have any fast comments

In your current version USBtoUASRT_rdp is not initialized when it reaches this line

is initially (L85), i purposefully didnt clear it on the config/unconfig run to allow the TX to just go on TXing. But really, i have no method to test these edge cases, so it's all totally pedantic until someone has an application where this matters.

This needs to be above the USB reinitialization

Basically, no. Hopefully you get configured before you get a CDC event, and after the set configuration setup packet it would have flowed through from the above loop.... again, see above re test.

Yeah too much text, i'll stop here.

NicoHood commented 9 years ago

Oh I see. In/Out is seem from the USB Host and TX/RX (in newer versions) is seen from the USB device.

So you used a huge USB Bank for the USART-USB direction. 2x64 bytes. Why this? You have a very large ringbuffer (256 bytes) and still want to have such a large USB side buffer. Wouldnt it make more sense to enlarge the other USB bank for receiving data from the USB Host?

NicoHood commented 9 years ago

This comment might be also wrong: https://github.com/urjaman/fast-usbserial/blob/6ade97a926a71b97d038923e1559a7873bcc5b8c/fast-usbserial.c#L154-L155

We need to check if the Endpoint is IN ready. Clearing the endpoint does not mean the data has been read by the host. So we still need to check that before. What saves you here is the CDC_Device_SendByte_Prep() function above which checks if the endpoint is ReadWriteAllowed(). This will return an error if the bank is full and needs to be sent or if the data was not read by the host yet (it seems, which would be equal to IsINReady).

But this function (CDC_Device_SendByte_Prep) seems totally useless, a simple IsINReady() check would be enough, from what ive tested and read.

Explanation: https://groups.google.com/forum/#!topic/lufa-support/vWDFa1Mpn5A

Adding a -1 here makes sense as well https://github.com/urjaman/fast-usbserial/blob/6ade97a926a71b97d038923e1559a7873bcc5b8c/fast-usbserial.c#L156 Explanation: http://www.microchip.com/forums/m325655.aspx Your fix looks good: https://github.com/urjaman/fast-usbserial/commit/85c764d852e7dd8dec764542e4aa95740e7c73f6

NicoHood commented 9 years ago

Have you also tried not to use Timer1 for flushing the data? For me this example works at 2M (with minicom) and without timer1: https://github.com/arduino/Arduino/issues/2963

Is Timer 1 really needed?

Edit: Seems you test gives read skips if I remove timer 1. But I included the Timer0 instead. This saves flash. I just replaced the overflow check with timer 0 and clear it in the Led check below. As far as I understand your program this gives no error (also tested with 200 sec)

Done - Mode: RX&TX(0) dev-received / written 994754 / 1000001 - errs 0, read 994364 readskips 0

Is it normal that read < written?

NicoHood commented 9 years ago

HL2.0.5 will be released soon. Wiki is already updated. Feel free to test it. https://github.com/NicoHood/HoodLoader2/tree/fast-usbserial

NicoHood commented 9 years ago

I've released the version now. Thanks for your help! I've added you to the credits section here and to the release page: https://github.com/NicoHood/HoodLoader2/wiki/Useful-Links-Credits https://github.com/NicoHood/HoodLoader2/releases