greiman / SdFat

Arduino FAT16/FAT32 exFAT Library
MIT License
1.07k stars 502 forks source link

Strange conflict between SdFatSdioEX and Serial1-Serial6 on Teensy 3.6 #112

Open ghost opened 6 years ago

ghost commented 6 years ago

I have a Teensy 3.6, Aduino ide 1.8.5, current Teensyduino add-on to that, and SdFat 1.0.7.

My ultimate project is to emulate a vintage portable floppy drive that has a rs232 interface to a TRS-80 Model100. I need to use a real serial port and a level shifter to connect to the vintage machine, and read/write files on the sd card.

Starting with Examples>SdFat>TeensySdioDemo, and from there I stripped out the choice 1 vs choice 2 stuff and have it just wait for Enter to proceed. Only one SdFatSdioEX or SdFatSdio object is declared at the top, selected by comment/uncomment: //SdFatSdio sd; SdFatSdioEX sd;

ok so far so good, everything works. Next I want to switch from Serial to Serial1

Now there is a problem which baffles me.

Until sd.begin(), Serial1 works fine. I can print to it as much as I want. After sd.begin() I can only print up to 64 bytes in total to Serial1. If I print a 64th byte, the Teensy locks up. This is 64 bytes in total among any combination of print/write/println/etc commands, to Serial1.

HOWEVER, now it gets even stranger. If I have written less than 64 bytes in total out to Serial1 by the time the first file.write() happens in runTest(), all is good again after that!

Once the first file.write() happens, I can once again write as much as I want to Serial1 any time I want.

I made a little function that blasts 25K bytes of text all at once to whichever serial port is being used, and peppered the code with it to zero in on exactly when the serial port stops working, and starts working again.

The built-in usb serial port "Serial" has no such problem.

SdFatSdio has no such problem on either Serial or Serial1 or any other.

The problem actually also affects at least Serial1-Serial5 not just Serial1 I didn't test Serial6 yet because it has only solder pads.

  SdFatSdio   + Serial  = OK
  SdFatSdio   + Serial1 = OK
  SdFatSdioEX + Serial  = OK
  SdFatSdioEX + Serial1 = BAD
  SdFatSdioEX + Serial2 = BAD
  SdFatSdioEX + Serial3 = BAD
  SdFatSdioEX + Serial4 = BAD
  SdFatSdioEX + Serial5 = BAD
  SdFatSdioEX + Serial6 = untested (solder pads)

I don't really know if it's a SdFat problem or a Teensy problem, except for the clue that SdFatSdio without the EX exhibits no such problem.

I don't really need the speed boost from EX for this particular project, because a TRS-80 Model 100 only has 32K in total for both ram and file storage, and the serial port runs at only 19200, and the TPDD protocol only deals in little 128 bit packets one at at time, individually sent and acked before proceeding. So SdFatSdio is already faster than this job can ever touch. So, I'm fine simply not using the EX at least for this particular project. Just reporting what I found.

I can supply my Sketch if you want, but it's not reduced to a clean minimal example. It is commented a lot to show where the problem occurs and where it clears.

ghost commented 6 years ago

I managed to find a tolerable work-around...

Using both SdFatSdioEX and Serial1 at the same time, I can prevent the Teensy from locking up by doing a file.open() and file.sync() immediately after sd.begin()

Immediately in terms of: don't do anything else in between, specifically no Serial1.print(), not in terms of timing.

Once either file.write() or file.sync() has happened once, Serial1 has been somehow "fixed", and there are no more worries after that. file.read() does not prevent the problem, only write() or sync() I've found so far.

  if (!sd.begin()) {
    sd.initErrorHalt("sd.begin() failed");
  }
  // Starting here, Serial1 is "bad". If you write more than 64 bytes (cumulatively) to Serial1, Teensy will lock up.
  // Work-around:
  // Open a file and write at least one byte to file immediately after sd.begin()
  // Don't do any Serial1.print() between sd.begin() and file.write() 
  // file.sync() also works, without having to modify any data.
  sd.chvol();
  if (!file.open(tmp_fn, O_RDWR | O_CREAT)) {
    errorHalt("open failed");
  }
  //file.read();  // does not prevent problem
  //file.write('.');  // works, but only ok for tmp junk file, no good for real data
  file.sync();  // works, without actually modifying file
  // Serial1 is now ok again.
ghost commented 6 years ago

I found a seemingly even better work-around: I can prevent Teensy from locking up without having to open and sync a file now. It requires using sd.cacheClear() in two places. Once immediately after sd.begin(), and once immediately after file.open(). The advantage is, I don't have to move things around just to get a file.open() right after sd.begin(). In the TeensySdioDemo example, sd.begin() is in loop(), and file.open() is in runTest(). Using cacheClear(), I can leave those two parts where they were, and still print as much text as I want to Serial1 after sd.begine() and before file.write(). IE, I can have things like "Initializing temp file blah blah blah, and the column headers for the blocksize and kb/sec etc, without worrying about going over 64 bytes.

So the best work-around now looks like:

  foo () {
    ...
    if (!sd.begin()) {
      sd.initErrorHalt("sd.begin() failed");
    }
    sd.cacheClear();
    ...
  }
  ...
  bar () {
    ...
    if (!file.open(tmp_fn, O_RDWR | O_CREAT)) {
      errorHalt("open failed");
    }
    sd.cacheClear();
    ....
  }

Basically, any existing code as it always was, just with sd.cacheClear() added immediately after any sd.begin() or file.open().

It is required after both of those. Putting a cacheClear after sd.begin() allows Serial1 to function normally only up until file.open(). After file.open(), Serial1 is broken again. But cacheClear() after file.open() fixes it again, even without file.write() or file.sync().

So this is a much less intrusive fix, since you don't have to open and write a file for no other reason than to get the serial port put back into a working state, and you don't have to reorganize any code just to get the file.sync() as soon after sd.begin() as possible.

greiman commented 6 years ago

There must be some interaction between the pins used by SDIO and Serial1. The EX classes leave SDIO active so the SD card will see very large transfers. sync() or cacheClear() ends the transfer and makes SDIO idle.

The SDHC controller on Teensy 3.6 has many errata. I already have lots of workarounds to get the EX mode to work.

The K66 Port Control Module allows Serial1 and Serial3 to be selected on the same pins as the SD card. The SD card is selected when the MUX field is 4. The UART is selected for MUX 3.

See section 11.3.1 of the K66 datasheet. The SDHC controller is connected to PTE0 - PTE5.

Even though the pins selected for the SD socket shouldn't interact with Serial1/Serial3 it appears they do.

I don't understand the other Serial ports but if there is an errata for the SDHC controller who knows. I have the only SD library that leave the SDHC controller active for long periods.

You might see if Paul Stoffregen has any ideas.

Here is the code that selects the SDHC pins.

  const uint32_t PORT_CLK = PORT_PCR_MUX(4) | PORT_PCR_DSE;
  const uint32_t PORT_CMD_DATA = PORT_CLK | PORT_PCR_PS | PORT_PCR_PE;

  PORTE_PCR0 = enable ? PORT_CMD_DATA : 0;  // SDHC_D1
  PORTE_PCR1 = enable ? PORT_CMD_DATA : 0;  // SDHC_D0
  PORTE_PCR2 = enable ? PORT_CLK : 0;       // SDHC_CLK
  PORTE_PCR3 = enable ? PORT_CMD_DATA : 0;  // SDHC_CMD
  PORTE_PCR4 = enable ? PORT_CMD_DATA : 0;  // SDHC_D3
  PORTE_PCR5 = enable ? PORT_CMD_DATA : 0;  // SDHC_D2
ghost commented 6 years ago

Ok thanks for looking. And of course, when I try to set up a minimal sketch that does nothing but exhibit the symptom, it does not exhibit the symptom.

So there is something more involved than just what I described above. I'm working on producing a clean example.

ghost commented 6 years ago

Here is a sketch which is a clean copy of TeensySdioDemo, with just the minimal changes necessary to prove the problem, and the work-around. TeensySdioDemo_uart_conflict.zip

To see the parts that matter, search for "BKW" The rest of the code is untouched except for the global replace of Serial for Serial1 And the big comment at the top.

ghost commented 6 years ago

Here is a minimal sketch that follows all the rules described above, yet does NOT exhibit the problem the way TeensySdioDemo does. In it's current state as I zipped it, both cacheClear() are commented-out, and so the following sdtest() should lock up the Teensy, but they don't. TeensySdFatEX_uart_bug.zip

ghost commented 6 years ago

I originally uploaded a borked TeensySdioDemo_uart_conflict.zip above. It's now fixed and replaced. To see the problem, comment out either of the two sdEx.cacheClear(), and choose option 1 at run time.

ghost commented 6 years ago

NEWS! Commenting out the entire yield() function removes the problem!

ghost commented 6 years ago

Working minimal program to show problem.

Lots of comments to explain everything but the actual code is much more stripped down than TeensySdioDemo, and DOES show the problem.

Now it looks like the culprit is the yield() function from TeensySdioDemo.

It's zipped in the error-making condition, with lots of ways to fix the error by commenting or uncommenting things.

In this program, file.write() does not fix the uart like it does in TeensySdioDemo. In fact it seems to break it. I need another cacheClear() or file.sync() after file.write() Don't ask me.

Teensy_SdFatEX_yield_uart_bug.zip

greiman commented 6 years ago

I didn't intend the TeensySdioDemo yield() function to be used in programs other than the demo. The demo yield() function is used to collect info on how much CPU time is available for users that write their own yield() function. It is not required in normal apps.

The demo version of yield() could cause major problems if executed at the wrong time.

yield() is called various places in the Teensy system and in various libraries. Here is the system yield() hook.

/**
 * Empty yield() hook.
 *
 * This function is intended to be used by library writers to build
 * libraries or sketches that supports cooperative threads.
 *
 * Its defined as a weak symbol and it can be redefined to implement a
 * real cooperative scheduler.
 */
static void __empty() {
    // Empty
}
void yield(void) __attribute__ ((weak, alias("__empty")));

I did a search and yield() is called in Serial1 - Serial6. It will be executed when the buffer is full.

There is a call to yield() in Serial but it is not as likely to be executed since Serial is a fast USB port with a large buffer.

I should have put a warning on this comment at the top of the demo yield() to not use this function in your program. // Replace "weak" system yield() function.

In an earlier version I had a test at the start of the demo yield that prevent it from being executed until the test was running.

greiman commented 6 years ago

I had no idea the TeensySdioDemo example would result in a problem like this. Let me know if this solves the problem and I will put a warning/explanation on the yield() function.

ghost commented 6 years ago

This is good info to hear. Definitely simply removing yield() seems to have solved all problems. I didn't really know if it was ok to just remove it like that, but this reassures me it's fine to remove it.

But now I don't see what about the demo yield() actually causes any problem?

It doesn't seem like there is anything in there that should be illegal to call, even before sd.begin(). And indeed, I can write to Serial1 all I want before sd.begin(). I'm guessing that, since the sd object is declared in the global area at the top, then sd.card()->isBusy() should always be at least legal look at.

So, is it that "while (sdBusy())" causes yield() to puase while Serial1 may run on ahead and fill up a buffer? But then why does it lock up when there is no sdcard activity? And why does cacheClear() fix it?

At one level, my problem is solved. You've let me know that it's legal to just remove yield(), and I no longer have any Serial1 problems, and the sdcard works, AND I even get full SdFatSdioEX speed.

I'm just curious now what actually is the problem.

Thanks much!

greiman commented 6 years ago

I don't know exactly why the yield() function fails when called from Serial1 - Serail6.

In general a user replacement for yield() should not call a member function of a library that might call yield.

I got the yield() function to work in The demo example since I knew exactly where yield() was called in SdFat. There are six places and some of these probably cause a problem with a recursive call.

Here is an simple example of a user yield() function that causes the program to crash with a stack overflow due to recursive calls to yield(). It never prints Done since Serial1 goes into a tight loop calling yield() which calls Serial1.print().

void yield() {
  Serial1.println("In Yield");
}
void setup() {
  Serial.begin(9600);
  Serial1.begin(9600);
  for (int i = 0; i < 10; i++) {
    Serial1.println("Fill the Serial1 buffer");
  }

}
void loop() {
  Serial.println("Done");
  delay(1000);
}
ghost commented 6 years ago

Ok that is at least something that looks wrong.

Well it's been a couple days and I've been using SdFatSdioEX and Serial1 with no probems. Thank you for everything. (the library, and SSD1306AsciiWire too)

My project is porting this Tandy Portable Disk Drive emulator: https://github.com/TangentDelta/SD2TPDD to Teensy 3.6 to take advantage of the built-in sdcard reader and later also add functions to use the on-board rtc.

So the entire function of the device is nothing but real-rs232<>sdcard.

And it's working great so far. Thanks again.