khoih-prog / RP2040_RTC

This library enables you to use the RTC of RP2040-based boards such as Nano_RP2040_Connect and RASPBERRY_PI_PICO. This RP2040-based RTC, which uses interrupts, has no battery backup, so the time is lost when the board is powered down. An NTP client is needed to update the RTC at every start-up.
MIT License

RP2040_RTC_Time crashes Pico, does not work #3

Closed maxgerhardt closed 2 years ago

maxgerhardt commented 2 years ago

Describe the bug

Using the https://github.com/khoih-prog/RP2040_RTC/tree/main/examples/Time/RP2040_RTC_Time example on a Raspberry Pi Pico does not work.

Steps to Reproduce

Open the https://github.com/khoih-prog/RP2040_RTC/tree/main/examples/Time/RP2040_RTC_Time example, set the board to "Arduino Mbed OS RP2040 boards" -> Raspberry Pi Pico, then compile and upload the sketch. Observe the serial monitor.

Expected behavior

The sketch sets the current RTC time as dictated by currTime = { 2021, 9, 30, 4, 4, 0, 0 }; and prints the current time on the serial monitor.
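The initializer follows the field order of the Pico SDK's datetime_t (year, month, day, day of week, hour, minute, second), with the day of week counted from 0 = Sunday. As a rough host-side sanity check, the sketch below confirms that day-of-week 4 is a Thursday for 30 Sep 2021, using Sakamoto's method. The names DateTimeFields, dayOfWeek and dotwMatches are hypothetical and are not part of RP2040_RTC or the SDK.

```cpp
#include <cassert>

// Hypothetical host-side mirror of the fields in the sketch's initializer:
// { year, month, day, dotw, hour, min, sec }.
struct DateTimeFields {
  int year;
  int month;  // 1..12
  int day;    // 1..31
  int dotw;   // 0 = Sunday ... 6 = Saturday
  int hour;
  int min;
  int sec;
};

// Sakamoto's method for the Gregorian calendar.
// Returns 0 = Sunday ... 6 = Saturday.
int dayOfWeek(int year, int month, int day) {
  static const int offsets[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
  assert(month >= 1 && month <= 12);
  if (month < 3) {
    year -= 1;  // January and February count as part of the previous year
  }
  return (year + year / 4 - year / 100 + year / 400 + offsets[month - 1] + day) % 7;
}

// True when the dotw field agrees with the calendar date.
bool dotwMatches(const DateTimeFields& t) {
  return dayOfWeek(t.year, t.month, t.day) == t.dotw;
}
```

Calling dotwMatches on { 2021, 9, 30, 4, 4, 0, 0 } returns true, which is consistent with the "Thu 30 Sep 2021" lines in the serial logs.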

Actual behavior

The sketch hangs after printing

Start RP2040_RTC_Time on MBED RASPBERRY_PI_PICO
RP2040_RTC v1.0.6
Timezone_Generic v1.7.1

In fact, after adding a few print statements to setup() as follows

  Serial.println("Creating new timezone.."); Serial.flush();
  myTZ = new Timezone(myDST, mySTD);

  // Start the RTC
  Serial.println("rtc_init().."); Serial.flush();
  rtc_init();

  Serial.println("rtc_set_datetime().."); Serial.flush();
  rtc_set_datetime(&currTime);

The serial monitor outputs

Start RP2040_RTC_Time on MBED RASPBERRY_PI_PICO
RP2040_RTC v1.0.6
Timezone_Generic v1.7.1
Creating new timezone..

which suggests that execution did not survive the line myTZ = new Timezone(myDST, mySTD);.

After that, the on-board LED goes into a fast blinking pattern, meaning the board has landed in mbed_die() after a fatal error.

Debug and AT-command log (if applicable)

See above.

Screenshots

None.

Information

maxgerhardt commented 2 years ago

If I use the EarlePhilhower core instead of ArduinoCore-Mbed, I get much further:

Start RP2040_RTC_Time on RASPBERRY_PI_PICO
RP2040_RTC v1.0.6
Timezone_Generic v1.7.1
Creating new timezone..
rtc_init()..
rtc_set_datetime()..
============================
04:00:01 Thu 30 Sep 2021 UTC
00:00:01 Thu 30 Sep 2021 EDT
============================
04:00:11 Thu 30 Sep 2021 UTC
00:00:11 Thu 30 Sep 2021 EDT
============================
04:00:21 Thu 30 Sep 2021 UTC
00:00:21 Thu 30 Sep 2021 EDT
============================
04:00:31 Thu 30 Sep 2021 UTC
00:00:31 Thu 30 Sep 2021 EDT
khoih-prog commented 2 years ago

Hi @maxgerhardt

Thanks for the bug report.

I've just tested using the same settings but still couldn't duplicate the crash.

I wonder if there is any difference in the actual installation, a bad board, etc. Could you please check and verify?


Start RP2040_RTC_Time on MBED RASPBERRY_PI_PICO
RP2040_RTC v1.0.6
Timezone_Generic v1.7.1
============================
04:00:01 Thu 30 Sep 2021 UTC
00:00:01 Thu 30 Sep 2021 EDT
============================
04:00:11 Thu 30 Sep 2021 UTC
00:00:11 Thu 30 Sep 2021 EDT
============================
04:00:21 Thu 30 Sep 2021 UTC
00:00:21 Thu 30 Sep 2021 EDT
============================
04:00:31 Thu 30 Sep 2021 UTC
00:00:31 Thu 30 Sep 2021 EDT
============================
04:00:41 Thu 30 Sep 2021 UTC
00:00:41 Thu 30 Sep 2021 EDT
============================
04:00:51 Thu 30 Sep 2021 UTC
00:00:51 Thu 30 Sep 2021 EDT
============================
04:01:01 Thu 30 Sep 2021 UTC
00:01:01 Thu 30 Sep 2021 EDT
============================
04:01:11 Thu 30 Sep 2021 UTC
00:01:11 Thu 30 Sep 2021 EDT
============================
04:01:21 Thu 30 Sep 2021 UTC
00:01:21 Thu 30 Sep 2021 EDT
============================
04:01:31 Thu 30 Sep 2021 UTC
00:01:31 Thu 30 Sep 2021 EDT
============================
04:01:41 Thu 30 Sep 2021 UTC
00:01:41 Thu 30 Sep 2021 EDT
============================
04:01:51 Thu 30 Sep 2021 UTC
00:01:51 Thu 30 Sep 2021 EDT
============================
04:02:01 Thu 30 Sep 2021 UTC
00:02:01 Thu 30 Sep 2021 EDT
============================
04:02:11 Thu 30 Sep 2021 UTC
00:02:11 Thu 30 Sep 2021 EDT
============================
04:02:21 Thu 30 Sep 2021 UTC
00:02:21 Thu 30 Sep 2021 EDT
============================
04:02:31 Thu 30 Sep 2021 UTC
00:02:31 Thu 30 Sep 2021 EDT
============================
04:02:41 Thu 30 Sep 2021 UTC
00:02:41 Thu 30 Sep 2021 EDT
============================
04:02:51 Thu 30 Sep 2021 UTC
00:02:51 Thu 30 Sep 2021 EDT
============================
04:03:01 Thu 30 Sep 2021 UTC
00:03:01 Thu 30 Sep 2021 EDT
============================
04:03:11 Thu 30 Sep 2021 UTC
00:03:11 Thu 30 Sep 2021 EDT
============================
04:03:21 Thu 30 Sep 2021 UTC
00:03:21 Thu 30 Sep 2021 EDT
============================
04:03:31 Thu 30 Sep 2021 UTC
00:03:31 Thu 30 Sep 2021 EDT
============================
04:03:41 Thu 30 Sep 2021 UTC
00:03:41 Thu 30 Sep 2021 EDT
============================
04:03:51 Thu 30 Sep 2021 UTC
00:03:51 Thu 30 Sep 2021 EDT
============================
04:04:01 Thu 30 Sep 2021 UTC
00:04:01 Thu 30 Sep 2021 EDT
============================
04:04:11 Thu 30 Sep 2021 UTC
00:04:11 Thu 30 Sep 2021 EDT
============================
04:04:21 Thu 30 Sep 2021 UTC
00:04:21 Thu 30 Sep 2021 EDT
============================
04:04:31 Thu 30 Sep 2021 UTC
00:04:31 Thu 30 Sep 2021 EDT
khoih-prog commented 2 years ago

I'm using

maxgerhardt commented 2 years ago

I'm updating to the same Arduino IDE version now to see if that makes a difference. We're already on the same core & library version it seems. If that does not help, I'll check in Ubuntu.

maxgerhardt commented 2 years ago

I have a clean install of Ubuntu 20.04 in a virtual machine, Arduino IDE 1.8.16, ArduinoCore-mbed 2.5.2 and all the needed latest libraries installed via the Arduino IDE library manager, and it still fails the same way as it did on Windows.

I cannot get this to work.

maxgerhardt commented 2 years ago

The Timezone_Generic code attempts to open a filesystem on the Pico. Do I have to use any special tool to initialize that filesystem?

khoih-prog commented 2 years ago

> The Timezone_Generic code attempts to open a filesystem on the Pico. Do I have to use any special tool to initialize that filesystem?

Timezone_Generic relies on RP2040 LittleFS as mentioned in Currently Supported Storage

The library to use for mbed core is LittleFS_Mbed_RP2040

There is still a bug in the core for Nano_RP2040_Connect, as described in Important Notes and in "RP2040 Connect board has faulty components in newest purchase #318".

But it is OK for RP2040.

khoih-prog commented 2 years ago

That's why there is no crash when using the EarlePhilhower core instead of the ArduinoCore-Mbed core.

maxgerhardt commented 2 years ago

Using the LittleFS_Test.ino sketch I just get


Start LittleFS_Test on RaspberryPi Pico
LittleFS_Mbed_RP2040 v1.0.2
[LFS] LittleFS size (KB) = 64
[LFS] LittleFS Mount Fail
[LFS] Formatting... 

followed by a crash (LED is blinking in a rapid pattern again, indicating it landed in mbed_die() after a crash).

So it does look like the crash happens in the LittleFS logic.

My flash chip has the following markings on it

[image: photo of the flash chip markings]

Which seem to read "1P033 00148S"(?). So it's definitely not an ILS chip, which is what causes the known problem; the reference flash chip has a big "ILS" on it.

khoih-prog commented 2 years ago

This is really strange.

I just tested it and it is OK with my board. I'm afraid there may be some issue with your RP2040 board, given the recent component shortage and the resulting quality and QC problems.

Start LittleFS_Test on RaspberryPi Pico
LittleFS_Mbed_RP2040 v1.0.2
[LFS] LittleFS size (KB) = 64
[LFS] LittleFS Mount OK
====================================================
Writing file: /littlefs/hello1.txt => Open OK
* Writing OK
====================================================
Reading file: /littlefs/hello1.txt => Open OK
Hello from RaspberryPi Pico
====================================================
Appending file: /littlefs/hello1.txt => Open OK
* Appending OK
====================================================
Reading file: /littlefs/hello1.txt => Open OK
Hello from RaspberryPi Pico
Hello from RaspberryPi Pico
====================================================
Renaming file: /littlefs/hello1.txt to: /littlefs/hello2.txt => OK
====================================================
readCharsFromFile: /littlefs/hello2.txt => Open OK
Hello from RaspberryPi Pico
Hello from RaspberryPi Pico
====================================================
Deleting file: /littlefs/hello2.txt => OK
====================================================
Reading file: /littlefs/hello2.txt => Open Failed
====================================================
Testing file I/O with: /littlefs/hello1.txt => Open OK
- writing

16 Kbytes written in (ms) 236
====================================================
- reading

16 Kbytes read in (ms) 5
====================================================
Testing file I/O with: /littlefs/hello2.txt => Open OK
- writing

16 Kbytes written in (ms) 246
====================================================
- reading

16 Kbytes read in (ms) 6
====================================================
Deleting file: /littlefs/hello1.txt => OK
====================================================
Deleting file: /littlefs/hello2.txt => OK
====================================================

Test complete

Can you try LittleFS in the following two ways, to verify whether the flash is OK or whether this is some new bug in the core?

  1. directly from the mbed core
  2. EarlePhilhower core
maxgerhardt commented 2 years ago

In fact per schematics a Pico should have a Winbond W25Q16JVUXIQ chip, and per datasheet page 73 the USON-8 package of that chip has a top-side marking of

[image: top-side marking diagram from the datasheet]

With notes

USON has special top side marking due to size limitation. yww = date code (year, work week); xxxx = lot ID

So if I re-read my "1P033 0Q148S" (misread a Q as a 0) this seems to make sense, year "0" might be 2020, week 33 (the chip came out in 2016 so 2010 makes no sense).

So I should have a standard Winbond chip.
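The date-code reasoning above can be written out as a small worked example. This sketch assumes, as the comment does, that the digits "033" in the marking are the yww field; parseDateCode and earliestYearWithDigit are hypothetical helper names, and 2016 is the chip's release year as stated above.

```cpp
#include <cassert>
#include <string>

// Decoded "yww" date code: last digit of the year plus the work week.
struct DateCode {
  int yearDigit;  // 0..9
  int workWeek;   // two-digit work week
};

// Splits a three-digit "yww" string into its year digit and work week.
DateCode parseDateCode(const std::string& yww) {
  assert(yww.size() == 3);
  for (char c : yww) {
    assert(c >= '0' && c <= '9');
    (void)c;  // keeps the loop warning-free when asserts are compiled out
  }
  DateCode code;
  code.yearDigit = yww[0] - '0';
  code.workWeek = (yww[1] - '0') * 10 + (yww[2] - '0');
  return code;
}

// Smallest year >= notBefore whose last digit equals yearDigit.
int earliestYearWithDigit(int yearDigit, int notBefore) {
  assert(yearDigit >= 0 && yearDigit <= 9);
  int year = notBefore;
  while (year % 10 != yearDigit) {
    ++year;
  }
  return year;
}
```

With parseDateCode("033") giving year digit 0 and week 33, earliestYearWithDigit(0, 2016) yields 2020, which matches the reading of week 33 of 2020.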

khoih-prog commented 2 years ago

Did you set FORCE_REFORMAT to true in LittleFS_Test.ino#L22, for the first run only?

#define FORCE_REFORMAT          true
maxgerhardt commented 2 years ago

Nope, I left it at the default setting, which is false. The sketch then auto-detected that it needed to reformat.

khoih-prog commented 2 years ago

Possibly true is necessary for the first try? It was a long time ago and I don't remember now whether an auto-format takes place. Just try true to see.

maxgerhardt commented 2 years ago

Interesting, with that it reformats it.. successfully..


Start LittleFS_Test on RaspberryPi Pico
LittleFS_Mbed_RP2040 v1.0.2
[LFS] LittleFS size (KB) = 64
[LFS] LittleFS Mount OK
====================================================
Writing file: /littlefs/hello1.txt => Open OK
* Writing OK
====================================================
Reading file: /littlefs/hello1.txt => Open OK
Hello from RaspberryPi Pico
maxgerhardt commented 2 years ago

I re-uploaded the RTC Time sketch after the LittleFS test was done, but it still fails in the exact same place and crashes sadly :(

There is a bug somewhere..

khoih-prog commented 2 years ago

OK, so we have to use true the first time as in

LittleFS_Mbed_RP2040.hpp#L22-L31

bool LittleFS_MBED::init()
{
  LFS_LOGERROR1("LittleFS size (KB) = ", RP2040_FS_SIZE_KB);

#if FORCE_REFORMAT
  fs.reformat(&bd);
#endif  

  return mount();
}

So no auto-reformat was designed in at that point. I will have a look in the future.

Oh no, I'm wrong, there is an auto-format here:

  if (!_mounted)
  {
    int err = fs.mount(&bd);

    LFS_LOGERROR(err ? "LittleFS Mount Fail" : "LittleFS Mount OK");

    if (err)
    {
      // Reformat if we can't mount the filesystem
      LFS_LOGERROR("Formatting... ");
      LFS_FLUSH();

      err = fs.reformat(&bd);
    }
maxgerhardt commented 2 years ago

After an initial LittleFS format of the Flash, it has to work without re-formatting it a second time. This cannot be. There must be a bug somewhere.

I've edited Timezone_Generic code in initStorage from

#elif TZ_USE_MBED_RP2040

      TZ_LOGDEBUG1("LittleFS size (KB) = ", RP2040_FS_SIZE_KB);
      int err = fs.mount(&bd);

      if (err)
      {       
        // Reformat if we can't mount the filesystem
        TZ_LOGERROR("LittleFS Mount Fail. Formatting... ");
        TZ_FLUSH;

        err = fs.reformat(&bd);
      }

to

#elif TZ_USE_MBED_RP2040

      TZ_LOGDEBUG1("LittleFS size (KB) = ", RP2040_FS_SIZE_KB);
      fs.reformat(&bd);
      int err = fs.mount(&bd);

      if (err)
      {       
        // Reformat if we can't mount the filesystem
        TZ_LOGERROR("LittleFS Mount Fail. Formatting... ");
        TZ_FLUSH;

        err = fs.reformat(&bd);
      }

But it still outputs the same thing. Even after adding a significant number of Serial.println() calls in the constructor etc., I can't seem to get more debug output out of it. Extremely strange, I'll look into it.
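The mount-then-reformat-on-failure flow in the snippets above can be exercised off-target with a stand-in filesystem. This is only a sketch: FakeFs and mountOrFormat are hypothetical and do not reproduce the Mbed filesystem API. It assumes that reformat() leaves the filesystem mounted, which is how the quoted code appears to use it.

```cpp
#include <cassert>

// Hypothetical stand-in for a flash filesystem: mount() fails until the
// medium has been formatted at least once.
struct FakeFs {
  bool formatted = false;
  bool mounted = false;
  int formatCount = 0;

  // Returns 0 on success, a negative value on failure.
  int mount() {
    if (!formatted) {
      return -1;
    }
    mounted = true;
    return 0;
  }

  // Formats the medium, then mounts it (assumption, see lead-in).
  int reformat() {
    formatted = true;
    ++formatCount;
    return mount();
  }
};

// Tries to mount first and formats only when that fails.
// Returns 0 when the filesystem ends up mounted.
int mountOrFormat(FakeFs& fs) {
  int err = fs.mount();
  if (err) {
    err = fs.reformat();
  }
  return err;
}
```

On a blank FakeFs the first call formats exactly once and succeeds, and later calls mount without formatting again. That is the behaviour expected of the real auto-format path, provided nothing crashes between the failed mount and the reformat.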

maxgerhardt commented 2 years ago

This is freaking absurd.

If I do an extremely minimal test in a new sketch

class MyTestObj {
private:
  int attr = 0;
public:
  MyTestObj(int attr) {
    Serial.println("Inside constructor!"); Serial.flush(); delay(100);
    this->attr = attr;
    Serial.println("Set attribute!"); Serial.flush(); delay(1000);
  }
};

void setup() {
  Serial.begin(115200);
  while (!Serial);

  delay(200);

  auto x = new MyTestObj(1);
  Serial.println("Back again!"); Serial.flush(); delay(1000);
  delete x;   
  Serial.println("Deleted!"); Serial.flush(); delay(1000);
}

void loop() {
  // put your main code here, to run repeatedly:

}

I cannot get past the line

    this->attr = attr;

it seems that the new operator allocates.. completely invalid memory?! And just crashes the freaking board!! I just get up to

Inside constructor!

If the new memory allocator is broken in the core, that is a massive problem.

maxgerhardt commented 2 years ago

Okay no that's wrong. I can't get past Serial.flush(). That immediately crashes. I added that in to make sure I see the output before it crashes, but in this case it seems to cause a crash. Retesting without..

maxgerhardt commented 2 years ago

That's it. After having formatted LittleFS once with the other test sketch, the Time sketch now works too.


Start RP2040_RTC_Time on MBED RASPBERRY_PI_PICO
RP2040_RTC v1.0.6
Timezone_Generic v1.7.1
Creating new timezone..
[TZ] LittleFS size (KB) =  64
[TZ] LittleFS Mount OK
rtc_init()..
rtc_set_datetime()..
============================
04:00:01 Thu 30 Sep 2021 UTC
00:00:01 Thu 30 Sep 2021 EDT
============================
04:00:11 Thu 30 Sep 2021 UTC
00:00:11 Thu 30 Sep 2021 EDT

But I think some improvement still has to be done there for the autoformat. If it crashes in ArduinoCore-mbed before the library can detect that mounting failed, a bug needs to be reported there.

khoih-prog commented 2 years ago

Still a bug somewhere.

Possibly a breaking change was introduced in the new mbed core v2.4.1 / v2.5.2, as I fully tested the library months ago using the then-latest core v2.3.1 and didn't see any issue.

Check Prerequisites, but now I think that's the issue of Nano_RP2040_Connect.

I'll retest the LittleFS_Mbed_RP2040 library using the new cores if I have time.

khoih-prog commented 2 years ago

I tried with several RP2040 boards, using v2.5.2, using the EarlePhilhower core then the ArduinoCore-Mbed core, back and forth, increasing / decreasing the flash size, with FORCE_REFORMAT = true / false, and still can't duplicate the issue.

It seems board-dependent or some kind of stray pointer (in the mbed core ??), and I don't think I can solve this issue now without being able to duplicate it.

So I'm closing it for now, until I get a board that shows the problem and/or can duplicate it, or someone can make a PR after locating the bug in this library.

Thanks anyway,

maxgerhardt commented 2 years ago

> It seems board-dependent or some kind of stray-pointers (in the mbed core ??) and I don't think I can solve this issue now without being able to duplicate.

No, I don't think it's that, it has to do with the filesystem.

I can 100% reproduce the problem by putting the Pico into bootloader mode again and dragging flash_nuke.uf2 onto it. The whole flash is then cleared to 0xff again (and LittleFS is thus also reset). Reuploading the RP2040_RTC_Time sketch then always crashes with

Start RP2040_RTC_Time on MBED RASPBERRY_PI_PICO
RP2040_RTC v1.0.6
Timezone_Generic v1.7.1

and that is always reproducible.

The sketch won't work unless I do a forced reformat with the LittleFS sketch and reupload the original sketch. As soon as the flash is reset, the RTC_Time sketch crashes the board again.
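To illustrate why a nuked flash has to go through the format path: an erased region reads back as 0xFF everywhere, so it holds no filesystem metadata for a mount to find. A minimal host-side check, with isErased as a hypothetical helper name and a std::vector standing in for the flash region, might look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when every byte of the region still has the erased value 0xFF.
bool isErased(const std::vector<std::uint8_t>& region) {
  for (std::uint8_t byte : region) {
    if (byte != 0xFF) {
      return false;
    }
  }
  return true;
}
```

A 64 KB vector filled with 0xFF models the filesystem area right after flash_nuke.uf2; writing any other byte value into it makes isErased return false.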

khoih-prog commented 2 years ago

Thanks Max,

I'm just back, and I was able to reproduce the crash and fix it simply by deleting the line LittleFS_Mbed_RP2040.hpp#L45

LFS_FLUSH();

Thanks for spending your time here to help.

I'll check what the root cause is, and whether this happens only with the new cores v2.5.2 and v2.4.1, if I have time to spend on it.

I'll release a new version ASAP.

Best Regards,

khoih-prog commented 2 years ago

With the fix, LittleFS_Mbed_RP2040 is OK now, but the crash still happens when using this RP2040_RTC library.

I will find out why and fix the Timezone_Generic library.

khoih-prog commented 2 years ago

LittleFS_Mbed_RP2040 release v1.0.3 has just been published in both the Arduino and PIO Library Managers.


Releases v1.0.3

  1. Fix crashing issue for new flash. Check RP2040_RTC_Time crashes Pico, does not work #3
khoih-prog commented 2 years ago

Hi @maxgerhardt

It turns out that the same kind of flush, TZ_FLUSH at Timezone_Generic_Impl.h#L474, is causing the crash. I will fix it and release a new version.

TZ_FLUSH;

Really strange, but I don't have time to investigate what's new in cores v2.5.2 and v2.4.1 to cause the crash.

maxgerhardt commented 2 years ago

Yep, Serial.flush() instantly kills the processor, I fell in that trap above too. This needs to be reported in the core.

khoih-prog commented 2 years ago

> Yep, Serial.flush() instantly kills the processor, I fell in that trap above too. This needs to be reported in the core.

Could you please help by reporting the flush issue to the Arduino-mbed core?

Thanks.

maxgerhardt commented 2 years ago

Yes, writing the issue now.

maxgerhardt commented 2 years ago

We now depend on https://github.com/arduino/ArduinoCore-mbed/issues/357.

khoih-prog commented 2 years ago

Thanks.

At least now we can survive without the flush
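One way to keep debug prints while making the flush optional is a compile-time switch, so the flush can be turned back on once the core is fixed. The sketch below is host-side and uses a hypothetical FakeSerial in place of the Arduino Serial object; DEBUG_FLUSH_ENABLED, DEBUG_FLUSH and logStep are made-up names and are not the macros used by LittleFS_Mbed_RP2040 or Timezone_Generic.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the Arduino Serial object; it records output
// and counts flush() calls instead of talking to hardware.
struct FakeSerial {
  std::string out;
  int flushCalls = 0;

  void println(const std::string& text) {
    out += text;
    out += "\n";
  }

  void flush() { ++flushCalls; }
};

// Flushing is off by default; define DEBUG_FLUSH_ENABLED as 1 to enable it.
#ifndef DEBUG_FLUSH_ENABLED
#define DEBUG_FLUSH_ENABLED 0
#endif

#if DEBUG_FLUSH_ENABLED
#define DEBUG_FLUSH(port) (port).flush()
#else
#define DEBUG_FLUSH(port) ((void)0)
#endif

// Prints one progress message and flushes only when flushing is enabled.
void logStep(FakeSerial& port, const std::string& message) {
  port.println(message);
  DEBUG_FLUSH(port);
}
```

With the default setting, logStep appends the message and never calls flush(), which mirrors the workaround of dropping the flush from the logging path.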

khoih-prog commented 2 years ago

The underlying Timezone_Generic release v1.7.2 has just been published, in both the Arduino and PIO Library Managers, to fix this crashing issue.

Thanks for your valuable contributions.


Releases v1.7.2

  1. Fix crashing issue for new cleared flash. Check RP2040_RTC_Time crashes Pico, does not work #3
khoih-prog commented 2 years ago

Finally, the root cause has been fixed by "Fix ::flush() on SerialUSB #341", which will be included in the next release of the core.

If you are using the previous versions of the libraries and can't wait, you can temporarily replace ArduinoCore-mbed's cores/arduino/Serial.cpp with the fixed cores/arduino/Serial.cpp.

Thanks to everybody,

Best,