tobozo / ESP32-targz

🗜️ An Arduino library to unpack/uncompress tar, gz, and tar.gz files on ESP32 and ESP8266
Other
123 stars 16 forks source link

task_wdt: Task watchdog got triggered while extracting a large tar.gz archive (~100mb) #75

Closed Sciguy429 closed 3 months ago

Sciguy429 commented 7 months ago

I have been using this library for both OTA updates and bulk content updates for files on an SD card. The OTA system has been working fine but my bulk content update has had constant issues. The ESP32 resets itself mid archive extraction (sometimes 10-15+ minutes in) without fail every time with a task watchdog error:

[1493540][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 329 bytes to CLOCKFACE/landscape7_largeDigital.JSON
[1493593][D][ESP32-targz-lib.cpp:753] tarEndCallBack(): Expanded /CONTENT/CLOCKFACE/landscape7_largeDigital.JSON (329 bytes)
[1493594][D][ESP32-targz-lib.cpp:764] tarEndCallBack(): Total expanded bytes: 329, heap free: 94508
[1493604][D][ESP32-targz-lib.cpp:697] tarHeaderCallBack(): Entering TRANSISTION/ directory.
[1493610][V][ESP32-targz-lib.cpp:772] tarEndCallBack(): [TAR INFO] tarEndCallBack: nofile for `TRANSISTION/`
[1493632][V][path_tools.h:118] mkdirp(): Creating /CONTENT/TRANSISTION folder for path /CONTENT/TRANSISTION
[1493740][V][ESP32-targz-lib.cpp:665] tarHeaderCallBack(): [TAR] Creating /CONTENT/TRANSISTION/Bowl_1.CAF
[1493762][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493767][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493779][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493791][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493804][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493815][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493828][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493840][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493854][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493865][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493880][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493891][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493902][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493914][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493927][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493941][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/Bowl_1.CAF
[1493952][V][ESP32-targz-lib.cpp:935] tarStreamWriteCallback(): [TAR INFO] tarStreamWriteCallback wrote 512 bytes to TRANSISTION/BoE (1502903) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (1502903) task_wdt:  - IDLE (CPU 0)
E (1502903) task_wdt: Tasks currently running:
E (1502903) task_wdt: CPU 0: AutoOTA Update
E (1502903) task_wdt: CPU 1: TinyHub Draw
E (1502903) task_wdt: Aborting.

abort() was called at PC 0x400f9cb0 on core 0

Backtrace: 0x4008395d:0x3ffbecbc |<-CORRUPTED

The system only has two tasks running, AutoOTA, which is running the archive extraction. And TinyHub, which is doing unrelated display management and is not touching the SD card in any way. The function responsible for extracting the archive looks like this, it is pulled almost exactly from the examples:

bool AutoOTA::extractArchive(const std::string &archivePath, std::string destFolder)
{
  //Clip path
  destFolder = clipChar(destFolder, '/');

  //Create unpacker object
  TarGzUnpacker *TARGZUnpacker = new TarGzUnpacker();

  //Configure unpacker
  TARGZUnpacker->haltOnError(true);                                                            //Stop on fail (manual restart/reset required)
  TARGZUnpacker->setTarVerify(true);                                                           //true = enables health checks but slows down the overall process
  TARGZUnpacker->setupFSCallbacks(targzTotalBytesFn, targzFreeBytesFn);                        //Prevent the partition from exploding, recommended
  TARGZUnpacker->setGzProgressCallback(BaseUnpacker::defaultProgressCallback);                 //targzNullProgressCallback or defaultProgressCallback
  TARGZUnpacker->setLoggerCallback(BaseUnpacker::targzPrintLoggerCallback);                    //gz log verbosity
  TARGZUnpacker->setTarProgressCallback(BaseUnpacker::defaultProgressCallback);                //Prints the untarring progress for each individual file
  TARGZUnpacker->setTarStatusProgressCallback(BaseUnpacker::defaultTarStatusProgressCallback); //Print the filenames as they're expanded
  TARGZUnpacker->setTarMessageCallback(BaseUnpacker::targzPrintLoggerCallback);                //tar log verbosity

  //Not using an intermediate file
  if (!TARGZUnpacker->tarGzExpander(tarGzFS, archivePath.c_str(), tarGzFS, destFolder.c_str(), nullptr))
  {
    //Print error
    Serial.printf("ERROR! Unable to extract archive, tarGzExpander+intermediate file failed with return code #%d\n", TARGZUnpacker->tarGzGetError());

    //Destroy unpacker object
    delete TARGZUnpacker;

    //Exit
    return false;
  }

  //Destroy unpacker object
  delete TARGZUnpacker;

  //Return success
  return true;
}

From cursory searches I know that this watchdog triggers when freertos gets starved for task time due to a task never yielding. Though I am unsure of how to proceed with debugging this. Obviously from my own code I can't exactly force the library to yield to freertos and because I am using the esp32-arudino libs from platformio changing the watchdog timeout isn't feasible either.

tobozo commented 7 months ago

hi,

E (1502903) task_wdt: CPU 0: AutoOTA Update
E (1502903) task_wdt: CPU 1: TinyHub Draw

do SD and display share the same spi bus? if so both tasks should be on core 1

unless both display driver and sd driver are thread safe and use semaphores, you shouldn't update the display from a a separate task

e.g. if you need to update a progress bar or a status text on the display, do it with a genericProgressCallback and a genericLoggerCallback:

void myProgressCallback( uint8_t progress )
{
  drawProgressBar( progress ): // progress value is between 0 and 100
}

void myStatusCallback( const char* format, ... )
{
  char tmp_path[256] = {0};
  va_list argptr;
  va_start(argptr, format);
  vsnprintf(tmp_path, 255, format, argptr);
  va_end(argptr);
  drawTarFile( tmp_path  ); // string max 255 chars, file or folder name
}
  TARGZUnpacker->setGzProgressCallback( myProgressCallback );

  TARGZUnpacker->setTarStatusProgressCallback( myStatusCallback );
  // works with any printf based function
  // TARGZUnpacker->setTarStatusProgressCallback( tft.printf );
  // TARGZUnpacker->setTarStatusProgressCallback( log_e );

finally if you're 100% sure the tar.gz file isn't corrupted and complete, you can set this to false :

  TARGZUnpacker->setTarVerify(false); // false = less i/o during unpacking
Sciguy429 commented 7 months ago

I am still working on this as of now, I am beginning to suspect my file is ending up corrupted prior to ESP32-targz getting a hold of it. I was able to get the process to not trip the WDT by disabling it, but then I started getting read errors for the file along with Targzunpacker exiting with an error.

TinyHub dose not use SPI, it is a hub777 rgb matrix driver that effectively bit bangs the interface. It dose a lot of low level IO, but no SPI calls, pretty much only digitalwrite(). As of now it dose not display any progress, though that was a planned thing.

For now I think I need to take a closer look at my download function and do some tests with it. CRC's are matching fine, but those are calculated as it downloads, not from the final saved file.

I know it's not strictly related to this library, but here is the download function in question:

bool AutoOTA::downloadFile(const std::string &url, const std::string &path, uint32_t &crc32Value, bool overwrite)
{
  //Print header
  Serial.println("AutoOTA: Starting file download:");
  Serial.print(" - URL: ");
  Serial.println(url.c_str());
  Serial.print(" - Local path: ");
  Serial.println(path.c_str());

  //Check if file already exists
  if (SD.exists(path.c_str()))
  {
    //Make sure it is not a directory
    File f = SD.open(path.c_str());
    if (f.isDirectory())
    {
      Serial.println("ERROR! Unable to download file, provided path is a directory!");
      f.close();
      return false;
    }
    f.close();

    //Check for overwrite
    if (overwrite)
    {
      Serial.println("WARNING! downloadFile() overwriting provided path!");
      if (!SD.remove(path.c_str()))
      {
        Serial.println("ERROR! Unable to download file, could not overwrite existing file!");
        return false;
      }
    }
    else
    {
      Serial.println("ERROR! Unable to download file, path already exists and overwrite is disabled!");
      return false;
    }
  }

  //Open file
  File outFile = SD.open(path.c_str(), FILE_WRITE, true);
  if (!outFile)
  {
    Serial.println("ERROR! Unable to download file, unable to open destination file!");
    return false;
  }

  //Setup HTTP request
  HTTPClient http;
  http.setFollowRedirects(HTTPC_STRICT_FOLLOW_REDIRECTS);
  http.begin(url.c_str());

  //Send request
  int httpCode = http.GET();
  //Make sure the request was good
  if (httpCode == HTTP_CODE_OK)
  {
    //Get response length
    int len = http.getSize();
    int fullLen = len;

    //Buffer
    uint8_t buff[OTA_DOWNLOAD_BUFF_SIZE];

    //Get TCP stream
    WiFiClient *stream = http.getStreamPtr();

    //Read all data from server
    CRC32 crcObj;
    uint16_t lastProgress = 0;
    while (http.connected() && (len > 0 || len == -1))
    {
      //Get available data size
      size_t size = stream->available();

      if (size)
      {
        //Read up to our buff size
        int readBytes = stream->readBytes(buff, ((size > OTA_DOWNLOAD_BUFF_SIZE) ? OTA_DOWNLOAD_BUFF_SIZE : size));

        //Write the data to our file
        outFile.write(buff, readBytes);
        if (len > 0)
        {
          len -= readBytes;
        }

        //Update the CRC32
        crcObj.add(buff, readBytes);

        //Calculate and print our progress
        uint32_t bytesRead = fullLen - len;
        uint16_t progress = ((float) bytesRead / fullLen) * 100;
        if (progress != lastProgress)
        {
          Serial.printf(" - AutoOTA: Download progress: %u%% (%u/%u)\r", progress, bytesRead, fullLen);
          lastProgress = progress;
        }
      }

      //Yield to make FreeRTOS happy
      // - This will also give the WIFI tasks a chance to run
      delay(1);
    }

    //Close the file
    outFile.close();

    //Calculate the CRC32
    crc32Value = crcObj.calc();
  }
  else
  {
    //Get failed
    Serial.println("ERROR! Unable to download file, GET failed!");
    Serial.print(" - Response code: ");
    Serial.println(httpCode);

    //Close file and http
    outFile.close();
    http.end();

    //Return fail
    return false;
  }

  //Close http
  http.end();

  //Download completed
  Serial.println();
  Serial.println(" - Download complete!");

  //Return success
  return true;
}
tobozo commented 7 months ago

what's the value for OTA_DOWNLOAD_BUFF_SIZE?

I started getting read errors for the file along with Targzunpacker

if source and destination medium are the same then you really want to minimize I/O

if your network connection is solid you may as well use tarGzStreamExpander directly with http.getStreamPtr() and unpack the targz on the fly; this would spare a buffer, use no tar intermediate file and reduce I/Os (no reads during decompression):

  HTTPClient http;
  http.begin(url.c_str());
  if (http.GET() == HTTP_CODE_OK) {
    TARGZUnpacker->setTarVerify(false); 
    auto res = TARGZUnpacker->tarGzStreamExpander( http.getStreamPtr(), &SD, "/outputdir", http.getSize() );
  }
Sciguy429 commented 7 months ago

OTA_DOWNLOAD_BUFF_SIZE is set to 1kb (1024) and the ESP32 has plenty of free heap in this configuration.

Also I had no idea I could just extract the archive right from the stream, that might work better, though I wouldn't be easily able to check the file CRC.

tobozo commented 7 months ago

OTA_DOWNLOAD_BUFF_SIZE should be set to at least 4096 to prevent timeouts and also to optimize writes on the SD

indeed you wont't be able to CRC-check the targz file if using tarGzStreamExpander

Sciguy429 commented 7 months ago

Well I am not even more confused...

I upped that buffer size without issue, doesn't seem to have made much of a speed difference but it is hard to tell. However with some test code I was able to confirm that my file is not downloading right. The first 114688 bytes of the file are being saved to the card but nothing after that. The CRC check still succeeds though, as it is fed right out of the buffer along side the file write operation.

I have zero clue why the file isn't properly writing, but it at least seems that my issue here isn't the fault of the library. My archive is 100% to blame...

Sciguy429 commented 7 months ago

Alright I wanted to give a final update on this one, mostly for anyone finding it on google at some point in the future. This issue I am having is NOT caused by the ESP32-Targz library. It seems that this idle task timeout simply triggers any time that some sort of SD card read/write error occurs. The WDT triggering will obfuscate any of these error messages though, which is quite annoying.

You can disable the WDC with the following code to see these errors properly, though I wouldn't recommend using it outright in your project as an idle timeout shouldn't be happening at all:

//Disable WDT
esp_task_wdt_init(10, false);

//Enable WDT
esp_task_wdt_init(10, true);

The real pain I am discovering at this point is that the ESP32 SD library is not providing any error messages or information as to why these write errors are occurring. Calls to file.write() simply return zero with no further errors or information displayed in the serial output. The file can be closed without issue and even read back, but no more writes are accepted. Flushes do nothing and repeated write attempts all result in the same 'zero' output and no further file writes. As a side, I did try the http stream method you provided an example for and it too eventually errored out with a write error that seemed to be displaying similar symptoms.

This project I am working on uses the SD card to store fairly raw video data and it is perfectly capable of reading that data for hours on end from the SD card. It seems that the ESP32 SD library is simply unstable when writing. Either that or something about my hardware config refuses to work with writes despite having no issues with near constant reads.

Either way, I think this issue can be closed. My problem here clearly lies with the ESP32 Libs themselves and not your library. Apologies for sending you on this wild goose chase, but I suppose the knowledge that a read/write error can cause the ESP32 WDT to trigger is useful for some debugging cases.