openvehicles / Open-Vehicle-Monitoring-System-3

Open Vehicle Monitoring System - Version 3
http://www.openvehicles.com/

Config: file system corruption (losing data over reboot) #151

dexterbg opened 5 years ago

dexterbg commented 5 years ago

Unfortunately I have overwritten the corrupted flash dump file, so I can only describe the effect:

On a new v3 module after first installation, everything seemed to be all right. After some reboots (including unclean ones due to power losses), the vehicle-specific config ("xrt" in my case) stopped retaining changes across reboots. Other config files were OK.

I dumped the flash content and mounted it on my Linux system, had no problems accessing it and modifying all files including "xrt". I did an fsck (it only reported an unclean unmount, which it fixed) and reflashed the image to the module: nothing had changed, the config still could not be written.
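
For reference, the workflow was roughly as follows. The offset and size of the /store partition are placeholders here (take the actual values from the partition table of your build), /dev/ttyUSB0 is whatever serial device the module enumerates as, and the module needs to be in download mode for esptool:

# esptool.py --port /dev/ttyUSB0 --baud 921600 read_flash <store-offset> <store-size> store.img
# losetup /dev/loop0 store.img
# fsck.vfat -a /dev/loop0
# mount -t vfat /dev/loop0 /mnt
# ls -lR /mnt
# umount /mnt && losetup -d /dev/loop0
# esptool.py --port /dev/ttyUSB0 write_flash <store-offset> store.img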

Using a build with config security disabled, I also could not write to the file using the vfs command. I could, however, create new files and write to them, and also write to other existing files.

Renaming or deleting the file was possible, and the file was recreated on changing the config, but it remained empty.

Only a factory reset could fix the issue.
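
(For reference: the factory reset meant here is the one that erases and re-creates the /store partition, i.e. the module command below. It wipes all configuration, so save a backup first if anything is still readable.)

OVMS# module factory reset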

mjuhanne commented 4 years ago

I've seen this behaviour too, a few days ago. Modifying a newly created config file ("power") resulted in an error: RewriteConfig: error writing power : socket not open for writing, or a similar message (I don't have the logs anymore). vfs stat showed an existing but empty (0 bytes) file. At first I thought it was my new code, but this also happened when trying to create new locations (I hadn't defined any before). Modifying existing config files worked, though. Rewriting the store partition fixed this. Any tips for investigating this if it happens again? Do you need raw partition dumps?

dexterbg commented 4 years ago

I can also confirm esp-idf 3.3 does not fix this issue, but it seems to occur very infrequently now.

@mjuhanne Your question should go to the Espressif / esp-idf people. I guess they would like to get a verbose-level log of a write (compiled in at that level, not just set at runtime), plus a raw dump of the defective partition.

glynhudson commented 3 years ago

I have installed a couple of OVMS modules in 40 kWh Leafs as GPS trackers only, since OVMS does not yet support the 40 kWh 2018 Nissan LEAF. I powered the module from a switched 12V feed, so the module is only powered up while the car is being driven. This means the module gets lots of power cycles. After a month or two the module loses its config. This has been happening reliably to all the units I've installed in this way. It seems that regularly power cycling the modules results in the config getting corrupted.

dexterbg commented 3 years ago

Users also still occasionally report this issue. Glyn, do you have file logging or any other standard file writes configured on these modules?

glynhudson commented 3 years ago

I'm afraid not. I've just switched on error logging to the SD card; hopefully I'll be able to get some useful output.
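
For reference, I switched it on with something along these lines (command syntax from memory, so double-check against your firmware's help output):

OVMS# log file /sd/logs/ovms.log
OVMS# log level error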

dexterbg commented 3 years ago

(sigh) Once again: a file on /store has suddenly become unreadable, or to be precise, reading it crashes the module around 80% into the file. It's the Edimax module plugin script this time, which hasn't been touched since my last config restore about half a year ago.

The crash signature:

OVMS# vfs cat /store/scripts/lib/edimax.js
/**
 * Module plugin:
……………………
          state.error = "Unable to parse Edimax response";
        }
      } else {
        state.error = "Access error: " + resp.statusCode + " " + resp.statusText;
      }
      p
assertion "0 && "fatfs internal error"" failed: file "/home/balzer/esp/esp-idf/components/fatfs/src/vfs_fat.c", line 253, function: fresult_to_errno
abort() was called at PC 0x4010d194 on core 1

ELF file SHA256: 77ed93d9929cd0f1

Backtrace: 0x4008e4f2:0x3ffd7290 0x4008e78d:0x3ffd72b0 0x4010d194:0x3ffd72d0 0x4024a103:0x3ffd7300 0x4024a4f4:0x3ffd7320 0x401b103d:0x3ffd7350 0x4000bdbb:0x3ffd7370 0x4008bb51:0x3ffd7390 0x4008b942:0x3ffd73b0 0x4010da9e:0x3ffd73d0 0x4010db25:0x3ffd7410 0x400eccfa:0x3ffd7430 0x400fd83a:0x3ffd7650 0x400fd8ed:0x3ffd7680 0x400fd8ed:0x3ffd76b0 0x400fd94b:0x3ffd76e0 0x400ef0ab:0x3ffd7700 0x40103e9f:0x3ffd7720 0x40104056:0x3ffd7780 0x400ef0db:0x3ffd7820 0x4029f132:0x3ffd7840 0x40101977:0x3ffd7860 0x400f9989:0x3ffd7890 0x400f9b59:0x3ffd78c0 0x40101811:0x3ffd78e0 0x40101820:0x3ffd7900 0x400ec7f5:0x3ffd7920

Rebooting...
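
(a2l is just my shorthand for resolving the backtrace PCs against the ELF of the exact build that crashed; roughly equivalent to running the toolchain's addr2line over the PC values, e.g. — build/ovms3.elf being whatever ELF your build produced:)

# xtensa-esp32-elf-addr2line -pfiaC -e build/ovms3.elf 0x4010d194 0x4024a103 0x4024a4f4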

a2l says it's an unhandled FRESULT:

0x4024a103 is in fresult_to_errno (/home/balzer/esp/esp-idf/components/fatfs/src/vfs_fat.c:274).
269             case FR_NOT_ENOUGH_CORE: return ENOMEM;
270             case FR_TOO_MANY_OPEN_FILES: return ENFILE;
271             case FR_INVALID_PARAMETER: return EINVAL;
272             case FR_OK: return 0;
273         }
274         assert(0 && "unhandled FRESULT");
275         return ENOTSUP;
276     }
277     
278     static void file_cleanup(vfs_fat_ctx_t* ctx, int fd)
0x4024a4f4 is in vfs_fat_read (/home/balzer/esp/esp-idf/components/fatfs/src/vfs_fat.c:366).
361         FIL* file = &fat_ctx->files[fd];
362         unsigned read = 0;
363         FRESULT res = f_read(file, dst, size, &read);
364         if (res != FR_OK) {
365             ESP_LOGD(TAG, "%s: fresult=%d", __func__, res);
366             errno = fresult_to_errno(res);
367             if (read == 0) {
368                 return -1;
369             }
370         }

dexterbg commented 3 years ago

The downloaded flash image isn't recognized as a VFAT image at all and cannot be mounted via loop. I don't know how to repair it.

Also… I think a VFAT image should begin with metadata, shouldn't it? My image has JSON data (looks like the 12V history) in block 0:

# file repair.img 
repair.img: data

# mkdir f
# losetup /dev/loop0 repair.img
# mount -t vfat /dev/loop0 f
mount: … wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

# leela:/home/balzer/ovms/v3/flash/crash-210105-unhandled-fresult # xxd repair.img | head -20
00000000: 7b74 696d 653a 3135 3832 3236 3931 3431  {time:1582269141
00000010: 3030 302c 2276 2e62 2e31 3276 2e76 6f6c  000,"v.b.12v.vol
00000020: 7461 6765 223a 5b31 342e 3137 2c31 342e  tage":[14.17,14.
00000030: 3138 2c31 332e 3834 2c31 332e 3034 2c31  18,13.84,13.04,1
00000040: 322e 3837 2c31 322e 3634 2c31 322e 3531  2.87,12.64,12.51
00000050: 2c31 322e 3436 2c31 322e 3433 2c31 322e  ,12.46,12.43,12.
00000060: 3236 2c31 322e 3235 2c31 322e 3234 2c31  26,12.25,12.24,1
00000070: 322e 3231 2c31 322e 3232 2c31 322e 3231  2.21,12.22,12.21
00000080: 2c31 322e 3235 2c31 322e 3236 2c31 322e  ,12.25,12.26,12.
00000090: 3235 2c31 322e 3232 2c31 322e 322c 3132  25,12.22,12.2,12
000000a0: 2e31 392c 3132 2e31 372c 3132 2e31 342c  .19,12.17,12.14,
000000b0: 3132 2e31 332c 3132 2e30 372c 3132 2e31  12.13,12.07,12.1
000000c0: 2c31 322e 3039 2c31 322e 3039 2c31 322e  ,12.09,12.09,12.
000000d0: 3038 2c31 312e 3935 2c31 322e 3034 2c31  08,11.95,12.04,1
000000e0: 322e 3034 2c31 322e 3033 2c31 322e 3033  2.04,12.03,12.03
000000f0: 2c31 332e 3632 2c31 342e 3135 2c31 342e  ,13.62,14.15,14.
00000100: 3136 2c31 342e 3135 2c31 342e 3135 2c31  16,14.15,14.15,1
00000110: 332e 3533 2c31 332e 3136 2c31 332e 3032  3.53,13.16,13.02
00000120: 2c31 322e 3739 2c31 322e 3635 2c31 322e  ,12.79,12.65,12.
00000130: 352c 3132 2e34 342c 3132 2e33 392c 3132  5,12.44,12.39,12

But the module mounts this, the config is there, other files are there & readable, just reading that single file crashes the system.
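
My working assumption (not verified): /store is mounted through ESP-IDF's wear levelling layer, which remaps sectors on the flash, so a raw partition dump is not a plain FAT volume and block 0 can hold arbitrary file data. As a quick check for where (or whether) a FAT boot sector exists in the image, one could grep for the FAT type string and try mounting from that offset — the type string sits at byte 0x36 (FAT12/16) or 0x52 (FAT32) of the boot sector, so subtract that from the byte offset grep reports (OFFSET below). If the wear levelling layer has scattered the sectors, even this won't yield a mountable image:

# grep -abo 'FAT' repair.img | head
# mount -t vfat -o loop,ro,offset=$((OFFSET - 0x36)) repair.img f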

Wireheadbe commented 1 month ago

Hi, I'm running an e-Niro with the famous 12V power draw issue, so as stated in other topics, I power OVMS from a switched 12V line. So naturally, it might end up powering down the OVMS module at the worst possible time.

So from time to time I have to restore from backup, and all seems fine again... for some time.

However, since I only use the MQTTv3 implementation, is there a way to make the OVMS "not write anything to itself when powered on"? Specifically: make its own FS "read-only", so this corruption doesn't happen? My metrics are stored "offsite" anyway.

That way, I wouldn't have to deal with this silent corruption anymore 🙂

dexterbg commented 1 month ago

The OVMS config framework does no writes unless necessary, i.e. by an actual config param value change. The Niro code does not use the config framework to store any data read from the car, so unless you do frequent config changes or have some plugin configured to write to "/store", there are normally no write operations to that partition.

That is btw the case for most vehicles, and it is the same regardless of the server type you use. Nevertheless, the partition occasionally gets corrupted. IOW, mounting read-only is an interesting idea, but I doubt it will make a difference.
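
What you can do as a stop-gap is keep a current config backup on the SD card (or off the module entirely), so a corrupted /store only costs you a quick restore instead of a full reconfiguration. Assuming your build has the config backup/restore commands, that's simply:

OVMS# config backup /sd/backup/cfg-latest.zip
OVMS# config restore /sd/backup/cfg-latest.zip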

What actually did make a difference for me was updating my car module to the latest hardware revision (with the ESP32 rev3). I've only had to restore my config once since then, and that was after a failed firmware flashing. Are you still running the ESP32 rev1 version? If in doubt, check the module status window or the output of ota status nocheck.