sieren / Homepoint

Espressif ESP32 Based Smarthome screen for MQTT
MIT License

SPIFFS randomly corrupted #87

Open sieren opened 4 years ago

sieren commented 4 years ago

Occasionally SPIFFS gets corrupted in failsafe (or otherwise), so that files can no longer be edited or created (deleting, however, always works). Often files are only "half" written during a save and cannot be overwritten properly afterwards.

Currently no steps to reproduce, seems random. Only "fix" is to reflash the bin partition.

jonathanmbradshaw commented 3 years ago

I have seen this (or something related). After a number of updates to the config file using the HTTP UI, I get to the point where it cannot be accessed for read or write. The only option is to reflash the M5 using the USB port.

AstroTyrannus commented 3 years ago

Hi Sieren,

Have you considered that SPIFFS itself might be the cause? Try replacing SPIFFS with LittleFS.

htvekov commented 3 years ago

Hi' @sieren

Issue with occasional corrupt SPIFFS still persists I'm afraid. I'm in a test phase atm. and edit and write numerous revisions to the config file these days. I've now experienced for the third time (both on HP v0.05 M5stack and ESP32_generic files) that SPIFFS suddenly corrupts after many writes (min. 20-30 or so).

Issue can't be resolved with an OTA update but can only be recovered with a complete serial reflash.

Ciao !

sieren commented 3 years ago

Thanks for checking, but yeah, sadly no news on that front. I'm afraid it's in the SPIFFS implementation itself and unrelated to Homepoint.

htvekov commented 3 years ago

Hi' @sieren

There's definitely an issue with SPIFFS in HomePoint. Either with SPIFFS itself or the configuration.

Tried to upload my icon pack via the web interface and it dies uploading file no. 12. The icon files are at most about 2 kB each incl. overhead, so the partition is full after only some 22 kB of uploads. I've done this three times with exactly the same steps, and it stops every time at exactly the same file.

So I tend more to believe that it's some sort of partition size issue, rather than an issue with repeated writes to eg. config.json

Have the released binary files accidentally been compiled with some strange ultra-minimal SPIFFS partition scheme? I wish I could be more specific, but I'm afraid my knowledge on this specific subject is very close to zero 😒

cerietke commented 3 years ago

Just wanted to confirm I see similar behaviour on an M5Stack Core. Have had file uploads be refused, have had config not being written and have had config getting corrupted at different times. I sometimes see a 500 error with no descriptive text. My suspicion was also partition size, but not that alone as after reflash using USB and uploading the exact same files and writing the same config it will succeed. I was thinking it might be storing some version history, hence it running out after doing a bunch of things, not immediately.

sieren commented 3 years ago

LittleFS will make it into ESP-IDF v4.2, but sadly the underlying Arduino library doesn't support it yet. I will investigate replacing SPIFFS once that has happened.

htvekov commented 3 years ago

Hi' Matt.

Yes, that would really make a huge improvement to replace SPIFFS. It's hard 'tinkering' with HomePoint with current SPIFFS issue as frequent flashing is needed. But the good thing is that HomePoint already is absolutely 'rock solid' if left alone

Ciao !

ghosty-be commented 3 years ago

I just encountered this too: after messing with some settings I ended up with what seems to be a corrupted config.json... I have (for another project) https://randomnerdtutorials.com/install-esp32-filesystem-uploader-arduino-ide/ installed in my Arduino IDE, and was wondering if it would be possible to just rewrite the SPIFFS that way. I created an empty project in the Arduino IDE and copied in the data folder from your GitHub plus (luckily) a local copy of my config.json, but it does not want to upload the SPIFFS like that via ESP32 Sketch Data Upload...

Here are my settings (just trying something random as I have no prior experience with SPIFFS):

(screenshot: arduino-ide)

In the end I reflashed the M5Stack Core ESP32 with the bin image and started over with the access point mode config to copy-paste my previous config.json.

sieren commented 3 years ago

It should be possible... at least when using the raw project you can update the SPIFFS bin with make spiffs_spiffs_bin and flash it manually with something along the lines of

    esptool.py --port /dev/<serial_port_goes_here> --chip esp32 -b 921600 write_flash --flash_mode dio --flash_freq 80m --flash_size 4MB 0x2b0000 spiffs.bin

(I'd assume)

Your Arduino project needs to know where to upload the data. That's defined in the partition scheme (partition.csv) in the root of this project - the address is 0x2b0000
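As a rough illustration of where that address comes from, the sketch below reads the SPIFFS offset and size out of an ESP-IDF style partition CSV. The table contents are made up for the example: only the offset 0x2b0000 is taken from this thread, and `parse_size` and `find_partition` are hypothetical helper names, not part of any ESP tool.

```python
import csv
import io

# Hypothetical partition table for illustration only. Only the SPIFFS offset
# (0x2b0000) comes from this thread; every other row and size is assumed.
SAMPLE_CSV = """\
# Name,   Type, SubType, Offset,   Size
nvs,      data, nvs,     0x9000,   0x6000
factory,  app,  factory, 0x10000,  0x2a0000
spiffs,   data, spiffs,  0x2b0000, 500K
"""


def parse_size(text):
    """Convert '0x2b0000', '512000', '500K' or '1M' to a byte count."""
    text = text.strip()
    multiplier = 1
    if text[-1:] in ("K", "k"):
        multiplier, text = 1024, text[:-1]
    elif text[-1:] in ("M", "m"):
        multiplier, text = 1024 * 1024, text[:-1]
    return int(text, 0) * multiplier


def find_partition(csv_text, subtype):
    """Return (offset, size) in bytes of the first row with this SubType."""
    rows = [
        line for line in csv_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    for row in csv.reader(io.StringIO("\n".join(rows))):
        fields = [field.strip() for field in row]
        if len(fields) >= 5 and fields[2] == subtype:
            return parse_size(fields[3]), parse_size(fields[4])
    raise KeyError(subtype)


offset, size = find_partition(SAMPLE_CSV, "spiffs")
print(hex(offset), size)
```

With a size of 500K in the table, the partition is 500 * 1024 = 512000 bytes. The sketch does not handle rows that leave the offset field empty.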

ghosty-be commented 3 years ago

reading up on this it seems that it's not that straightforward to have that partition layout handed to arduino IDE... I guess it requires creating a new board definition specifically with your partitions.csv layout ... darn why can't stuff be simpler :) https://robotzero.one/arduino-ide-partitions/

ghosty-be commented 3 years ago

ok messing around with mkspiffs ... I saw in your partitions.csv the spiffs is 500kB ? Tried something like

    mkspiffs -c ./data -s 512000 spiffs.bin

but that says ...

    /captive/app.js
    SPIFFS_write error(-10001): File system is full.

    error adding file!
    Error for adding content from captive!
    /power_inactive.jpg
    SPIFFS_write error(-10001): File system is full.

    error adding file!

While it should fit according to this logic:

    $ du -bs spiffs.bin
    512000 spiffs.bin
    $ du -bs data/
    432687 data/

so not sure what I am doing wrong there... googles on

ghosty-be commented 3 years ago

so after a bit more searching: spiffs by default uses 4k blocks ...

    $ du -B4K data/
    130

when I multiply that by 4 it's 520k :/ So you use a different block size for your spiffs somehow? :)
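The estimate above can be reproduced with a minimal Python sketch that rounds every file up to whole 4 KiB blocks, the same way `du -B4K` counts on the host. `rounded_size` is a hypothetical helper, and the rounding only mirrors the `du` estimate; it does not model SPIFFS's internal page layout or metadata, so real usage on the device will differ.

```python
import math
import os
import tempfile


def rounded_size(path, block_size=4096):
    """Sum file sizes under path, rounding each file up to whole blocks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            total += math.ceil(size / block_size) * block_size
    return total


# The figure quoted above: 130 blocks of 4 KiB against a 512000-byte partition.
estimate = 130 * 4096
print(estimate, estimate <= 512000)

# Small self-contained demo on a temporary directory (file names are arbitrary).
with tempfile.TemporaryDirectory() as tmp:
    for name, size in (("a.bin", 1), ("b.bin", 4097)):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"\0" * size)
    demo_total = rounded_size(tmp)
print(demo_total)
```

On a real checkout one would call `rounded_size("./data")` and compare the result with the partition size. The 130 blocks come to 532480 bytes, which is more than the 512000 bytes passed to mkspiffs, even though the raw file sizes add up to only 432687 bytes.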

sieren commented 3 years ago

oh you may be onto something.. maybe this is the reason for the corruptions all along??

ghosty-be commented 3 years ago

Just wondering how you got all the data on there in the end :/ I assume that your GitHub /data should be a reflection of the data on the SPIFFS... I was also looking at whether it is possible to download the whole SPIFFS space to do a compare, but that doesn't appear to be possible...

What I noticed after the corruption yesterday might be caused by the same issue: I deleted the config.json through the web interface and tried to upload my backup copy, but that always failed... So I might go in and delete some images I don't really use and see if I can then do a similar operation successfully: deleting and re-uploading the config.json afterwards (but that'll be for this evening when I have more time to tinker with it).

sieren commented 3 years ago

This is really great info! We might be onto something here. You are right, even spiffs.bin is beyond 500kb by now. It's weird that none of the ESP tools were complaining about the fact that the bin-file is beyond what's specified in the partition scheme. This would make perfect sense though.

It seems I need to create per-device SPIFFS Partition Schemes too, some ESP32 devices only support 4MB, others like the M5Stack are made for up to 16MB. But it looks like there was still some space to bump up the SPIFFS partition.
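A quick back-of-the-envelope check of how much room is left, as a Python sketch. It assumes SPIFFS is the last partition on the flash and starts at 0x2b0000 (the address given earlier in this thread); the 4 KiB alignment and the helper names `max_partition_size` and `image_fits` are assumptions made for illustration.

```python
SPIFFS_OFFSET = 0x2B0000          # address quoted earlier in this thread
OLD_PARTITION_BYTES = 512000      # the size ghosty-be passed to mkspiffs


def max_partition_size(flash_bytes, offset, align=0x1000):
    """Largest aligned partition that fits between offset and end of flash.

    Assumes nothing else is placed after this partition.
    """
    room = flash_bytes - offset
    return room - room % align


def image_fits(image_bytes, partition_bytes):
    """True if a filesystem image is no larger than its partition."""
    return image_bytes <= partition_bytes


room_4mb = max_partition_size(4 * 1024 * 1024, SPIFFS_OFFSET)
room_16mb = max_partition_size(16 * 1024 * 1024, SPIFFS_OFFSET)
print(room_4mb, room_16mb)

# A check like this before flashing would have flagged an oversized image.
print(image_fits(532480, OLD_PARTITION_BYTES))
```

Under those assumptions a 4MB device leaves 1376256 bytes (1344 KiB) from 0x2b0000 to the end of flash, so the partition could grow well beyond 512000 bytes even on the smaller boards.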

I've attached a special build that remedies this. It'd be great if you could test it and report back. ping @htvekov maybe you too, since the icon pack might be hammering the SPIFFS partition pretty well.

Mind you, this requires a hard flash through USB with the _full.bin - don't forget to make a backup of your config beforehand :)

homepoint_release.zip

ghosty-be commented 3 years ago

Did not really find how to trigger it yet, still trying with the current release v0.07.2:

  1. tried deleting a bunch of images and uploading the backup config.json: ok
  2. reflashed, tried deleting config.json and uploading the backup config.json: ok
  3. tried adding a couple of lines to config.json, reloaded and rebooted a couple of times: ok
  4. it started to act weird when I after that uploaded a readme.txt, then uploaded another readme.txt (other content), all still only a couple of bytes... shortly after that the web interface started to act up (no files visible anymore in the web interface... but the reload and restart still worked...)
  5. after reflashing again, tried to upload some files, even a specially crafted blob.abc (just urandom data with dd) which was over 4kB in size to take up 2x4kB blocks... but still couldn't corrupt the spiffs...

Now just flashed your above release... it works, but I'm not sure how to verify that it actually fixed anything :) I also uploaded a couple of icons from @htvekov to test, and so far it still works... So what is the change in this release? Is just the spiffs larger, and if so, how large? I would actually like to continue my quest into rewriting the spiffs without re-flashing the actual code ... tinkering all the way

htvekov commented 3 years ago

Hi' Sieren.

Sorry about the extremely late reply. I've been quite busy at work last 14 days. Just tested and loaded some 30+ icons, all working!! 👌🎉

I'll leave it active for a few days and see if I can crash this Home Point version abusing spiffs 😉

Ciao !

ghosty-be commented 3 years ago

running it for a solid week now without problems... but then again since I didn't mess with it much I can't really say if it's now gone :)
I could not make it crash before on purpose...

sieren commented 3 years ago

Cool, sounds promising so far. Maybe once you've had a chance to mess with it a bit more, let me know. Otherwise I'll roll these changes into the next update.

sieren commented 3 years ago

Leaving open for now

htvekov commented 3 years ago

Can't kill it, Matt ! 😉

Has been running stable and without any issues - loading files or otherwise. I would merge fix and release as stable build.

Ciao !

ghosty-be commented 3 years ago

I today played a lot with the config, copy - pasting other configs and reloading... that's what trashed mine before, but this version has been running fine for 18 days now...

dresende commented 3 years ago

I was just bitten by this, and since wifi is on the config, it's "bricked", have to reflash. I would suggest 2 features to solve this:

  1. Fallback to no config after perhaps 5min? 10min of not being able to connect?
  2. Have an option to reformat SPIFFS.

sieren commented 3 years ago

Using the version attached to this issue a few comments back? The fix hasn't been rolled into a main release yet

dresende commented 3 years ago

No, I used the latest version, didn't see that attachment. Will try that and see how it goes.

cerietke commented 3 years ago

Is the file attached to the Feb 24th post the latest? My core 2 seems unstable (though no space problems so far), it keeps restarting and has trouble connecting to the wifi, once it has connected it sometimes reboots or turns off on a click.

dresende commented 3 years ago

I flashed 2 hours ago and so far so good 😄

sieren commented 3 years ago

@cerietke see https://github.com/sieren/Homepoint/issues/145 - known issue right now

ghosty-be commented 3 years ago

So far I've been messing with it quite a bit and have not seen the spiffs corruption issue with the version attached to this thread (it did not get corrupted despite rewriting the configs a bunch of times, adding a couple of extra icons, etc.).

However, reading above about the crashes on the m5stack core 2, I went to look in that thread. I have a 1st generation core, and I have seen some similar behavior on it, but not that often. I have it attached to a USB power adapter on my desk (I removed the battery shield) and have sometimes seen it reload, showing the wifi as disconnected for a couple of seconds before it reconnects (let's say once a week or even less frequently). Not sure if it could be some kind of condition where the connection to the broker or wifi is lost and that causes homepoint to freak out and reload or something? :/

iqbalibrahim1992 commented 2 years ago

Hi @sieren absolutely love this, I got it all running without any issues at first, but I've noticed that I experience this problem (cannot edit/create files) after I upload a new icon file.

I've created two JPG files both 50px x 50px called office_active.jpg and office_inactive.jpg - as soon as they upload, I'm not able to edit the config.json file anymore. Using the M5Stack Basic Development Kit. What I'm having to do is build the config.json file in Notepad++, flash the M5Stack via USB, then edit the config.json in the Web UI for it to work, and then basically not touching anything in the Web UI.

Any advice on how to resolve this issue? Happy to try and help with it, although my knowledge is probably nowhere near yours on this... Thank you

cerietke commented 2 years ago

Are you sure it's always the case that you can't edit it?

I have a number of images I created. I upload them one after another after a fresh install, then edit my config (copy-paste) and then it's usually fine. If I have to edit the config again later, I sometimes see that it loads only partially and I can't seem to save it again. If that happens I reset: do a fresh install and start from scratch.

I believe the new code is very different, I tried it on my core2, but unfortunately I couldn't get creating my own images to work. I did not see the same issue with the config.json though.

gon0 commented 2 years ago

Maybe it helps reproducing this issue: on my M5 Stack, I entered a German letter, "ß", in the config.json file as a name, e.g.

    "name": "Gießen",
    "type": "Sensor",
    "icon": "door",

After I saw the message "Configuration invalid! Login via browser Could not Parse config file", it was not possible to edit the config-file. After some minutes, it was completely empty. I have reflashed the M5 Stack to get back to a working system again.

jonathanmbradshaw commented 2 years ago

It's a long shot... might this be related to NTP & DHCP options? I don't have the serial log at this point, but I had a lot of NTP retry 1 - 10 errors, and also a set time error on the file system while updating config.json