nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License
7.64k stars 3.12k forks source link

file_list enhancement #2440

Closed nedoskiv closed 6 years ago

nedoskiv commented 6 years ago

Hello I'm not familiar with github, but love nodemcu lua project, use it to make an RFID system, but that is out of the topic, point is that I need to handle a list of large number of files. There is nothing wrong with it unless you want to see list of all files. file_list function fails with out of memory error when file numbers goes above 280 (depends of file name length). So first I asked someone to make possible to get the list in other way, guess not posted it right. So I believe made it myself. Here is little modification of file_list function I made:

// Ven version of list()
static int file_list( lua_State* L )
{
  unsigned st = luaL_optinteger( L, 1, 1 );     // start offset
  unsigned tf = luaL_optinteger( L, 2, 100000 );    // how much files to list
  tf=tf+st;
  vfs_dir  *dir;

  if (dir = vfs_opendir("")) {
    lua_newtable( L );
    struct vfs_stat stat;
    int i=1;
    while (vfs_readdir(dir, &stat) == VFS_RES_OK) {
        if (i<st)
            {
            i++;
            continue;
            }
        if (i>=tf)
            {
            break;
            }
      lua_pushinteger(L, stat.size);
      lua_setfield(L, -2, stat.name);
      i++;
    }
    vfs_closedir(dir);
    return 1;
  }
  return 0;
}

now it work as usual when called without arguments (list files from 1 to 100000), but if the function is called with an arguments - file_list(st,n) it gonna return N number of files, starting from st. That way i can use it to retrieve file list on portion and avoid out of memory errors&restart

I do not know how to post that for integration nodemcu, Also my C knowledge is limited, may be someone can do it better. I do it my way, test it and it is working fine for me.

At the end, please someone answer to me - how many number of files SPIFFS nodemcu file system can handle?

TerryE commented 6 years ago

Hi, first some simple answers.

how many ... SPIFFS ...

The github repo is here; there is a FAQ and click on the wiki link for other SPIFFS documentation including a FAQ.

Large keyed arrays require a large hashtable within the Table in RAM and this will exhaust RAM, hence your error. One way to avoid this would be add a true Lua iterator (see Pil 7.4). Another way which would offer backwards compatibility would be to allow an optional Lua regexp argument so that only the subset of entries that matching that regexp are returned. Absolute index range is a bad idea since the order of returns from a SPIFFS readdir is indetermined.

nedoskiv commented 6 years ago

so far i do not see SPIFFS readdir to be indetermined, if that fails I gonna try to make function that write file_list to a file. That gonna work for me too. As I mentioned above my C programing skills are not good also i got no idea about whole build concept of nodemcu. I just examine few files and come out with that solution for now.

TerryE commented 6 years ago

@nedoskiv as I said, read the FAQ, and specifically Will file list order be preserved?:

No. When you list files on your file system, the order is arbitrary. Also, due to garbage collection, it may be rescrambled as soon as you make any changes to your file system. If ordering of files are important it is up to the user to take care of this; e.g. by a sequence number in the file name.

I will look at adding an optional regexp parameter to file.list() as this is backwards compatible and looping over a file list with a string.match() filter is a very common use case, but doing this is a low priority for me at the moment. Sorry.

I was brooding about your application. Have you considered a pure Lua approach? I assume that you want to store one RFID tag per file, and hence the large number of files. Another approach would be to hash the RFID tag onto a small index, say 0 - 255 and use 256 files each with an array of all of the RFID tags with that hash stored as a JSON array. You retrieve a record by doing a hash, read the file and sjson.decode() to convert to Lua array form (remember to set the {depth=n} option as this limits the RAM used by decode) and access the entry that you want.

If you need to update then update the array re-encode and write back to the file. At worst you will be doing one file Read-Write per RFID read. You can play with the trade-offs between the hash range and file size.

nedoskiv commented 6 years ago

Well thank you for your time. I know how to do it in lua, problem is that it slow down the device when responding to RFID request, fastest way is to try to open filename coresponding to RFID. It works well, I just had that problem and manage to fix it my way. I do not seek help into this "issue", just showing what I have done if someone want to implement it into file_list function.

From my testing so far far return of "vfs_readdir" function is always in same order, no duplicates or missing names into file list. Checked with 550 files retrieving them in portions of 100.

nedoskiv commented 6 years ago

Few additional notes from my experience with that enhancement so far: if you do not modify file system while call this function, file order returned is always the same, but if you remove, write&close file during different file_list calls, it changes order. The way i use it and it remain always is same order is that:

file.open (example, write&append)
file_list(1,100)
file.write(example)
file_list(100,100)
file.write(example)
file_list(200,100)
file.write(example)
file_close(example)
...

that way I can write some data to file, and it returning order was correct.

TerryE commented 6 years ago

If you look at the SPIFFS code then you will see that SPIFFS GC can be triggered during any FS update, and if this occurs the list list order can change. Can doesn't man always, so your code might work 99% of the time and fail on only 1% but when reorder does occur then your algorithm will return the wrong results.

nedoskiv commented 6 years ago

Ok thank you, I will focus to make file.list function when called with argument file_list(filename) to write file list in a file, one filename per line, that way gonna be useful too. Just need to find some examples how to write file here, if you can point me gonna be cool.

HHHartmann commented 6 years ago

@TerryE Terry, does that mean that even the spiffs iterator itself will return inconsistent results if other spiffs functions are called inbetwen? If so I could imagine a method like sketched by @nedoskiv which also returns two hashes, one representing all files up to the one before this call and one hash over all filenames up to this call. E.g.: if I request the files from 100 to 199 I get the hash of files 0 to 99 and the hash of 0 to 199. In the next call requesting files 200 to 299 I get the hash for 0 to 199 and 0 to 299. If the hashes of the first and second call match I know the returned filenames are consistent.

It's sort of a workaround but maybe someone else has a better idea.

@nedoskiv writing to a file might still change the order of files I guess.

TerryE commented 6 years ago

@HHHartmann Gregor, I have had a look at the SPIFFS code, but I largely go on Peter's SPIFFS documentation and for this on his SPIFFS FAQ and to requote the para on this:

When you list files on your file system, the order is arbitrary. Also, due to garbage collection, it may be rescrambled as soon as you make any changes to your file system. If ordering of files are important it is up to the user to take care of this; e.g. by a sequence number in the file name.

So any write to SPIFFS can trigger GC which might result in a different return order for a filelist return.

As I said above, the easiest extension is to add an optional Lua search pattern to the files.list() call.

HHHartmann commented 6 years ago

@TerryE Terry, thats what I would have expected. But if your filenames do not have any structure like when they resemble MAC addresses ore some other kind of unique IDs then a search pattern does not help much to select parts of them. Even when checking for a, b and so on might clash because the names are not distributed even enough.

nedoskiv commented 6 years ago

question is if I place my file_write here:

while (vfs_readdir(dir, &stat) == VFS_RES_OK) { // lua_pushinteger(L, stat.size); File_write here! // lua_setfield(L, -2, stat.name); }

can still it mix order of file_list? guess need to test that.

TerryE commented 6 years ago

if your filenames do not have any structure

But the application programmer decides file names. Using a FS as a keyed database is dodgy under the best of circumstances. Using SPIFFS doubly so. In the case of the OPs usecase, either the RFIDs are preset and being downloaded in which case the programmer has to develop a suitable scalable solution and essentially use SPIFFS as a (largely) R/O resources, or it is RFID collection in which case the best thing to do is to stream them over the WAN.

From the perspective of the firmware development is has a sweet-spot. We should only put resources into extending the API when there is clear re-usability. Of course the system is Open Source so individual developers are free to do their own custom builds.

TerryE commented 6 years ago

can still it mix order of file_list?

Yes. You are updating the FS between readdir() calls.

TerryE commented 6 years ago

@nedoskiv. This isn't a bug, but more an architectural constraint of using SPIFFS. Any change that we make would need to be upwards compatible and work under all usecases. You are free to change your own build, but we won't be integrating this into dev.

nedoskiv commented 6 years ago

I never said "bug". Guess that is how things are designed and work. I already made a version where RFID are streamed over the WAN and some other device decide will access be provided or not. This is for standalone RFID controller. It have around 25 system files and works fine with default file_list function up for 250 RFID records. In some case scenarios they are not enough. Guess I will do some testing and get out that file list 2,3 times and check it with CRC or at least drop file size from returned table. When there is a goal, solutions always can be found :)

HHHartmann commented 6 years ago

I still would like to be able to list a larger number of files. Any general code listing all files like ftp or esplorer or any other tool trying to inspect the contents of spiffs will have that problem. Maybe an iterator would be possible to implement or even the regexp solution (although that might be the hardest to develop). I am not saying that you, @TerryE should do it but I would have a look and I am sure @nedoskiv would also.

TerryE commented 6 years ago

I still would like to be able to list a larger number of files. You could do this with files function that returns a standard Lua Iterator (like pairs()), but this would only work if you send the results over the WAN or pack them in RAM. If you write them back to a file then you are changing the FS that you are running readdir over and SPIFFS doesn't guarantee the results here.

The scaling issue here is that file.list() returns an array, and the Lua algo for storing associative arrays is very memory intensive, and we run out of RAM.

HHHartmann commented 6 years ago

I don't quite understand your point. file.list() returns an array, yes. So there could be a factory call to initiate an iterator which then reads the next file each time it is called. Would that be possible or am I missing something?

TerryE commented 6 years ago

See the lbaselib.c code for pairs() and PiL 27.3.3 -- it's worth buying the PiL book BTW as I discuss in my FAQ. In this case the iterator is stateless so it is even easier.

nedoskiv commented 6 years ago

@HHHartmann: isn't that the same, if we call it to read next file and modify file system meanwhile file list order gonna mix up?

nwf commented 6 years ago

@nedoskiv Given the need for segregated iteration and update to the FS, if the set of updates is small, it could be worth batching them up within the iterator and then executing them all at once after iteration is over. If a large number of updates need to be made, you'll likely have to restart iteration after each one, which is, unfortunately, going to take time proportional to the product of the number of files and the number of updates, but given that the ESPs have very little memory, this may be the best you can do.

HHHartmann commented 6 years ago

@TerryE I see, the code is somewhat difficult. I am not sure if I could manage to write an iterator myself in a decent amount of time. But as I understand most of the complexity also arises due to the array handling and not the iterator itself. I had only looked at PiL 7.1 which seemed fair enough to give it a try to me. But I might still give it a try if that would be the preferred way to handle it and there is any chance to get it integrated here at all.

TerryE commented 6 years ago

Gregor, it takes some time to get your head around the Lua array manipulation API, but ltablib.c itself provides lots of worked examples. In my view this really only starts to become an issue if you have over 64 files in SPIFFS. What % of our users would want to do this?

If I were developing this app for myself, then I wouldn't bother implementing this sort of patch; I would code around it in Lua. However I have occasionally used

for f,s in pairs(file.list()) do
  if s:match(pattern) then
    -- process matching file name
  end
end

And I can imagine the extension where we pass an optional match pattern into file.list() being used, and moreover the OP could easily use this to scale his app.

nedoskiv commented 6 years ago

@nwf are we talking about iterator writen in C or in Lua?

TerryE commented 6 years ago

are we talking about iterator writen in C or in Lua?

You've missed some of the points in the dialog. The state issue is nothing to do with the implementation language, but as it happen you would have to write this in C.

BTW, having just looked at your app, you really need to use LFS. This would make your like so much easier and the entire app would be in flash and all 44Kb heap available for data.

nwf commented 6 years ago

@nedoskiv The iterator I had in mind would have to be in C, provided by the SPIFFS/Lua glue.

nwf commented 6 years ago

@TerryE I looked into extending file_list with a pattern argument, but everything I can see in lstrlib.c for pattern matching is static. Changing that latter fact would cause us to diverge from Lua upstream, which doesn't seem terribly attractive. I suspect the simpler approach will be to write an iterator in app/modules/file.c and write a wrapper around it in Lua to do the pattern matching.

TerryE commented 6 years ago

@nwf Nathaniel, I am not sure what you mean by "diverging from Lua upstream", and why come to this conclusion. It is pretty simple (roughly 20-30 extra lines in file.c) to implement what I suggest. No changes to app/lua at all. But I want to stay focused on LFS at the moment, so I am trying to encourage someone else to do this.

You can't put a standard Lua iterator around the vfs_readdir() calls, because Peter's SPIFFS FAQ makes it very clear than any write backs to SPIFFS will break the assumptions that underpin the SPIFFS readdir coherency. If you offer a file iterator then it just gives App developers like @nedoskiv the rope to hang themselves.

nedoskiv commented 6 years ago

well after that discussion i made a conclusion, if we want somehow to have full file list for large number of files it must be implemented in SPIFFS itself (perhaps function that write all filenames in a file, that way they can be iterated using file.readline) . In other words my simple enhancement gonna work, if we do not modify FS meanwhile. That is not good solution for me, so I gonna find a way to make that file list function return a little bit less memory consuming and able to get at least 500 file names without rebooting the device. @TerryE LFS is good thing, but in the moment when I call file list device was almost nothing in memory so using it gonna give me 3-4k max , that not solve my problem.

nwf commented 6 years ago

@TerryE As far as I can tell, the only place that knows how to interpret Lua patterns is app/lua/lstrlib.c, and it does so entirely within static methods. If I were a better Lua FFI programmer I suppose I could look up string.find and arrange for it to be called by the VM from within file.c, but I'm not and so I, too, will leave this to someone else. :)

TerryE commented 6 years ago

it does so entirely within static methods

What is wrong with calling string.match() directly from file.c? So if

then luaL_getmetafield(L, -2, 'match') pushes the function string.match(), so we have

on the stack. We then need to dup top-3 and move it down 2, so that stack contains

and do a lua_call(L, 2, 1) to call it with one return, leaving the

on the stack. If the ToS is not nil then we've got a match. Either way pop the ToS to reset the stack for the next loop.

nedoskiv commented 6 years ago

OK it is time for me to seek little help, I tried to make return of file.list function to become just one string where each filename is writen in new line:

static int file_list( lua_State L ) { char temp[32]; unsigned st = luaL_optinteger( L, 1, 1 ); // start offset unsigned tf = luaL_optinteger( L, 2, 100000 ); // how much files to list tf=tf+st; vfs_dir dir;

if (dir = vfs_opendir("")) { lua_newtable( L ); struct vfs_stat stat; int i=1; int ii=0; while (vfs_readdir(dir, &stat) == VFS_RES_OK) { if (i<st) { i++; continue; } if (i>=tf) { break; } strcpy (temp,stat.name); strcat (temp,"\n"); lua_pushstring( L, temp ); i++; ii++; } vfs_closedir(dir); return ii; } return 0; }

It do not work as expected, If I request more than 40 files I see output like this:

3fff0d10 already freed 3fff11b8 already freed 3fff0cc8 already freed 3fff1f88 already freed

and device restart, but if I request 30 files, and everytime increase them in steps of 30, manage to get 400 files at one time. I took example from how wifi.ap.getip return values and made it same way for some reason it do not work.

=file.list(1,30) =file.list(1,60) =file.list(1,90)

that way it works, If I do directly:

=file.list(1,60)

it does not work.

TerryE commented 6 years ago

@nedoskiv, this list is for discussing and deciding how to change our standard firmware. I have explained multiple times that your approach isn't what we want and is going to be difficult if not impossible to get working. Please see our support policy for advice on where to get the help that you are seeking.

Given #2452, I think we should focus on this.

nedoskiv commented 6 years ago

I have solved my problem so far - make file list function return a smaller lua table each row contain 1kb list of files separated by \n, that way i got 550 files at 8KB memory cost. Since that is not good method for nodemcu team to implement, not gonna to bother them with my code.

TerryE commented 6 years ago

@nedoskiv, glad you solved your problem. I had intended this issue to cover the wider scope of what file.list() extension should we adopt, but since @nwf's #2452 now fixes this, we can leave this closed.