michalmonday / CSV-Parser-for-Arduino

It turns a CSV string into an associative array (like a dict in Python)
MIT License
58 stars, 12 forks

Parse one row at a time? #23

Closed. maze61 closed this 1 year ago

maze61 commented 1 year ago

Dear Mr. Borowski,

First of all, thank you so much for developing this useful and performant library!

It's Wednesday and not Monday, but can I ask you a question anyway?

I need to process a large CSV file (5000+ rows), so storing all rows in an array isn't feasible.

Is there any way to read and parse row by row (using an iterator)? Something like

while (cp.iterate()) { // there are rows left
  char **stringVal = (char **)cp[0];
  float *floatVal = (float *)cp[1];
  processValues(stringVal, floatVal); // process the values of this row outside of CSV_Parser
}

Thank you Marcus

michalmonday commented 1 year ago

Hello, thank you for the suggestion. I just added it in the 1.1.0 release; here is an example showing how this can be done: https://github.com/michalmonday/CSV-Parser-for-Arduino/blob/master/examples/parsing_row_by_row/parsing_row_by_row.ino

I also wrote some notes about it here: https://github.com/michalmonday/CSV-Parser-for-Arduino/tree/master#parsing-row-at-a-time

I tested it a little for obvious bugs, but I haven't tested it with large files yet. If the program fails in your case, please let me know.
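For readers who want to see the row-at-a-time idea in isolation, here is a minimal sketch in plain C++. It is not an Arduino sketch and not the library's implementation. It assumes a hypothetical in-memory string as the data source, borrows the names feedRowParser and rowParserFinished only to mirror the two callbacks used in the linked example, and introduces an illustrative helper readRow that does not handle quoted fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a data source such as a file on an SD card.
static const char *g_csv = "name,value\nalpha,1.5\nbeta,2.25\n";
static std::size_t g_pos = 0;

// Same roles as the two callbacks in the linked example:
// hand over the next character, and report whether the input is exhausted.
char feedRowParser() { return g_csv[g_pos++]; }
bool rowParserFinished() { return g_csv[g_pos] == '\0'; }

// Illustrative helper (not part of CSV_Parser): reads one row into `fields`.
// Returns false when no input is left. Quoted fields are not handled.
bool readRow(std::vector<std::string> &fields) {
  fields.clear();
  if (rowParserFinished()) return false;
  std::string current;
  while (!rowParserFinished()) {
    char c = feedRowParser();
    if (c == '\n') break;     // end of this row
    if (c == '\r') continue;  // tolerate Windows line endings
    if (c == ',') {
      fields.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  fields.push_back(current);  // last field of the row
  return true;
}

// Sums the second column while holding only one row in memory at a time.
double sumSecondColumn() {
  g_pos = 0;  // start from the beginning of the sample data
  std::vector<std::string> row;
  bool haveHeader = readRow(row);  // first row is the header, skip it
  assert(haveHeader);
  (void)haveHeader;
  double total = 0.0;
  while (readRow(row)) {
    if (row.size() >= 2) total += std::stod(row[1]);
  }
  return total;
}
```

The point of the pattern is that memory use depends on the longest row rather than on the number of rows, which is what makes a 5000+ row file workable on a small device.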

maze61 commented 1 year ago

Hello, thank you so much for your fast reply, the implementation of my suggestion and also for the specific example!!!

It was pretty straightforward to implement the two required functions, in my case to read a CSV file from SD:

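// Open the CSV file on the SD card before parsing starts (e.g. in setup()).
thisFileH = SD.open(FilePath, FILE_READ);

// Called by the parser to fetch the next character.
char feedRowParser() {
  return thisFileH.read();
}

// Called by the parser to check whether the input is exhausted.
bool rowParserFinished() {
  return thisFileH.available() <= 0;
}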

A first test (10 rows) works perfectly! Testing with 5000 rows fails with "Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.", but I have to investigate this myself first. I can only do that over the upcoming weekend (my job is calling now). I will get back to you then.

Thanks again, have a nice day and greetings from Vienna to Colchester, Marcus

maze61 commented 1 year ago

Hello Michal,

my quick & dirty test sketch was too dirty - mea culpa!

Parsing 5000 records worked flawlessly!

Some "benchmarks":

5000 records (char* | int32_t)
time to read: 1205 ms
time to read & parse: 2092 ms

5000 records (char* | float | float | float | float)
time to read: 4775 ms
time to read & parse: 7507 ms

So the parsing overhead is small (about 0.9 s and 2.7 s for 5000 records, respectively); I expected much more.

Thank you for your efforts! Take care, Marcus

michalmonday commented 1 year ago

I just had a look, and it appears that memory leaks when string values are parsed row by row (using the "s" format specifier). I will try to get it fixed.

michalmonday commented 1 year ago

It's now fixed in the 1.1.1 version.

The problem was that strdup (which allocates memory) is used for storing strings. The destructor released that memory, so it wasn't a problem before, but parseRow() didn't, so each parsed row consumed more memory. https://github.com/michalmonday/CSV-Parser-for-Arduino/commit/7569694988a85c7cdcf5abc17a12dcd80ca2184b
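To illustrate the kind of leak described here, the following is a simplified sketch in plain C++, not the code from the commit. RowBuffer, storeLeaky, storeFixed and the allocation counter are hypothetical names made up for the illustration, and duplicate is a small stand-in that does what strdup does.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical single-column row buffer; not the library's data structure.
struct RowBuffer {
  char *value = nullptr;
};

static int g_liveAllocations = 0;  // copies allocated and not yet freed

// Stand-in for strdup: allocates a new copy of `text` on the heap.
char *duplicate(const char *text) {
  std::size_t size = std::strlen(text) + 1;
  char *copy = static_cast<char *>(std::malloc(size));
  assert(copy != nullptr);
  std::memcpy(copy, text, size);
  ++g_liveAllocations;
  return copy;
}

// Leaky variant: overwrites the pointer, so the previous copy is lost.
void storeLeaky(RowBuffer &row, const char *text) {
  row.value = duplicate(text);
}

// Fixed variant: frees the previous row's copy before storing the new one.
void storeFixed(RowBuffer &row, const char *text) {
  if (row.value != nullptr) {
    std::free(row.value);
    --g_liveAllocations;
  }
  row.value = duplicate(text);
}

// Stores `rows` values one after another and reports how many
// copies are still allocated afterwards.
int liveAfterRows(bool fixed, int rows) {
  g_liveAllocations = 0;
  RowBuffer row;
  for (int i = 0; i < rows; ++i) {
    if (fixed) {
      storeFixed(row, "example");
    } else {
      storeLeaky(row, "example");
    }
  }
  return g_liveAllocations;
}
```

With the leaky variant the number of live copies grows with the number of rows, while the fixed variant keeps exactly one copy alive at any time.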

Thank you very much for letting me know about it.

maze61 commented 1 year ago

Thank you so much Michal for your ultra-fast fix!!!

michalmonday commented 1 year ago

Btw, I made some changes to improve efficiency because string-based indexing was computationally expensive. parseRow in version 1.2.0 should be significantly faster, I think (when integer-based indexing is used). I modified the example code to use the more efficient way and added an SD card based example using your code :)
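One plausible way to picture the cost difference is the sketch below, in plain C++. It is an assumption about why name-based lookup is slower in general, not a description of the library's internals. The header list, findColumn and the comparison counter are all hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical header row; not taken from the library or the thread.
static const char *kHeaders[] = {"name", "value", "unit"};
static const int kColumnCount = 3;
static int g_comparisons = 0;  // counts string comparisons performed

// Looks up a column index by name with a linear scan over the headers.
int findColumn(const char *name) {
  for (int i = 0; i < kColumnCount; ++i) {
    ++g_comparisons;
    if (std::strcmp(kHeaders[i], name) == 0) return i;
  }
  return -1;
}

// Name-based access inside the loop: the lookup is repeated for every row.
int comparisonsWithNameLookupPerRow(int rows) {
  g_comparisons = 0;
  for (int r = 0; r < rows; ++r) {
    int col = findColumn("unit");
    assert(col == 2);
    (void)col;  // a real loop would read the value in column `col` here
  }
  return g_comparisons;
}

// Integer-based access: resolve the index once, then reuse it for every row.
int comparisonsWithIndexResolvedOnce(int rows) {
  g_comparisons = 0;
  int col = findColumn("unit");
  assert(col == 2);
  for (int r = 0; r < rows; ++r) {
    (void)col;  // a real loop would read the value in column `col` here
  }
  return g_comparisons;
}
```

For 5000 rows and three columns, the first variant performs up to three string comparisons per row, while the second performs them only once before the loop starts.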