sparkfun / OpenLog

Open Source Hardware Datalogger
https://www.sparkfun.com/products/9530
Other
547 stars 215 forks source link

data recording suddently stops #132

Closed lpollini closed 11 years ago

lpollini commented 11 years ago

I have been using this really pretty piece of HW/SW for few days. Firmware version is 3.11 (so as recevived by sparkfun) I am recording binary data: a total of 86 bytes @ 100 Hz = 8600 kbytes / sec using serial at 115200. my config file is : 115200,26,10,0,0,0

I experience sudden interruptions of data recording and it seems that there is a correlation between the value of the third parameter: setting_max_escape_character and the total length of the log file : if it is set to 3, file are much shorter (about 10 seconds of data recording) while if it is set to 10 they can be long several minutes.

What I see in trems of LEDS is: blue led (same side as GND) always flashes green led (same side as RXI) flashes at high frequency and for very short time (I can barely see it light up) at a certain point,green led stops flashing (while the blue one still is) and no more data is recorded....

So I have read the issues section, found the interesting issue 105, so I gave a look at the code in: https://github.com/sparkfun/OpenLog/blob/master/OpenLog_v3/OpenLog_v3.ino

and found that, as far as I undertsand, there may be a bug that could induce reception of an escape sequence even when the escape characters are not one after the other:

look at the code:

//Start recording incoming characters  while(escape_chars_received < setting_max_escape_character) {
byte n = NewSerial.read(localBuffer, sizeof(localBuffer)); //Read characters from global buffer into the local buffer
if (n > 0) //If we have characters, check for escape characters
{
  if (localBuffer[0] == setting_escape_character)
   {
    escape_chars_received++;        //Scan the local buffer for esacape characters
    for(checkedSpot = 1 ; checkedSpot < n ; checkedSpot++) {
      if(localBuffer[checkedSpot] == setting_escape_character) {
        escape_chars_received++;            //If n is greater than 3 there's a chance here where we receive three esc chars
        // and then reset the variable: 26 26 26 A T + would not escape correctly
        if(escape_chars_received == setting_max_escape_character) break;
      }
      else
        escape_chars_received = 0;
     }
  }
  workingFile.write(localBuffer, n); //Record the buffer to the card

say now that NewSerial.read(...) always returns n=1 this should pose no problem in fact, BUT, if n=1 escape_chars_received is never reset because the code escape_chars_received = 0; is executed inside the FOR statement, thus only if n>1 while it should go to 0 as sson as a new non-escape char is received....

is this correct ? solutions may be multiple but before trying to modify the firmware (and re-flashing) since I have never done it before and this is the first time with ARM and arduino for me I just want to be sure.

 Lorenzo 
nseidle commented 11 years ago

Thanks for playing with OpenLog! There are some significant changes between v3.11 and v3.12/3.13. If possible, I recommend upgrading the firmware. Some of the append_file() function changed, but those changes are unrelated to the issue you bring up.

You make a very interesting argument. I believe the easiest way to test this is to send 2 escape chars, then something not, then another escape char. If it exits to command prompt, we've got a bug. I'm still trying to wrap my head around logic that I wrote months/years ago, but you may be correct. Please continue to look at it and let me know what you think. I'll try to set some time aside this afternoon to test it.

lpollini commented 11 years ago

I will probably try it tomorrow, I do not have a FTDI adapter for reprogramming right now (I was not planning to do it) and never developed anything with arduino I must see if I can successfully create some kind of harness to flash it using my usb/serial adapter

in addition, regarding my issue I do not believe that this "possible bug" is the cause of my problem because I do not see how it can reasonably happen to me since I send burst of 86 bytes to the serial port where the occurrence of a single escape character is really rare and in couples (two one after the other) even less

there's another thing that just happened: I left the logger on for three hours, it recorder the first 10 minutes (about 5 MB) of data, then stopped the fact is that on the SD I found 3 additional files (named with increasing number) each few hundreds of bytes but with no intelligible data inside: my 86 bytes packet starts with a header that was not present in these additional data

thanks, again, Lorenzo

On 1/21/2013 6:29 PM, Nathan Seidle wrote:

Thanks for playing with OpenLog! There are some significant changes between v3.11 and v3.12/3.13. If possible, I recommend upgrading the firmware. Some of the append_file() function changed, but those changes are unrelated to the issue you bring up.

You make a /very/ interesting argument. I believe the easiest way to test this is to send 2 escape chars, then something not, then another escape char. If it exits to command prompt, we've got a bug. I'm still trying to wrap my head around logic that I wrote months/years ago, but you may be correct. Please continue to look at it and let me know what you think. I'll try to set some time aside this afternoon to test it.

— Reply to this email directly or view it on GitHub https://github.com/sparkfun/OpenLog/issues/132#issuecomment-12508078.


 Lorenzo Pollini, Ph.D.
 Tenured Assistant Professor
 Dept. of Energy and Systems Engineering
 University of Pisa (Italy)
 Via Diotisalvi,2
 56126 Pisa, ITALY
 Tel. +39 050 2217363
 Fax +39 050 2217333
 e-mail: lpollini@dsea.unipi.it
 web: http://www.dsea.unipi.it/Members/polliniw

nseidle commented 11 years ago

If you're seeing multiple files created it sounds like the unit is resetting under normal use. I usually attribute this to the unit running out of RAM. This was a known issue in v3.11.

To reprogram the unit, any kind of serial connection and 5 wires should work (VCC, GND, TX, RX, Reset).

The counter will be reset when the flow of data stops for more than 500 msec (MAX_IDLE_TIME_MSEC). I don't think this is part of your issue but the code does have ways to reset the variable.

I'm wondering if it makes sense to add:

if (localBuffer[0] == setting_escape_character)
{
...
}
else
     escape_chars_received = 0;

That's a few extra cycles that will be consumed but it will insure that the escape char variable gets properly reset.

nseidle commented 11 years ago

I think I confirmed your bug. I created a test sketch (now in the repo) that throws random binary values at OpenLog.

At 9600bps and 3 escape characters, the unit would drop to command prompt. Assuming I had received 3 random escape chars in a row, I increased to 9 escape chars. Recording completed without dropping to CP.

Increasing the baud to 115200, with 9 escape chars, sending 200,000 random bytes, unit would drop improperly to command prompt. I repeated a few times to be sure it wasn't just 9 random escape chars showing up next to one another. This is highly unlikely. It must be a bug.

I added the line:

else
     escape_chars_received = 0;

Testing at 115200 bps, 9 esc chars, recording completed correctly showing 200,000 bytes in the log file. Tested multiple times. It dropped a few hundred bytes on one log, but that is a separate known record speed/card issue.

Testing at 115200 bps, lowering to 3 escape characters, recording completed correctly showing 200,000 bytes in the log file. I would believe this. It's unlikely (1 in 256^3 = 16,777,216) that three escape chars would be randomly generated.

So I believe this bug has been squashed. Please let me know how else I can help. I am happy to get you a FTDI and or updated OpenLog if you need it.

lpollini commented 11 years ago

Great, that's "THE POWER OF ELSE" :-)

I updated the firmware and did a test at 115200 bps with 3 escape characters and recorded for 3 hours without any stop.

In fact I noticed several packets lost: I have a time stamp in the packet (put by the sender that is an embedded real-time system) If I plot the difference between a time stamp value and the previous one, the graph should stay at 0.01 (the period at whivh the data is sent).

The graph instaed is not as it should be but shows several peaks (if you take the height of the peak and divide by 0.01 you have the number of lost packets). Note that a single missing byte becomes 2 lost packets: the current and the successsive since synch with the header is lost.

I attached 3 images showing these graph at various levels of zoom.

You will notice that packet loss happens in a very periodic and
deterministic pattern.

I will try with a faster SD card to see if the problem remains.

LP

timining1

timining2

timining3

lpollini commented 11 years ago

I think we really solved : I tested the same situation with a 4 GB class10 microSD and I lost few tens of packets over a period of 1 hour. At the same time I recorded the same stream in parallel on a PC with a usb serial adapter and a simple PC program reading from serial port and damping to file. At the end, the file on the PC contained less bytes than the SD .... and number of mangled packets was larger

At this point I believe my initial issue can be define solved.

I am happy to have contributed to highlight that bug.

Thanks Nathan, You did a good job.

Lorenzo

nseidle commented 11 years ago

Woohoo! Closing bugs. Thanks Lorenzo!