spectraphilic / wsn_server

Software and django applications for wsn and iot setup
GNU General Public License v3.0

Upload CR6 files #2

Closed jdavid closed 4 years ago

jdavid commented 5 years ago

This is a description of the observed behaviour.

CR6 files are uploaded every 30 minutes. Sometimes the uploaded file is truncated. The good news is that when a file is truncated, the CR6 uploads the same file again. This is what I've observed, for instance:

wsn@latice-vm:/home/ftpuser/cr6/finseflux$ ls -l Biomet_2019-02-09_1*
-rw-rw-r-- 1 wsn     wsn       5460 Feb  9 11:30 Biomet_2019-02-09_10-05-00_3321.dat.xz
-rw-rw-r-- 1 wsn     wsn       7948 Feb  9 11:30 Biomet_2019-02-09_10-10-00_3321.dat.xz
-rw-rw-r-- 1 wsn     wsn      21004 Feb  9 12:00 Biomet_2019-02-09_10-15-00_3321.dat.xz
-rw-rw-r-- 1 wsn     wsn      23212 Feb  9 12:00 Biomet_2019-02-09_10-35-00_3322.dat.xz
-rw-rw-r-- 1 wsn     wsn      24796 Feb  9 12:30 Biomet_2019-02-09_11-05-00_3323.dat.xz
-rw-rw-r-- 1 wsn     wsn      24648 Feb  9 13:00 Biomet_2019-02-09_11-35-00_3324.dat.xz
[...]

Above, there are 2 truncated uploads followed by a complete upload. The last number in the filename identifies the uploaded file; here the number 3321 appears three times.
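A minimal sketch of how re-uploads could be spotted from the filenames alone. The filename pattern is inferred from the listing above, and the names `FILENAME_RE`, `upload_id` and `group_by_upload_id` are hypothetical; they are not taken from the import script.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the listing:
#   <table>_<YYYY-MM-DD>_<HH-MM-SS>_<upload id>.dat[.xz]
FILENAME_RE = re.compile(
    r"^(?P<table>.+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})"
    r"_(?P<upload_id>\d+)\.dat(?:\.xz)?$"
)


def upload_id(filename):
    """Return the trailing upload id as an int, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    return int(match.group("upload_id")) if match else None


def group_by_upload_id(filenames):
    """Map each upload id to the list of filenames that carry it."""
    groups = defaultdict(list)
    for name in filenames:
        uid = upload_id(name)
        if uid is not None:
            groups[uid].append(name)
    return dict(groups)
```

Any id that maps to more than one filename was uploaded more than once, which suggests that all but the last attempt were truncated.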

A truncated file may be in the following states:

  1. Empty. The importing script automatically and silently renames this file, appending the .empty suffix.
  2. Truncated in the header, before the content starts. For these files I receive an email, but I have to rename them manually (otherwise the import script will try to handle them again and again); I append the .truncated suffix.
  3. Truncated in the middle of a content row. This produces a parsing error and I get an email, but the good lines before it are imported, and the file is compressed as normal at the end.
  4. Truncated exactly at the end of a content row. The import script will not distinguish this one from a complete upload.

For the last two states, some good lines have already been imported. When the file is uploaded again, the script handles the same lines again. It detects that they have already been imported, so they don't produce duplicates in the database, but I still get an email.
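The four states listed above can be sketched as a small classifier. This is only an illustration under stated assumptions: the header is assumed to have a fixed number of lines (`header_lines=4` is a guess, not confirmed here), a complete row is assumed to end with a newline, and `classify_upload` is a hypothetical name, not the import script's function.

```python
def classify_upload(data, header_lines=4):
    """Classify the raw bytes of an uploaded file.

    header_lines is an assumption about the file format; adjust as needed.
    """
    if not data:
        return "empty"
    # Split off the header; the remainder (if any) is the content.
    parts = data.split(b"\n", header_lines)
    if len(parts) <= header_lines or not parts[header_lines]:
        # Header incomplete, or complete but followed by no content at all.
        return "truncated_header"
    if not data.endswith(b"\n"):
        # The last content row was cut before its line ending.
        return "truncated_row"
    # A file cut exactly at a row boundary looks the same as a complete one.
    return "complete_or_row_boundary"
```

The last return value is deliberately ambiguous: from the file alone, state 4 cannot be told apart from a complete upload.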

jdavid commented 5 years ago

@norberp

jdavid commented 5 years ago

Approximately 1.5% of file uploads are truncated.

jdavid commented 5 years ago

Analysing the ids of the uploaded files, there are no gaps. This shows that the system is (apparently) robust even when the network is cut for a long time, as has happened in the past. Sometimes the counter goes back to zero and starts again; this may be when the CR6 reboots.
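A minimal sketch of such a gap check, assuming the ids are given in upload order. Repeated ids (re-uploads of a truncated file) are accepted, and a decrease is treated as a counter reset rather than a gap; that interpretation is an assumption. `find_gaps` is a hypothetical name.

```python
def find_gaps(ids):
    """Return (previous, current) pairs where one or more ids were skipped.

    ids must be in upload order. Equal consecutive ids are re-uploads and
    a smaller id is assumed to be a counter reset, so neither is a gap.
    """
    gaps = []
    previous = None
    for current in ids:
        if previous is not None and current > previous + 1:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Note that a reset could hide a gap: if uploads were lost just before or just after a reboot, this check would not see it.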

@norberp if you have observed gaps in the data available in the database, it may be worth taking a closer look.

jdavid commented 5 years ago

TODO When a file is truncated in the header, before the content starts, automatically rename it so that the script does not try it again and again (and flood me with emails).
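One way this could look, as a sketch only. It assumes a fixed header length (`header_lines=4` is a guess) and reuses the .truncated suffix mentioned earlier; `rename_if_header_truncated` is a hypothetical name and this is not necessarily how the import script does it.

```python
from pathlib import Path


def rename_if_header_truncated(path, header_lines=4, suffix=".truncated"):
    """Rename the file if it ends before any content row starts.

    Returns the new path, or None if the file was left alone. Empty files
    are skipped here because they are already handled separately (.empty).
    """
    path = Path(path)
    data = path.read_bytes()
    parts = data.split(b"\n", header_lines)
    has_content = len(parts) > header_lines and bool(parts[header_lines])
    if data and not has_content:
        target = path.with_name(path.name + suffix)
        path.rename(target)
        return target
    return None
```

Because the renamed file no longer ends in .dat, a script that selects files by that extension would skip it on the next run.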

jdavid commented 4 years ago

done