Document raw data format

bartaelterman commented 9 years ago

In order to be able to create a script that will consolidate the raw data files, I need some documentation describing the columns in the raw data files. I found an example file containing 3 columns: transmitter, receiver, date_time.

@LifeWatchINBO/fish-tracking can someone tell me what columns I can expect in the raw data files?

PieterjanVerhelst commented 9 years ago

I noticed a problem. When offloading data, VUE generates VRL files and CSV files. However, the format of the CSV can differ depending on the settings. As a result, CSV files from the INBO field laptop have different headings than those from the VLIZ field laptop. For example, one contains separate code_space and ID columns, while the other merges them into one column (Transmitter). I could send you two files as an example. Is this a big issue? Otherwise we could still opt for the VRL export to CSV (an extra VUE step). In addition, at the moment there are no T- and/or P-sensors in the tags. However, in the future this will be the case, so additional information (i.e. extra columns) will be in the CSV files. I assume this is not a problem, but just to let you know in case you need to take this into account.

peterdesmet commented 9 years ago

@PieterjanVerhelst, can you send a CSV and VRL export from a VLIZ and INBO receiver to Bart (bart.aelterman@inbo.be)? We will add them to this private repository as reference. I fear that the VRL export - although much more stable - cannot be used, as the data is encoded.

PieterjanVerhelst commented 9 years ago

I sent him the files.

bartaelterman commented 9 years ago

The two files are indeed considerably different. Even the labels used in both files differ (e.g. "Date/Time" vs "Date and Time (UTC)"). There is no other way than to write two separate parsers. The script will determine the file format (based on the column headers) and then choose the right parser. It's a bit of extra work, but doable.

The other thing you mention is another problem: additional file types to be expected in the future. The idea of the script was to concatenate all the files in the folder. And since the old files will still be there, that means the script will need to be able to distinguish probably 4 file types (INBO, VLIZ, INBO-new, VLIZ-new). Still doable, but depending on how many other file types we can expect later on, this solution might not hold very long.

For now, there is no reason to immediately change strategy. We need to be able to read 2 file formats, so let's do that. We'll just need to bear in mind that additional file types will cause some additional work on the script, and we'll see for how long that remains feasible.
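
A minimal Python sketch of the approach described above: choose a parser from the header, then concatenate every file in the folder into one list of records. The first header fields ("Date/Time", "Date and Time (UTC)"), the Transmitter column and the output columns (transmitter, receiver, date_time) come from this thread. Everything else is an assumption for illustration and not the actual script: the column names "Receiver", "Code Space" and "ID", the comma delimiter, the "-" used to join code space and ID, and all function names.

```python
import csv
from pathlib import Path


def parse_vliz(row):
    # VLIZ export: the transmitter is already one merged column.
    # "Receiver" is an assumed column name.
    return {
        "transmitter": row["Transmitter"],
        "receiver": row["Receiver"],
        "date_time": row["Date and Time (UTC)"],
    }


def parse_inbo(row):
    # INBO export: code space and ID are separate columns.
    # "Code Space", "ID", "Receiver" and the "-" separator are assumptions.
    return {
        "transmitter": f'{row["Code Space"]}-{row["ID"]}',
        "receiver": row["Receiver"],
        "date_time": row["Date/Time"],
    }


# Map the first header field of each known format to its parser.
PARSERS = {
    "Date and Time (UTC)": parse_vliz,
    "Date/Time": parse_inbo,
}


def consolidate(folder):
    """Read every CSV file in `folder` and return one list of records."""
    records = []
    for path in sorted(Path(folder).glob("*.csv")):
        # utf-8-sig drops a leading byte order mark if the export has one.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle)
            first_field = reader.fieldnames[0] if reader.fieldnames else None
            parser = PARSERS.get(first_field)
            if parser is None:
                raise ValueError(f"Unknown file format: {path.name}")
            records.extend(parser(row) for row in reader)
    return records
```

Supporting a new file type (e.g. INBO-new or VLIZ-new) would then mean adding one parser function and one entry in `PARSERS`, as long as the first header field still distinguishes the formats.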

bartaelterman commented 9 years ago

I documented the two input formats based on the examples Pieter gave. You can find them here.

Can you review that?

PieterjanVerhelst commented 9 years ago

Looks good. As sensors are already taken into account (still blank), I don't think this will cause any problems. Indeed, at the moment only these two formats are used, so just apply the script to them. Just to know how the script works: in the case of the VLIZ format, will the script select this format on the basis of an 'IF Transmitter AND Sensor Value is TRUE, THEN VLIZ-format' procedure?

bartaelterman commented 9 years ago

Currently, it checks the header: if the first field is "Date and Time (UTC)", it's VLIZ format; if it is "Date/Time", it's INBO format. But there are other ways, and if something else would be more robust, let me know.

The Sensor value is empty in the example VLIZ file I have.
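
The header check described above could look roughly like the following sketch. The two first-field labels are taken from this thread. The function name `detect_format`, the comma delimiter, the other column names in the usage example, and the handling of a byte order mark and surrounding whitespace are assumptions, not the current implementation.

```python
import csv
import io

# First header field of each known format, as described in this thread.
FIRST_FIELD_TO_FORMAT = {
    "Date and Time (UTC)": "VLIZ",
    "Date/Time": "INBO",
}


def detect_format(header_line):
    """Return "VLIZ" or "INBO" based on the first field of a CSV header line."""
    rows = list(csv.reader(io.StringIO(header_line)))
    if not rows or not rows[0]:
        raise ValueError("Empty header line")
    # Tolerate a byte order mark and stray whitespace around the label.
    first_field = rows[0][0].lstrip("\ufeff").strip()
    try:
        return FIRST_FIELD_TO_FORMAT[first_field]
    except KeyError:
        raise ValueError(f"Unknown first header field: {first_field!r}") from None
```

For example, `detect_format("Date/Time,Code Space,ID")` returns `"INBO"`, and a header with an unrecognised first field raises a `ValueError` instead of silently picking a parser.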

PieterjanVerhelst commented 9 years ago

Sensor values are empty as the tags currently used contain no sensors. In the future, tags with sensors will be applied.

inbo / fish-tracking

Document raw data format #33