tagbase / tagbase-server

tagbase-server is a data management web service for working with eTUFF and nc-eTAG files.
https://oiip.jpl.nasa.gov/doc/OIIP_Deliverable7.4_TagbasePostgreSQLeTUFF_UserGuide.pdf
Apache License 2.0

Scenarios for the acquisition of data files and file versioning #238

Closed: tagtuna closed this issue 1 year ago

tagtuna commented 1 year ago

Use case 1: Tag returned logged data via orbiting satellites
File: iccat_gbyp0008_ArgosTrans_eTUFF0.txt
Date: April 4
instrument_name = "iccat_gbyp0008"
What happened: Received satellite messages were decoded by Wildlife Computers software (via an online backend platform, with a particular firmware version). A geolocation algorithm was run and a track was generated. This track was deemed the best possible at this time (the reference track). A set of output .csv files was then downloaded and converted into an eTUFF file by the client.

Use case 2: Client ran the geolocation algorithm to generate two additional tracks
Files: iccat_gbyp0008_ArgosTrans_eTUFF1.txt & iccat_gbyp0008_ArgosTrans_eTUFF2.txt
Date: April 17
instrument_name = "iccat_gbyp0008"
What happened: The client re-did the geolocation processing and generated two new track solutions using different speed filters (captured by the metadata attribute geolocation_parameters). Separate eTUFFs were generated containing the track data only. The client did not believe either solution was better than the original track, so they just want to append these for future use or further evaluation. Likewise, the eTUFFs did not include the original logged water-column data, because the client thought it a waste of space to repeat data that had already been submitted.

Use case 3: Hardware was physically recovered and the client was able to download the complete archive via a USB cable
File: none provided
Date: June 30
instrument_name = "iccat_gbyp0008"
What happened: The downloaded data represent the complete records; the data available from use case 1 is a subset of this archive. A new eTUFF (with a much bigger file size) was generated. The client believes this “version” provides the best representation of the logged data, and sees limited value in retaining earlier versions. Tracks were re-run, but the solutions did not differ much from the previous ones, so no changes were required there.

Attachment: eTUFF_examples.zip

Things to consider

  1. All the above use cases provide eTUFF files containing data (and additional data) for the same instrument in the same tag deployment on the same animal. Therefore this piece of metadata, instrument_name, remains the same throughout. However, my initial thought for our internal/database tag_id is to keep tag_id the same for use cases 1 & 2, but different for use case 3. This would allow us to distinguish the satellite-transmitted dataset from the physically downloaded dataset (because of the different tag_id), but as they share the same instrument_name, we can use that to look up all the events that have happened, as illustrated by the use cases.
  2. I emphasize the instrument + deployment combination because the same hardware could be reused or redeployed on another animal. That means we can't rely on a combination of serial_number and ptt, or even platform (mentioned in issue #164). That is also why we asked the client to make sure instrument_name is unique.
  3. submission_id will be most useful for keeping track of the events that happened. For use cases 1 & 2, submission_id + tag_id should yield 3 different combinations because the data are ingested from 3 separate files. This allows us to work out which is the original logger data (use case 1, combo 1), the reference track (use case 1, combo 1), alternative track solution one (use case 2, combo 2) and alternative track solution two (use case 2, combo 3).
  4. For the multiple-track implementation, my first assessment is that all the current ingest and migration steps should stay pretty much the same. The data_position table will house both the reference tracks and the alternative solutions, distinguishable by their different submission_id + tag_id combos.
  5. The only remaining step is to update the metadata_position table whenever a newly ingested track should be flagged as a "reference track", which is determined by checking the metadata attribute referencetrack_included in the eTUFF.
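The bookkeeping proposed in points 1, 3 and 5 can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not tagbase-server's implementation: the class SubmissionRegistry, the source argument ("satellite" vs "recovered") used to decide when a new tag_id is issued, the reference_track dict standing in for the metadata_position flag, and the use case 3 filename are all hypothetical. The referencetrack_included value for each file is likewise assumed from the use case descriptions.

```python
from itertools import count


class SubmissionRegistry:
    """Hypothetical in-memory model of the tag_id / submission_id bookkeeping
    discussed in this issue; it does NOT mirror tagbase-server's real schema."""

    def __init__(self):
        self._tag_ids = {}              # (instrument_name, source) -> tag_id
        self._next_tag_id = count(1)
        self._next_submission_id = count(1)
        self.submissions = []           # one row per ingested eTUFF file
        self.reference_track = {}       # tag_id -> submission_id (stand-in for metadata_position)

    def tag_id_for(self, instrument_name, source):
        # Point 1: reuse the tag_id for the same instrument_name and data source,
        # so satellite-transmitted and recovered data get distinct tag_ids.
        key = (instrument_name, source)
        if key not in self._tag_ids:
            self._tag_ids[key] = next(self._next_tag_id)
        return self._tag_ids[key]

    def ingest(self, filename, instrument_name, source, referencetrack_included):
        # Point 3: every ingested file gets its own submission_id.
        tag_id = self.tag_id_for(instrument_name, source)
        submission_id = next(self._next_submission_id)
        self.submissions.append({
            "submission_id": submission_id,
            "tag_id": tag_id,
            "instrument_name": instrument_name,
            "filename": filename,
        })
        # Point 5: only re-point the reference track when the eTUFF flags one.
        if referencetrack_included:
            self.reference_track[tag_id] = submission_id
        return submission_id, tag_id

    def history(self, instrument_name):
        # Point 1: instrument_name links every event for one deployment.
        return [s for s in self.submissions if s["instrument_name"] == instrument_name]


reg = SubmissionRegistry()
name = "iccat_gbyp0008"
reg.ingest("iccat_gbyp0008_ArgosTrans_eTUFF0.txt", name, "satellite", True)   # use case 1
reg.ingest("iccat_gbyp0008_ArgosTrans_eTUFF1.txt", name, "satellite", False)  # use case 2
reg.ingest("iccat_gbyp0008_ArgosTrans_eTUFF2.txt", name, "satellite", False)  # use case 2
reg.ingest("recovered_archive_eTUFF.txt", name, "recovered", False)           # use case 3 (hypothetical file)

for row in reg.history(name):
    print(row["submission_id"], row["tag_id"], row["filename"])
```

Looking up by instrument_name returns all four events, while the (submission_id, tag_id) combos keep the three satellite-era files apart and give the recovered archive its own tag_id. In the database these would presumably be table rows rather than Python lists.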
lewismc commented 1 year ago

@tagtuna I downloaded these and will try to ingest tonight.

lewismc commented 1 year ago

@tagtuna can you please summarize what the overall desired outcome is of the above? Thanks

tagtuna commented 1 year ago

@lewismc Thanks for the query - I have added more on the issue. Hopefully this helps.

lewismc commented 1 year ago

@tagtuna the attached .zip file is somewhat troublesome. If I unzip it I get the following

unzip eTUFF_examples.zip
...
Archive:  eTUFF_examples.zip
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF1.txt
   creating: __MACOSX/
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF1.txt
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF0.txt
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF0.txt
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF2.txt
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF2.txt

As you can see, there appears to be a nested directory (__MACOSX/) in the .zip archive. This is not a major problem, as we can simply skip any directories and only process files in the root directory; however, please confirm what the behavior should be. Thanks
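The suggested behavior (skip directories and only process files in the archive root) could look roughly like the sketch below, using Python's standard zipfile module. The function name root_level_entries is hypothetical, and the demo builds a small in-memory archive shaped like eTUFF_examples.zip with placeholder contents rather than the real files. The __MACOSX/ folder and its "._" files are metadata that macOS's built-in compression typically adds, so they carry no eTUFF data.

```python
import io
import zipfile


def root_level_entries(zf):
    """Return names of regular files at the archive root, skipping directory
    entries, anything inside a subdirectory (e.g. __MACOSX/), and root-level
    "._" AppleDouble files. Hypothetical helper, not tagbase-server code."""
    names = []
    for info in zf.infolist():
        name = info.filename
        if info.is_dir() or "/" in name:
            continue  # directory entry or file nested in a subdirectory
        if name.startswith("._"):
            continue  # macOS metadata file sitting at the root
        names.append(name)
    return names


# Build an in-memory archive mirroring the listing above (placeholder contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__MACOSX/", b"")  # directory entry
    for i in (1, 0, 2):
        name = f"iccat_gbyp0008_ArgosTrans_eTUFF{i}.txt"
        zf.writestr(name, "placeholder eTUFF content\n")
        zf.writestr(f"__MACOSX/._{name}", b"")

with zipfile.ZipFile(buf) as zf:
    entries = root_level_entries(zf)

print(entries)
```

Filtering on the entry path rather than extracting and then cleaning up means the __MACOSX/ files are never written to disk or handed to the eTUFF parser.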

tagtuna commented 1 year ago

@lewismc Sorry, that zip wasn't what I intended to make. It must be due to the way I zipped the files up!

lewismc commented 1 year ago

OK, thanks for confirming. If we need to augment this aspect of the ingestion logic in the future, at least we can come back to this thread. Thanks

lewismc commented 1 year ago

@tagtuna what is a combo? You refer to various combos and I don't see the details here. Thanks for explaining.

tagtuna commented 1 year ago

Lewis, by combo I meant the submission_id plus tag_id combination.

On Thu, Apr 20, 2023, 13:28 Lewis John McGibbney wrote:

@tagtuna what is a combo? You refer to various combos and I don't see the details here. Thanks for explaining.
