tagbase / tagbase-server

tagbase-server is a data management web service for working with eTUFF and nc-eTAG files.
https://oiip.jpl.nasa.gov/doc/OIIP_Deliverable7.4_TagbasePostgreSQLeTUFF_UserGuide.pdf
Apache License 2.0

ISSUE-238 Scenarios for the acquisition of data files and file versioning #265

Closed lewismc closed 1 year ago

lewismc commented 1 year ago

WIP for https://github.com/tagbase/tagbase-server/issues/238 and https://github.com/tagbase/tagbase-server/issues/164. Work to be done.

lewismc commented 1 year ago

More work to be done here @renato2099 but we are pretty close.

renato2099 commented 1 year ago

Hey @lewismc, I pushed a WIP commit that moves data migration from a trigger-based approach to a stored-procedure one. Before we go down that path, though, I'd like us to do some additional validation and think through whether we want this potentially large behavioural change at this point.

renato2099 commented 1 year ago

Hey @lewismc, I ran a couple of ingestions with this patch and it seems to be doing something 😅 I see data being ingested, but we should check that data migration is still working as expected before we proceed with this.

lewismc commented 1 year ago

Hi @renato2099, I updated this patch and have tested it. It looks good. I will note the following things, though:

  1. Upon ingestion of iccat_gbyp0008_ArgosTrans_eTUFF0.txt and successful migration, loads of data is left in proc_observations... this requires investigation. I don't think this is new behavior.
  2. We need more unit tests.
  3. For some reason, the result of a GET on /tags/{tag_id} now returns the same metadata for each submission rather than different metadata. I checked and confirmed that the correct metadata is populated into the database, so this is definitely a bug in tags_controller.py.
  4. We need to augment the stored procedure to accommodate the following scenario: a user initially ingests a tag submission representing a reference track, then ingests a different file for the same tag and dataset which is the new reference track. We need to make sure that the original submission and its metadata are no longer marked as the reference track. This is kind of tricky, as essentially the sha256 needs to change as well.
  5. Finally, if I submit a .zip containing the three files iccat_gbyp0008_ArgosTrans_eTUFF0.txt, iccat_gbyp0008_ArgosTrans_eTUFF1.txt and iccat_gbyp0008_ArgosTrans_eTUFF2.txt, all three files may end up being assigned a different dataset_id, depending on whether the transaction has completed and an entry has been written to the dataset table before the next cursor attempts to read from that table. This has an impact on ingestion, as it really depends on an initial entry being present before another file associated with the same dataset is ingested. I can demo this to you quite easily.
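The reference-track demotion in item 4 could be handled with a single transaction that clears the old flag before inserting the new submission. A minimal sketch follows, using an in-memory SQLite table as a stand-in for the real Postgres schema; the table and column names (submission, tag_id, dataset_id, is_reference_track) are assumptions, not the actual tagbase-server schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE submission (
           submission_id INTEGER PRIMARY KEY,
           tag_id INTEGER,
           dataset_id INTEGER,
           is_reference_track INTEGER DEFAULT 0
       )"""
)

def ingest_reference_track(conn, tag_id, dataset_id):
    """Insert a new reference track and demote any previous one
    for the same tag/dataset in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE submission SET is_reference_track = 0 "
            "WHERE tag_id = ? AND dataset_id = ? AND is_reference_track = 1",
            (tag_id, dataset_id),
        )
        cur = conn.execute(
            "INSERT INTO submission (tag_id, dataset_id, is_reference_track) "
            "VALUES (?, ?, 1)",
            (tag_id, dataset_id),
        )
        return cur.lastrowid

first = ingest_reference_track(conn, tag_id=8, dataset_id=1)
second = ingest_reference_track(conn, tag_id=8, dataset_id=1)
flags = dict(conn.execute(
    "SELECT submission_id, is_reference_track FROM submission"))
print(flags)  # -> {1: 0, 2: 1}: only the second submission keeps the flag
```

In the real stored procedure the UPDATE and INSERT would run in the same transaction for the same atomicity; how the sha256 is recomputed is a separate, open question.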

That being said, none of these really block us from releasing 0.13.0. Let's discuss this weekend.
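The dataset_id race in item 5 can be modeled in plain Python: three files from the same .zip ask for the dataset row concurrently, and a check-then-insert that is not atomic can mint multiple ids. The function name and the dict standing in for the dataset table are illustrative assumptions; in Postgres the equivalent fix would be a single transaction (or an INSERT .. ON CONFLICT .. RETURNING).

```python
import threading

datasets = {}            # (study, series) -> dataset_id; stands in for the dataset table
lock = threading.Lock()  # serialises get-or-create, as a DB transaction would
next_id = [1]

def get_or_create_dataset(key):
    # The lookup and the insert must happen atomically; otherwise two
    # ingests that both run before either "commits" each mint their own id.
    with lock:
        if key not in datasets:
            datasets[key] = next_id[0]
            next_id[0] += 1
        return datasets[key]

files = ["iccat_gbyp0008_ArgosTrans_eTUFF0.txt",
         "iccat_gbyp0008_ArgosTrans_eTUFF1.txt",
         "iccat_gbyp0008_ArgosTrans_eTUFF2.txt"]
ids = []
threads = [
    threading.Thread(
        target=lambda: ids.append(get_or_create_dataset(("gbyp0008", "ArgosTrans"))))
    for _ in files
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(set(ids))  # -> {1}: all three files share one dataset_id
```

Without the lock (or, in the real system, without the insert and read happening in one transaction), each file could observe an empty table and be assigned its own dataset_id.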

lewismc commented 1 year ago

@renato2099 I updated this PR to fix item 3 above. This was an old bug which we hadn't caught before. It is now fixed.
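For context, a common Python failure mode that produces exactly the symptom in item 3 (every submission returning the same metadata while the database is correct) is reusing one mutable dict across loop iterations when serialising rows. This is a guess at the class of bug, not the actual code from tags_controller.py.

```python
rows = [  # pretend these came from the database, one row per submission
    {"submission_id": 1, "metadata": "track A"},
    {"submission_id": 2, "metadata": "track B"},
]

def buggy_serialise(rows):
    out, item = [], {}
    for row in rows:
        item["submission_id"] = row["submission_id"]  # mutates the shared dict
        item["metadata"] = row["metadata"]
        out.append(item)  # every element aliases the same object
    return out

def fixed_serialise(rows):
    # allocate a fresh dict per row so each submission keeps its own metadata
    return [{"submission_id": r["submission_id"], "metadata": r["metadata"]}
            for r in rows]

print([r["metadata"] for r in buggy_serialise(rows)])  # -> ['track B', 'track B']
print([r["metadata"] for r in fixed_serialise(rows)])  # -> ['track A', 'track B']
```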