tagbase / tagbase-server

tagbase-server is a data management web service for working with eTUFF and nc-eTAG files.
https://oiip.jpl.nasa.gov/doc/OIIP_Deliverable7.4_TagbasePostgreSQLeTUFF_UserGuide.pdf
Apache License 2.0

Concurrent file uploads result in incorrect dataset_id reservations #276

Closed renato2099 closed 10 months ago

renato2099 commented 1 year ago

Ingesting multiple files concurrently (e.g., files extracted from a zip archive) can easily lead to submissions that belong to the same dataset not being associated with the same dataset_id. This happens when no transaction has reserved a dataset_id yet: two files from the same dataset can then each reserve a new dataset_id at the same time.
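The race can be reproduced outside the service with a minimal sketch. Everything here is hypothetical: `reservations`, `next_id`, and `reserve` are in-memory stand-ins for the database state and ingestion logic, and the barrier exists only to force the unlucky interleaving deterministically.

```python
import threading

reservations = {}               # hypothetical store: dataset key -> dataset_id
next_id = [1]                   # hypothetical id sequence
id_lock = threading.Lock()
barrier = threading.Barrier(2)  # forces both workers past the check first
seen = [None, None]


def reserve(dataset_key, slot):
    existing = reservations.get(dataset_key)   # 1. check: nothing reserved yet
    barrier.wait()                             # 2. both workers have now checked
    if existing is None:                       # 3. both act on the stale check
        with id_lock:
            existing = next_id[0]
            next_id[0] += 1
        reservations[dataset_key] = existing
    seen[slot] = existing


workers = [threading.Thread(target=reserve, args=("tag-A", i)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(seen))  # [1, 2]: two different ids for one dataset
```

Both workers handle files from the same dataset, yet each ends up with its own id, because the existence check and the reservation are not a single atomic step.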

We can remedy this in the following ways...

1) Temporary workaround: disable parallel uploads in the following call:

result = parmap.map_async(
    process_etuff_file,
    etuff_files,
    version=version,
    notes=notes,
    pm_parallel=False,  # N.B. sequential ingestion
    pm_processes=cpu_count(),
)

2) Split metadata ingestion from content ingestion, which would reduce the window in which concurrent transactions can conflict. Additional brainstorming and ingestion logic are required in order to perform multi-file parallel ingestion.
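One way option 2 might look, as a rough sketch rather than the project's actual design: reserve every dataset_id in a short sequential phase, then ingest file contents in parallel against ids that already exist. The names `reserve_dataset_ids`, `ingest_content`, and `ingest`, the `(file name, dataset key)` tuples, and the in-memory counter are all assumptions for illustration; the real service keeps this state in PostgreSQL.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count

# Hypothetical stand-in for a database sequence.
_next_dataset_id = count(1)


def reserve_dataset_ids(files):
    """Phase 1 (sequential): map each dataset key to exactly one dataset_id."""
    reserved = {}
    for _name, dataset_key in files:
        if dataset_key not in reserved:
            reserved[dataset_key] = next(_next_dataset_id)
    return reserved


def ingest_content(file, reserved):
    """Phase 2 (safe to parallelize): only reads ids reserved in phase 1."""
    name, dataset_key = file
    return name, reserved[dataset_key]


def ingest(files, workers=4):
    reserved = reserve_dataset_ids(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: ingest_content(f, reserved), files))


files = [("a1.txt", "tag-A"), ("a2.txt", "tag-A"), ("b1.txt", "tag-B")]
result = dict(ingest(files))
```

Here `a1.txt` and `a2.txt` share one dataset_id because no worker ever creates an id; the parallel phase only looks ids up.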

This issue came to light when we began to accommodate multi-track submissions in parallel.

lewismc commented 10 months ago

Addressed in #286