icipe-official / vectoratlas-software-code

MIT License

Handle Data Upload #487

Open gituman opened 2 months ago

gituman commented 2 months ago

Overview

  1. Adjust the data page and add a new dropdown prompting the user to upload a CSV/XLSX file, which will then be sent to blob storage.
  2. Fields to be specified during upload: preferred email address for communication, a checkbox indicating whether a DOI should be generated, the existing DOI if one was provided, and a short description of the dataset.
  3. Once the data is uploaded to blob storage, an email is sent to both the uploader and the reviewer.
  4. There must be an interface where the uploader can track the progress of their uploaded data.
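The upload fields in point 2 could be captured as a typed payload. A minimal sketch in TypeScript (field and function names are illustrative assumptions, not taken from the actual codebase):

```typescript
// Hypothetical shape of the metadata captured alongside the CSV/XLSX upload.
// Field names are illustrative, not taken from the Vector Atlas codebase.
interface UploadMetadata {
  contactEmail: string;   // preferred email address for communication
  generateDoi: boolean;   // checkbox: should a DOI be minted?
  providedDoi?: string;   // DOI supplied by the uploader, if any
  description: string;    // short description of the dataset
}

// Minimal validation before the file is sent to blob storage.
function validateUploadMetadata(meta: UploadMetadata): string[] {
  const errors: string[] = [];
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(meta.contactEmail)) {
    errors.push("contactEmail must be a valid email address");
  }
  if (meta.generateDoi && meta.providedDoi) {
    errors.push("cannot both request DOI generation and provide an existing DOI");
  }
  if (!meta.description.trim()) {
    errors.push("description is required");
  }
  return errors;
}
```

Validating this payload server-side before touching blob storage keeps malformed requests from creating orphaned files.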

Request table: logs users' data upload requests and their follow-up status

API endpoints
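The issue leaves the endpoint list unspecified. Purely as an illustration of the surface the Overview implies (upload, tracking, review status updates), a plausible REST sketch might look like the following; none of these paths or verbs come from the issue:

```typescript
// Hypothetical REST endpoints for the upload workflow; all names are assumptions.
interface Endpoint {
  method: "GET" | "POST" | "PATCH";
  path: string;
  purpose: string;
}

const uploadEndpoints: Endpoint[] = [
  { method: "POST",  path: "/uploaded-dataset",            purpose: "upload a csv/xlsx plus metadata to blob storage" },
  { method: "GET",   path: "/uploaded-dataset",            purpose: "list the current user's uploads and their statuses" },
  { method: "GET",   path: "/uploaded-dataset/:id",        purpose: "fetch one upload for the progress-tracking interface" },
  { method: "PATCH", path: "/uploaded-dataset/:id/status", purpose: "reviewer updates the review status" },
];
```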

stevenyaga commented 2 months ago

Data upload workflow

1. Upload

2. Data alignment/approval

3. Data correction

4. Communication

Implications:

stevenyaga commented 2 months ago

During our weekly stand-up on 2024-09-20, the following was agreed:

  1. An uploader logs into the frontend and navigates to the upload dataset page.
  2. The uploader attaches a dataset and provides a short description of the data.
  3. The system notifies all reviewers that a new dataset has been uploaded.
  4. A reviewer downloads the uploaded dataset from the system and self-assigns to review it.
  5. If the reviewer identifies issues with the uploaded dataset, they email the uploader asking for corrections.
  6. The uploader then re-uploads a corrected dataset, and the reviewer picks it up from the backend to continue the review.
  7. After the reviewer has completed the review and judged the dataset to be valid, the reviewer notifies their manager to approve the dataset; this email notification is sent automatically.
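The steps above imply a simple status lifecycle. A sketch of the allowed transitions, based on my reading of steps 1-7 and the status values listed later in this thread (the actual rules may differ):

```typescript
// Status values from the Uploaded Dataset model described later in this thread.
type DatasetStatus = "Pending" | "Under Review" | "Approved" | "Rejected";

// Transitions as I read steps 1-7: an upload starts Pending, a reviewer
// self-assigning moves it to Under Review, and review ends in Approved or
// Rejected. A corrected re-upload re-enters review.
const allowedTransitions: Record<DatasetStatus, DatasetStatus[]> = {
  "Pending": ["Under Review"],
  "Under Review": ["Approved", "Rejected"],
  "Rejected": ["Under Review"],
  "Approved": [],
};

function canTransition(from: DatasetStatus, to: DatasetStatus): boolean {
  return allowedTransitions[from].includes(to);
}
```

Encoding the transitions in one table makes the backend reject out-of-order status updates (e.g. approving a dataset nobody has reviewed) in a single place.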

Notes:

stevenyaga commented 2 months ago

Implementation

Four tables (models) will be required to support the data upload and approval workflow:

1. Uploaded Dataset model

Model to hold uploaded files along with other metadata. Uploaded CSV files will be stored on disk (blob storage). Table fields are:

  1. last_upload_date. Date when the upload was done.
  2. last_status_update_date. Date when the status of the uploaded dataset was last modified.
  3. title. Title of the dataset; this will be used when minting the DOI.
  4. description. Brief description of the dataset that may be of interest to a reviewer.
  5. uploaded_file_name. Name of the file that was uploaded; we will use this name to retrieve the file from disk.
  6. converted_file_name. Name of the file after conversion into the VA template; we will use this name to retrieve the file from disk during the ingestion stage.
  7. provided_doi. DOI provided at the time of upload, if any.
  8. status. Status of the uploaded dataset. Possible statuses are Pending, Under Review, Approved, and Rejected.
2. Uploaded Dataset Log model

Model to hold the different activities that can be performed against an uploaded dataset, e.g. upload, re-upload, email communication, approval, or rejection. Table fields are:

  1. action_type. Type of action that was performed. Possible values are Upload, Download, Communication, Approve, and Reject.
  2. action_date. Date when the action occurred.
  3. action_details. Details of the action.
  4. dataset. Uploaded dataset against which the log is kept.
  5. action_taker. User who performed the action.
3. DOI Source model

Model to store information that may result in minting or storing a DOI. The fields are:

  1. source_type. Where the DOI request originated. Possible values are Download and Upload.
  2. download_meta_data. Metadata of the downloaded dataset for which we intend to mint a DOI.
  3. approval_status. Approval status of the DOI request.
  4. title. Title of the DOI source; used at the point of generating the actual DOI.
  5. author_name. Name of the author/originator/requester.
  6. author_email. Email of the author/originator/requester. This is mandatory, as we will use this email to communicate with the author while the DOI goes through the approval process.
  7. uploaded_dataset. Uploaded Dataset foreign key. Applicable where source_type is Upload.
  8. approved_dataset. Approved Dataset foreign key. Applicable where source_type is Upload.
4. Communication Log model

Model to hold communication against an entity, including an uploaded dataset. This model gives us a generic means of recording all forms of communication in the system. Model fields are:

  1. communication_date. Date of communication.
  2. channel_type. Channel of communication, e.g. Email.
  3. recipients. Recipients of the communication.
  4. message_type. Type or subject of the message being communicated.
  5. message. Message to be communicated.
  6. sent_status. Sent status of the message. Possible values are Pending, Sent, and Failed.
  7. sent_date. Date the message was sent.
  8. reference_entity_type. Type of entity that triggered this communication.
  9. reference_entity_name. Name or ID of the entity that triggered this communication.
  10. error_type. Type of error that occurred while sending the message.
  11. error_description. Details of the error that occurred while sending the message.
  12. arguments. Arguments or extra data passed when sending the message.
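Taken together, the four models above might be sketched as plain TypeScript types. Field names follow this thread; the surrogate `id` keys, the camel-casing, and the helper are my additions, and the real implementation would presumably define these as ORM entities with proper relations rather than bare interfaces:

```typescript
// Plain-type sketch of the four models described above (an assumption, not
// the actual entity definitions).
type DatasetStatus = "Pending" | "Under Review" | "Approved" | "Rejected";
type ActionType = "Upload" | "Download" | "Communication" | "Approve" | "Reject";
type SentStatus = "Pending" | "Sent" | "Failed";

interface UploadedDataset {
  id: string;                    // surrogate key, assumed
  lastUploadDate: Date;
  lastStatusUpdateDate: Date;
  title: string;                 // used when minting a DOI
  description: string;
  uploadedFileName: string;      // key for retrieving the raw file from blob storage
  convertedFileName?: string;    // file converted to the VA template, used at ingestion
  providedDoi?: string;
  status: DatasetStatus;
}

interface UploadedDatasetLog {
  actionType: ActionType;
  actionDate: Date;
  actionDetails: string;
  datasetId: string;             // FK to UploadedDataset
  actionTaker: string;
}

interface DoiSource {
  sourceType: "Download" | "Upload";
  downloadMetaData?: string;
  approvalStatus: string;
  title: string;
  authorName: string;
  authorEmail: string;           // mandatory: used for approval-process emails
  uploadedDatasetId?: string;    // applicable where sourceType is "Upload"
  approvedDatasetId?: string;
}

interface CommunicationLog {
  communicationDate: Date;
  channelType: "Email";
  recipients: string[];
  messageType: string;
  message: string;
  sentStatus: SentStatus;
  sentDate?: Date;
  referenceEntityType?: string;
  referenceEntityName?: string;
  errorType?: string;
  errorDescription?: string;
  arguments?: Record<string, unknown>;
}

// Tiny helper showing the assumed defaults on creation: a fresh upload
// starts as Pending with both dates set to now.
function newUploadedDataset(
  fields: Omit<UploadedDataset, "status" | "lastUploadDate" | "lastStatusUpdateDate">
): UploadedDataset {
  const now = new Date();
  return { ...fields, status: "Pending", lastUploadDate: now, lastStatusUpdateDate: now };
}
```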
stevenyaga commented 1 month ago