pepkit / pepdbagent

Database for storing sample metadata
BSD 2-Clause "Simplified" License

Order of processing of PEP #10

Open nsheff opened 1 year ago

nsheff commented 1 year ago

When a PEP is put in the database, when is it processed? I can see two possibilities:

  1. A PEP is loaded, processed, and the processed PEP is put into the database.
  2. A PEP is loaded but not processed, the unprocessed PEP is put into the database.

The advantage of 1 is that the processing only happens once, so it reduces compute time. With 2, you'd have to reprocess it every time the PEP is requested.

However, if we want to allow the user to tweak the rendering of the PEP, for example, by changing env vars (https://github.com/pepkit/pephub/issues/3), this will only be possible with option 2.
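A minimal sketch of the difference between the two options, assuming that "processing" amounts to expanding `$VAR` references in sample attributes. The `process` function, the in-memory dict "databases", and the `DATA` variable are hypothetical stand-ins for illustration, not peppy or pepdbagent APIs.

```python
from string import Template


def process(raw_pep, env):
    """Hypothetical stand-in for PEP processing: expand $VARS in sample attributes."""
    return {
        "samples": [
            {key: Template(value).safe_substitute(env) for key, value in sample.items()}
            for sample in raw_pep["samples"]
        ]
    }


raw_pep = {"samples": [{"sample_name": "s1", "file": "$DATA/s1.fastq"}]}

# Option 1: process once at upload time and store the processed result.
db_processed = {"demo": process(raw_pep, {"DATA": "/upload/data"})}

# Option 2: store the PEP unprocessed and process it on every request.
db_raw = {"demo": raw_pep}


def get_option1(name, env):
    # The requester's env is ignored: paths were fixed at upload time.
    return db_processed[name]


def get_option2(name, env):
    # Reprocessed on each request, so the requester's env takes effect.
    return process(db_raw[name], env)


user_env = {"DATA": "/home/user/data"}
option1_file = get_option1("demo", user_env)["samples"][0]["file"]
option2_file = get_option2("demo", user_env)["samples"][0]["file"]
```

With option 1 the stored path keeps the upload-time value; only option 2 reflects the requester's environment, at the cost of one `process` call per request.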

Hybrid?

Is it possible to split "processing" into two stages: (1) the sample_modifiers and project_modifiers are applied before entry into the database, and then (2) path expansion is done on the fly?
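The hybrid idea could look roughly like the sketch below. It assumes stage 1 is a simplified `append`-style sample modifier and stage 2 is `$VAR` expansion; `apply_modifiers` and `expand_paths` are hypothetical names, and real peppy modifiers do considerably more than this.

```python
import copy
from string import Template


def apply_modifiers(raw_pep):
    """Stage 1 (before database entry): apply a simplified 'append' sample modifier."""
    pep = copy.deepcopy(raw_pep)  # leave the caller's PEP untouched
    appended = pep.get("sample_modifiers", {}).get("append", {})
    for sample in pep["samples"]:
        sample.update(appended)
    return pep


def expand_paths(stored_pep, env):
    """Stage 2 (on request): expand $VARS using the requester's environment."""
    return [
        {key: Template(value).safe_substitute(env) for key, value in sample.items()}
        for sample in stored_pep["samples"]
    ]


raw = {
    "sample_modifiers": {"append": {"file": "$DATA/reads.fastq"}},
    "samples": [{"sample_name": "s1"}, {"sample_name": "s2"}],
}

stored = apply_modifiers(raw)  # done once, result goes into the database
rendered = expand_paths(stored, {"DATA": "/mnt/lab"})  # done per request
```

The stored copy still contains the unexpanded `$DATA` placeholder, so each request can render it against a different environment without repeating the modifier stage.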

khoroshevskyi commented 1 year ago

At the moment, pep_db stores all variables from the peppy object. Every modification made in peppy before loading into the db is saved to the db. We can always re-upload every project to the db if necessary. If I am not mistaken, pepdbagent and pep_db work as in your first scenario. Please correct me if I am wrong.

khoroshevskyi commented 1 year ago

This issue is outdated; we found a solution for it. pepdbagent receives a peppy.Project object and retrieves the unprocessed PEP files, which are stored in Python data objects. The dictionary of the unprocessed PEP is then added to the database.
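As a rough illustration of storing an unprocessed PEP dictionary, the sketch below serializes it to JSON in an in-memory SQLite table. The table layout and the `upload` and `fetch_raw` helpers are assumptions made for this example; they do not reflect pepdbagent's actual schema or API, and extracting the dictionary from a peppy.Project object is left out.

```python
import json
import sqlite3

# In-memory SQLite database used as a stand-in for the real backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT PRIMARY KEY, raw_pep TEXT NOT NULL)")


def upload(conn, name, raw_pep):
    """Store the unprocessed PEP dictionary as a JSON string."""
    conn.execute(
        "INSERT OR REPLACE INTO projects (name, raw_pep) VALUES (?, ?)",
        (name, json.dumps(raw_pep)),
    )
    conn.commit()


def fetch_raw(conn, name):
    """Return the unprocessed PEP dictionary, ready to be processed on request."""
    row = conn.execute(
        "SELECT raw_pep FROM projects WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return json.loads(row[0])


raw_pep = {
    "config": {"pep_version": "2.1.0"},
    "samples": [{"sample_name": "s1", "file": "$DATA/s1.fastq"}],
}
upload(conn, "demo", raw_pep)
restored = fetch_raw(conn, "demo")
```

Because the stored dictionary is unprocessed, any processing or rendering can be applied after retrieval.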