m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Fix target upload (epic 3) #649

Open phraenquex opened 3 years ago

phraenquex commented 3 years ago

Ticket for frontend work: #540

This ticket currently covers mainly backend work (@duncanpeacock), including: Things to fix

duncanpeacock commented 2 years ago

Tyler and I had a meeting on Thursday on this. Attached is an analysis with a solution outline and some estimates: https://docs.google.com/document/d/1T5RV4TzzwShdR5wNMXe6Nx2gdS-HaVLE4EXxCufnKzw/edit?usp=sharing

phraenquex commented 2 years ago

For "deleting" structures, 3 main actions:

duncanpeacock commented 2 years ago

The solution design document has been updated with the delete processing: https://docs.google.com/document/d/1T5RV4TzzwShdR5wNMXe6Nx2gdS-HaVLE4EXxCufnKzw/edit?usp=sharing

phraenquex commented 2 years ago

@duncanpeacock - if you haven't yet, also spec the mechanism for communicating errors back to the uploader.

A specific error: dataset ID not unique. (That's the "X0001" or "P0001" number.)
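As a minimal sketch of how this particular error could be detected before it is reported back, the check below flags dataset IDs (the "X0001"/"P0001" numbers) that are repeated within an upload or already exist. The function name, the `existing_ids` parameter and the plain list of message strings are assumptions for illustration; the actual reporting mechanism is still to be specced.

```python
from collections import Counter


def dataset_id_errors(upload_ids: list[str], existing_ids: frozenset[str] = frozenset()) -> list[str]:
    """Return messages for dataset IDs duplicated in the upload or already loaded.

    Hypothetical helper: the real loader's error channel is not yet defined.
    """
    counts = Counter(upload_ids)
    errors = []
    for dataset_id in sorted(counts):
        if counts[dataset_id] > 1:
            errors.append(f"Dataset ID not unique within upload: {dataset_id}")
        elif dataset_id in existing_ids:
            errors.append(f"Dataset ID already exists: {dataset_id}")
    return errors


if __name__ == "__main__":
    for message in dataset_id_errors(["X0001", "X0002", "X0001"], frozenset({"P0001", "X0002"})):
        print(message)
```

Collecting all messages rather than stopping at the first one lets the uploader fix every clash in a single pass.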

duncanpeacock commented 2 years ago

From #673

Include Crystallographic files

Currently the upload process only stores files from the aligned directory in the database. The download process, as currently designed, only picks files from these fields, following the design decision to keep the process as simple as possible.

This will have to be modified to store the files from the crystallographic folder in the database as part of the target upload process. They could then be accessed as part of the download in a similar way to the current aligned files. Unlike the aligned folder, we want to make this flexible so that new file types can be uploaded in the target loader without code changes.

Crystallographic mapping:

Aligned -> Crystallographic
  Mpro-x0072_0A -> Mpro-x0072
  Mpro-x0072_1A -> Mpro-x0072
  Mpro-x0104_0A -> Mpro-x0104

So the files in the crystallographic folder can be accessed using the base crystal name (the stem of the protein code without the _0A, _1A, etc. suffix).
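A minimal sketch of deriving the base crystal name, assuming the suffix is always an underscore followed by digits and a single capital letter (as in _0A, _1A). The regex and the `crystal_name` helper are hypothetical; if other suffix shapes occur, the pattern would need widening.

```python
import re

# Assumed suffix shape: "_" + one or more digits + one capital letter (_0A, _1A, ...).
_SUFFIX = re.compile(r"_\d+[A-Z]$")


def crystal_name(protein_code: str) -> str:
    """Strip the _0A/_1A-style suffix to get the base crystal name."""
    return _SUFFIX.sub("", protein_code)


if __name__ == "__main__":
    for code in ["Mpro-x0072_0A", "Mpro-x0072_1A", "Mpro-x0104_0A"]:
        print(code, "->", crystal_name(code))
```

Codes without a suffix are returned unchanged, so the helper is safe to call on names that are already crystal names.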

We would add a new Crystallographic table with an array of links to an associated files table. Each file entry carries a mapping to the file template that identifies the file in the crystallographic folder.

Crystallographic table: Name of Crystal, Many2Many (id, Target, File (FileField), FileTypeMapping)

Example (crystal, file, file-type mapping): Mpro-x0072, Mpro-x0072.pdb, PDB
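To make the proposed shape concrete, here is a sketch of the two tables as plain Python dataclasses rather than Django models, so it runs standalone. The class and attribute names (`Crystal`, `CrystalFile`, `file_type`, `file_of_type`) are hypothetical; `file` is a plain path standing in for the FileField. A real Many2Many would also let one file row be shared between crystals; the list here only models the crystal side.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CrystalFile:
    """A row of the associated files table (hypothetical names)."""
    id: int
    target: str
    file: str  # path; stands in for a Django FileField
    file_type: str  # the FileTypeMapping value, e.g. "PDB"


@dataclass
class Crystal:
    """A row of the Crystallographic table: a crystal name linked to many files."""
    name: str
    files: list[CrystalFile] = field(default_factory=list)

    def file_of_type(self, file_type: str) -> Optional[CrystalFile]:
        """Return the first linked file with this file type, or None."""
        return next((f for f in self.files if f.file_type == file_type), None)


if __name__ == "__main__":
    crystal = Crystal("Mpro-x0072", [CrystalFile(1, "Mpro", "Mpro-x0072.pdb", "PDB")])
    print(crystal.file_of_type("PDB"))
```

Because the file type is data rather than a dedicated column, a new file type only needs new rows, which matches the aim of not changing code when new files appear.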

When the target loader runs, it will load the Crystallographic and files tables. If a protein code is marked as changed in the metadata, then both the associated aligned AND crystallographic files are updated.
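A sketch of that update rule, assuming the metadata can be reduced to a `{protein_code: changed}` dictionary (a hypothetical input shape) and reusing the assumed _0A/_1A suffix pattern to find each crystal.

```python
import re

_SUFFIX = re.compile(r"_\d+[A-Z]$")  # assumed _0A/_1A-style suffix


def files_to_refresh(changed_flags: dict[str, bool]) -> tuple[set[str], set[str]]:
    """Given {protein_code: changed}, return (protein codes whose aligned files
    need updating, crystal names whose crystallographic files need updating)."""
    aligned = {code for code, changed in changed_flags.items() if changed}
    crystals = {_SUFFIX.sub("", code) for code in aligned}
    return aligned, crystals


if __name__ == "__main__":
    flags = {"Mpro-x0072_0A": True, "Mpro-x0072_1A": False, "Mpro-x0104_0A": False}
    print(files_to_refresh(flags))
```

One consequence worth confirming: since crystallographic files are stored per crystal, a change flagged on Mpro-x0072_0A also refreshes the files that Mpro-x0072_1A shares.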

In the download structures window, when the crystallographic structures are selected, the API will extract the crystal name from the protein codes provided and supply the desired files from the database - adding them to the crystallographic folder in a sub-directory labelled with the source protein (e.g. Mpro-x0072).
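The download step described above can be sketched with the standard-library `zipfile` module. The `stored` dictionary is a hypothetical stand-in for the database lookup, and the function name is illustrative; selected protein codes are collapsed to unique crystal names so each crystal's files appear once.

```python
import io
import re
import zipfile

_SUFFIX = re.compile(r"_\d+[A-Z]$")  # assumed _0A/_1A-style suffix


def build_crystallographic_zip(protein_codes: list[str], stored: dict[str, dict[str, bytes]]) -> bytes:
    """Zip the crystallographic files for the crystals behind the selected codes.

    `stored` is a hypothetical stand-in for the database: {crystal: {filename: content}}.
    Each crystal gets its own sub-directory under crystallographic/.
    """
    crystals = sorted({_SUFFIX.sub("", code) for code in protein_codes})
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for crystal in crystals:
            for filename, content in sorted(stored.get(crystal, {}).items()):
                archive.writestr(f"crystallographic/{crystal}/{filename}", content)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_crystallographic_zip(
        ["Mpro-x0072_0A", "Mpro-x0072_1A"],
        {"Mpro-x0072": {"Mpro-x0072.pdb": b"ATOM ..."}},
    )
    print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```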

The new fields will be:

  1. PDB - format: {source protein}.pdb
  2. MTZ - format: {source protein}.TBC
  3. Event MTZ - format: {source protein}.TBC
  4. Raw CCP4 map files - format: {source protein}.TBC
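One way to keep these templates flexible is to hold them as data, so a new file type is a new entry rather than a code change. In this hypothetical sketch only the PDB pattern is filled in, because the others are still TBC; asking for an unsettled type raises an error instead of guessing.

```python
from typing import Optional

# Hypothetical template table: only the PDB pattern is settled; the rest are TBC.
FILE_TEMPLATES: dict[str, Optional[str]] = {
    "PDB": "{source protein}.pdb",
    "MTZ": None,
    "Event MTZ": None,
    "Raw ccp4 map": None,
}


def expected_filename(file_type: str, source_protein: str) -> str:
    """Expand the template for one file type, failing loudly if it is not settled."""
    template = FILE_TEMPLATES.get(file_type)
    if template is None:
        raise ValueError(f"no filename template settled for {file_type!r}")
    return template.replace("{source protein}", source_protein)


if __name__ == "__main__":
    print(expected_filename("PDB", "Mpro-x0072"))
```

Plain `str.replace` is used instead of `str.format` so the placeholder can keep the space in "{source protein}" exactly as written in the list.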

And one other point:

If so I can add this to the design document.

phraenquex commented 1 year ago

This epic should include versioning, and that's part of the schema update/redesign.

phraenquex commented 1 year ago

For versioning - brainstorm by @phraenquex, @tdudgeon, Daren

tdudgeon commented 1 year ago

Initial high level spec for the new loaders: https://docs.google.com/document/d/1osK1mbaO5TrNRY8-0P5piiYodaEA_z_sgU_7SmjfzHA/edit#

phraenquex commented 1 year ago

Work remaining for epic:

  1. Complete XChemAlign #999 (May 20?)
  2. Complete database schema (Django) #1008 (Jun 2nd)
  3. Rewrite (implement new) target loader #1055
  4. Finalise B/E higher-level APIs #1056
  5. Rewrite F/E #540 (???)
  6. Testing
  7. Download (scope out)

Things @tdudgeon has worried about:

  1. Historical data - parallel behaviour? update existing? (100+ projects)
  2. Non-Diamond data - density for PDB files - download maps from PDB (envisioned XCAlign v2)
  3. Curating biomol assemblies etc. - generate easy-to-load-and-view scenes (pymol, coot, etc.)
  4. ~Visits handled differently in XCA & Fragalysis~ Not an issue - uploader decides on ONE visit at upload time.
  5. CIF to MOL, etc.: which tool? Discuss with Conor; can also ask CCP4bb.

phraenquex commented 1 year ago

Crystallographic files

Placeholders already exist in the database. The backend should hold and serve the files, but needn't digest them. Files should be in the media directory, not in the database (storing them in the database would be a long-term risk).
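In Django, a FileField backed by the default file storage already works this way: the file lives under the media root and the database keeps only a relative path. The sketch below spells that pattern out with `pathlib`; the `store_in_media` name and the `crystallographic/<crystal>/<filename>` layout are assumptions.

```python
import tempfile
from pathlib import Path


def store_in_media(media_root: Path, crystal: str, filename: str, content: bytes) -> str:
    """Write a file under the media directory and return the relative path to record in the DB.

    Hypothetical layout: <media_root>/crystallographic/<crystal>/<filename>.
    """
    relative = Path("crystallographic") / crystal / filename
    destination = media_root / relative
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(content)
    return relative.as_posix()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = store_in_media(Path(tmp), "Mpro-x0072", "Mpro-x0072.pdb", b"ATOM")
        print(path, (Path(tmp) / path).read_bytes())
```

Recording a relative path keeps the database row valid if the media directory is moved or remounted.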

Historic data

Two options:

  1. Keep separate Fragalysis instance for historic data - redirect from landing page
  2. Fix data-finding heuristics so they gracefully report to backend (and thus frontend via API). @alanbchristie to assess.

How to handle re-alignments of existing data

The parser of the align output should assess whether the new alignments are meaningfully different from the old ones. If not, discard the new ones. It's Tim's side of the code that must do this.
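The thread does not define "meaningfully different". As one possible criterion, swapped in here as an assumption, the sketch compares matched atom coordinates by RMSD and keeps the new alignment only above a hypothetical 0.1 Å tolerance. No superposition is done, on the assumption that both alignments already share a reference frame.

```python
import math
from typing import Optional

# Hypothetical type: one (x, y, z) coordinate in angstroms.
Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square deviation between matched coordinates (no superposition)."""
    if len(a) != len(b):
        raise ValueError("coordinate lists must be the same length")
    if not a:
        return 0.0
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def keep_new_alignment(old: Optional[list[Coord]], new: list[Coord], tolerance: float = 0.1) -> bool:
    """Keep the new alignment if there is no old one, the atom sets differ,
    or the RMSD exceeds the (hypothetical) tolerance."""
    if old is None or len(old) != len(new):
        return True
    return rmsd(old, new) > tolerance


if __name__ == "__main__":
    old = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(keep_new_alignment(old, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.01)]))
    print(keep_new_alignment(old, [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]))
```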

Old-style data along-side new-style data

We won't try to have them coexist; we'll have to re-upload existing data.
We'll need to think of a mechanism to transfer tags so they stay attached to the same compounds.
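A minimal sketch of such a transfer, assuming each old and new record can be reduced to a stable compound key (for example a protein code or canonical SMILES; which key to use is an open assumption). Tags follow the key, and old records with no match are returned for manual curation rather than silently dropped. All names are hypothetical.

```python
from collections import defaultdict


def transfer_tags(old_tags, old_keys, new_keys):
    """Carry tags from old records to new records that share a compound key.

    old_tags: {old_id: set of tag names}
    old_keys: {old_id: compound key}   (hypothetical, e.g. a protein code)
    new_keys: {new_id: compound key}
    Returns ({new_id: tags}, sorted list of old_ids that found no match).
    """
    key_to_new = defaultdict(list)
    for new_id, key in new_keys.items():
        key_to_new[key].append(new_id)

    transferred = defaultdict(set)
    unmatched = []
    for old_id, tags in old_tags.items():
        matches = key_to_new.get(old_keys.get(old_id), [])
        if not matches:
            unmatched.append(old_id)
        for new_id in matches:
            transferred[new_id].update(tags)
    return dict(transferred), sorted(unmatched)


if __name__ == "__main__":
    print(transfer_tags(
        {1: {"hit"}, 2: {"junk"}},
        {1: "Mpro-x0072_0A", 2: "Mpro-x9999_0A"},
        {10: "Mpro-x0072_0A"},
    ))
```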

non-XChem PDB files

Do align them onto each site, whether or not they have ligands bound.

Where do soaked compounds come from

@tdudgeon and Daren to settle on the convention.
Might be available in SoakDB already - @phraenquex had previously discussed with Daren adding an extra column

Are uploads recorded as events

No - FE/BE API will use upload datestamps to allow FE to present it properly.
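As a sketch of what the frontend might do with those datestamps, the snippet orders uploads newest first. The `upload_date` field name and ISO-8601 format are hypothetical; note that `datetime.fromisoformat` only accepts a trailing "Z" from Python 3.11 onwards.

```python
from datetime import datetime


def order_uploads(uploads: list[dict]) -> list[dict]:
    """Return uploads newest first, using a hypothetical ISO-8601 'upload_date' field."""
    return sorted(uploads, key=lambda u: datetime.fromisoformat(u["upload_date"]), reverse=True)


if __name__ == "__main__":
    uploads = [
        {"id": 1, "upload_date": "2024-01-01T10:00:00"},
        {"id": 2, "upload_date": "2024-02-01T09:30:00"},
    ]
    print([u["id"] for u in order_uploads(uploads)])
```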