pantherdb / fullgo_paint_update

Update of Panther and PAINT DBs with monthly GO release data

Recording changes to paint annotations in comments #28

Open dustine32 opened 5 years ago

dustine32 commented 5 years ago

Pasting from Anushya's 2019-04-10 email:

Dustin and I talked about the format of the revision history section of the comments on Tuesday. It is stored in the ‘remark’ column of the ‘comment’ table. Although the database supports multiple records for a given family id, Dustin will update the data to ensure no more than 1 comment record is associated with a family id. It will be formatted as follows:

  1. There can be up to 4 sections in the following order:
     a. # molecular_function
     b. # cellular_component
     c. # biological_process
     d. # Notes
  2. If the data does not have any of these sections, the PAINT code will prepend ‘# Notes’ to any existing curator notes.
  3. For each GO aspect listed in item 1, the data will be ordered by date, descending, i.e. later dates precede earlier ones. The columns will be tab-delimited and formatted as follows:
     a. Date – formatted as yyyymmdd
     b. UserName – User name of the user or entity updating the family
     c. Operation – Either Save or Obsolete
     d. Public id – Node persistent id
     e. Term and qualifier(s) – Term accession followed by a hyphen ‘-’ followed by qualifiers delimited by commas and enclosed in round brackets
  4. Notes is free-form text that can be modified by the curator.
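
As a rough illustration of the five-column layout described above, here is a minimal Python sketch that builds one revision-history line. The function names, the example user name and the `PTN000000001` node id are hypothetical, and writing empty brackets when a term has no qualifiers is an assumption; the email does not say what happens in that case.

```python
from datetime import date

VALID_OPERATIONS = ("Save", "Obsolete")


def format_term(accession, qualifiers=()):
    """Term accession, a hyphen, then comma-delimited qualifiers in round brackets.

    Assumption: a term without qualifiers is still written with empty brackets.
    """
    return f"{accession}-({','.join(qualifiers)})"


def format_revision_line(when, user, operation, public_id, accession, qualifiers=()):
    """Build one tab-delimited revision line: date, user, operation, public id, term."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"operation must be one of {VALID_OPERATIONS}, got {operation!r}")
    return "\t".join([
        when.strftime("%Y%m%d"),  # yyyymmdd
        user,
        operation,
        public_id,
        format_term(accession, qualifiers),
    ])


example = format_revision_line(
    date(2019, 4, 10), "dustine32", "Save", "PTN000000001", "GO:0003677", ["NOT"]
)
```

Here `example` holds the five fields joined by tabs, with the last field reading `GO:0003677-(NOT)`.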

The PAINT client tool will be updated to only permit the curator to update the curator notes section. The tool will be modified to allow the curator to view both the revision history and curator notes. The PAINT server code will be modified to prepend any changes to the PAINT annotation section as given above.

The update pipeline can be changed to record its operations (e.g. obsolete, un-obsolete) in this comment record. This will likely take advantage of refactoring or replacing the paint_annotation and paint_evidence update SQL queries with a more programmatic approach (looping through the queried data and submitting updates through DB-connected Django).
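
A minimal sketch of how the pipeline (or the PAINT server) might prepend a revision line to the right section of the remark text, assuming the section layout described above. The function names are hypothetical, and reading and writing the ‘remark’ column through Django is deliberately left out; this only handles the string manipulation.

```python
SECTION_ORDER = ["molecular_function", "cellular_component", "biological_process", "Notes"]
ASPECTS = SECTION_ORDER[:3]


def split_sections(remark):
    """Split remark text into {section name: [lines]}.

    Text that appears before any recognised header is treated as curator
    notes, mirroring the rule that '# Notes' is prepended to existing notes.
    """
    headers = {f"# {name}": name for name in SECTION_ORDER}
    sections = {}
    current = None
    for line in remark.splitlines():
        name = headers.get(line.strip())
        if name is not None:
            current = name
            sections.setdefault(current, [])
        elif current is None:
            sections.setdefault("Notes", []).append(line)
        else:
            sections[current].append(line)
    return sections


def prepend_revision(remark, aspect, revision_line):
    """Return the remark with revision_line placed first in the aspect's section."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown GO aspect: {aspect!r}")
    sections = split_sections(remark)
    # The newest entry goes first, which keeps the section in descending date order.
    sections.setdefault(aspect, []).insert(0, revision_line)
    out = []
    for name in SECTION_ORDER:
        if name in sections:
            out.append(f"# {name}")
            out.extend(sections[name])
    return "\n".join(out)
```

One limitation of this sketch: a curator note line that consists solely of one of the section headers would be read as the start of a new section.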

pgaudet commented 5 years ago

Sounds good! I suppose changes on different dates will be captured chronologically?

mugitty commented 5 years ago

Yes. The date will be in the first column and sorted in descending order i.e. later dates will precede earlier ones.
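
Because the date is written as yyyymmdd in the first column, a plain string sort on that column is also a chronological sort. A small sketch, with a hypothetical function name and made-up rows:

```python
def sort_newest_first(lines):
    """Order tab-delimited revision lines so later dates precede earlier ones."""
    # yyyymmdd strings compare in the same order as the dates they encode.
    return sorted(lines, key=lambda line: line.split("\t", 1)[0], reverse=True)


rows = [
    "20190101\tcurator_a\tSave",
    "20190410\tcurator_b\tSave",
    "20181231\tpipeline\tObsolete",
]
ordered = sort_newest_first(rows)
```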

pgaudet commented 5 years ago

Excellent!

mugitty commented 5 years ago

The evidence codes should also be stored with the comment section. It should be formatted as follows:

  1. There can be up to 4 sections in the following order:
     a. # molecular_function
     b. # cellular_component
     c. # biological_process
     d. # Notes
  2. If the data does not have any of these sections, the PAINT code will prepend ‘# Notes’ to any existing curator notes.
  3. For each GO aspect listed in item 1, the data will be ordered by date, descending, i.e. later dates precede earlier ones. The columns will be tab-delimited and formatted as follows:
     a. Date – formatted as yyyymmdd
     b. UserName – User name of the user or entity updating the family
     c. Operation – Either Save or Obsolete
     d. Evidence code – If there are multiple codes, delimit them with commas
     e. Public id – Node persistent id
     f. Term and qualifier(s) – Term accession followed by a hyphen ‘-’ followed by qualifiers delimited by commas and enclosed in round brackets
  4. Notes is free-form text that can be modified by the curator.
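
To make the six-column layout with evidence codes concrete, here is a hedged Python sketch of a parser for a single revision line. The `Revision` type, the function name and the sample values are hypothetical, and the regular expression assumes the accession itself contains no hyphen (true for identifiers of the form `GO:0003677`).

```python
import re
from typing import List, NamedTuple


class Revision(NamedTuple):
    date: str                  # yyyymmdd
    user: str
    operation: str             # "Save" or "Obsolete"
    evidence_codes: List[str]
    public_id: str             # node persistent id
    accession: str
    qualifiers: List[str]


# accession, hyphen, then comma-delimited qualifiers inside round brackets
TERM_RE = re.compile(r"^([^-]+)-\((.*)\)$")


def parse_revision_line(line):
    """Parse one tab-delimited revision line in the six-column layout."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-delimited columns, got {len(fields)}")
    when, user, operation, evidence, public_id, term = fields
    match = TERM_RE.match(term)
    if match is None:
        raise ValueError(f"malformed term field: {term!r}")
    accession, quals = match.groups()
    return Revision(
        date=when,
        user=user,
        operation=operation,
        evidence_codes=[code for code in evidence.split(",") if code],
        public_id=public_id,
        accession=accession,
        qualifiers=[q for q in quals.split(",") if q],
    )


rev = parse_revision_line("20190410\tmugitty\tSave\tIBD,IKR\tPTN000000001\tGO:0003677-(NOT)")
```

For this sample line, `rev.evidence_codes` is `["IBD", "IKR"]` and `rev.qualifiers` is `["NOT"]`.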

mugitty commented 5 years ago

The initial version of this update is available on PAINT. A similar change is required for the monthly database update.