nmdp-bioinformatics / service-feature

Enumerated sequence feature service.
GNU Lesser General Public License v3.0
2 stars 10 forks source link

RACE CONDITION: accession needs to be unique within a gene, term, rank but its not #55

Closed mmaiers-nmdp closed 3 years ago

mmaiers-nmdp commented 3 years ago

http://feature.b12x.org/features/HLA-DPB1/intron/2

returns 4 sequences with accession number 3370. This can't happen.

The code here needs to be a transaction

mmaiers-nmdp commented 3 years ago

Solution:

  1. constrain the database table to be unique across locus, term, rank, accession
  2. this requires first re-assigning accession numbers that are currently duplicated
  3. then wrap the code that inserts to try again if the insert is rejected due to to this new constraint
pbashyal-nmdp commented 3 years ago
  1. Fixed with PR #56
    • The database create script service-jdbi/src/main/sql/mysql-create.sql is updated to create unique key when building the database.
      unique key `unique_feature` (`locus_id`,`term_id`,`rank`,`accession`),
  2. Update the database (see separate comment)
  3. Updated createFeature method in service-jdbi/src/main/java/org/nmdp/service/feature/service/jdbi/JdbiFeatureService.java class to try MAX_INSERT_FEATURE_TRIES (10) times to create a new feature in case of constraint violation with duplicate accession ids.
pbashyal-nmdp commented 3 years ago

Fixing the database:

Following procedures are needed to update the data in the database:

  1. Copy the feature table and create fresh accession ids within locus_id, term_id, rank group.

    create table feature_updated
    select feature_id, sequence_id, locus_id, term_id, rank,
       (IF(unique_group = @cur_group,
           @cur_row := @cur_row + 1,
           @cur_row := 1 AND @cur_group := unique_group)
           ) as accession
    from (select *, concat(locus_id, term_id, rank) as unique_group from feature) f,
     (select @cur_row := 0, @cur_group := '') as empty_row
    order by unique_group;
  2. Back up the original feature table just in case.

    create table feature_backup
    select * from feature;
  3. Drop the original feature table and rename updated feature table to the original feature table.

    drop table feature;
    rename table feature_updated to feature;
  4. Create constraint on the database to locus_id, term_id, rank, accession combination.

    alter table feature add unique key `unique_feature` (`locus_id`,`term_id`,`rank`,`accession`);
  5. Remove the backup if everything looks good.

    drop table feature_backup;