solgenomics / sgn

The code behind the Sol Genomics Network, Cassavabase and other Breedbase websites
https://solgenomics.net
MIT License
66 stars 35 forks

move dbpatches 00068-00071 to a datafixes script #761

Closed bellerbrock closed 7 years ago

bellerbrock commented 7 years ago

These patches were written to change data, not to alter schemas or add cvterms. After discussion in the ticket meeting, we've come to the consensus that because they alter data and only apply to specific databases, they should be implemented as scripts in the datafixes directory instead of as patches.

These data fixes should also be separated into their basic parts: one script to add missing stock props and layout, and one to add missing project props.

Because the patches have been run on a few databases already, we'll also write a script to fix the patch history by removing the relevant rows from metadata tables like md_dbversion, and remove layouts added to cassavabase that don't need to be there.
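
A rough sketch of the patch-history cleanup described above, in Python with an in-memory SQLite database. This is an illustration only, not the actual datafix script: the `patch_name` column, the reduced `md_dbversion` schema, and the function name `remove_patch_history` are assumptions, and the real patch names for 00068-00071 would have to be looked up in the database first. Related rows in other metadata tables are not handled here.

```python
import sqlite3

# Assumed, reduced stand-in for the metadata table; the real md_dbversion
# table has more columns and lives in a PostgreSQL schema.
SCHEMA = """
CREATE TABLE md_dbversion (
    dbversion_id INTEGER PRIMARY KEY,
    patch_name   TEXT NOT NULL
);
"""


def remove_patch_history(conn, patch_names):
    """Delete md_dbversion rows for the given patch names.

    Returns the number of rows removed so the caller can check it
    against the number of patches expected to be recorded.
    """
    patch_names = list(patch_names)
    if not patch_names:
        return 0
    placeholders = ", ".join("?" for _ in patch_names)
    cur = conn.execute(
        f"DELETE FROM md_dbversion WHERE patch_name IN ({placeholders})",
        patch_names,
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder patch names, not the real ones.
    conn.executemany(
        "INSERT INTO md_dbversion (patch_name) VALUES (?)",
        [("ExamplePatchA",), ("ExamplePatchB",), ("UnrelatedPatch",)],
    )
    removed = remove_patch_history(conn, ["ExamplePatchA", "ExamplePatchB"])
    print("rows removed:", removed)
```

Returning the deleted row count makes it easy to confirm, per database, that only the patches that were actually run there had their history removed.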

nmenda commented 7 years ago

When adding missing field layouts to older trials (e.g. CIAT trials), make sure to add an nd_experiment row for each trial, linking it with its plots, with type_id = the cvterm_id for 'field_layout'.

So if trial A has 100 plots, there needs to be one row in nd_experiment, one row in nd_experiment_project, and 100 rows in nd_experiment_stock (one per plot), all using that same nd_experiment_id.

TrialDesignStore should be used for adding this layout and the missing stockprops (replicate, block, plot number): https://github.com/solgenomics/sgn/blob/master/lib/CXGN/Trial/TrialDesignStore.pm
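
To make the row counts concrete, here is a minimal sketch of the linking pattern in Python with an in-memory SQLite database. It is not how TrialDesignStore does it. The three tables are cut down to the columns needed for the example (the real Chado tables have more columns and constraints), and `link_field_layout`, `count_rows`, and the ID values in the demo are made up for illustration.

```python
import sqlite3

# Reduced stand-ins for the natural diversity tables (assumption: only
# the columns needed to show the linking pattern are included).
SCHEMA = """
CREATE TABLE nd_experiment (
    nd_experiment_id INTEGER PRIMARY KEY,
    type_id          INTEGER NOT NULL
);
CREATE TABLE nd_experiment_project (
    nd_experiment_project_id INTEGER PRIMARY KEY,
    nd_experiment_id INTEGER NOT NULL REFERENCES nd_experiment (nd_experiment_id),
    project_id       INTEGER NOT NULL
);
CREATE TABLE nd_experiment_stock (
    nd_experiment_stock_id INTEGER PRIMARY KEY,
    nd_experiment_id INTEGER NOT NULL REFERENCES nd_experiment (nd_experiment_id),
    stock_id         INTEGER NOT NULL
);
"""


def link_field_layout(conn, project_id, plot_stock_ids, field_layout_type_id):
    """Create one field_layout experiment for a trial and link all plots to it."""
    cur = conn.cursor()
    # One nd_experiment row per trial, typed as 'field_layout'.
    cur.execute(
        "INSERT INTO nd_experiment (type_id) VALUES (?)",
        (field_layout_type_id,),
    )
    nd_experiment_id = cur.lastrowid
    # One nd_experiment_project row tying that experiment to the trial.
    cur.execute(
        "INSERT INTO nd_experiment_project (nd_experiment_id, project_id) "
        "VALUES (?, ?)",
        (nd_experiment_id, project_id),
    )
    # One nd_experiment_stock row per plot, all sharing the same experiment id.
    cur.executemany(
        "INSERT INTO nd_experiment_stock (nd_experiment_id, stock_id) "
        "VALUES (?, ?)",
        [(nd_experiment_id, stock_id) for stock_id in plot_stock_ids],
    )
    conn.commit()
    return nd_experiment_id


def count_rows(conn, table):
    """Row count for one of the demo tables (table name is trusted here)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical IDs: trial 1, plots 1001..1100, cvterm_id 42 for 'field_layout'.
    link_field_layout(conn, 1, range(1001, 1101), 42)
    for table in ("nd_experiment", "nd_experiment_project", "nd_experiment_stock"):
        print(table, count_rows(conn, table))
```

For a trial with 100 plots, the counts come out as 1, 1 and 100, matching the description above.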

nmenda commented 7 years ago

Reload the 24 trials that were unlinked or partially unlinked from nd_experiment. Trial IDs: 99, 114, 515, 956, 1028, 1205, 1303, 1294, 1405, 107, 1392, 962, 121, 1337, 1235, 1206, 879, 1013, 1578, 2736, 1247, 2733, 1162

Alex is working on regenerating the data from the phenotyping file downloads from the database backup, and testing reloading on the tmp db on cassava-devel

Need to delete manually:

  1. the plots of these trials = DONE
  2. the unlinked phenotypes = DONE

nmenda commented 7 years ago

Some of the trials have disease traits that cannot be mapped to the new disease variables, since we do not have the months-after-planting info in phenotypeprop (old IITA data).

aco46 commented 7 years ago

The script has been added to the data_fixes dir in the Phenome repo. A small modification was also made to the TrialDesignStore function so that it does not return an error message when plot names already exist in the trial.

The commits are bb15ccbf3190219870221f17faa4cc613a28cc6e and 23b342bbfa19fe3eaddf103ab8f1e8517fcd8e6f.

I'll create a pull request for the script to be reviewed by @nickmorales and @nmenda.

I forgot to mention that I've tested it on my VM for yambase, sweetpotatobase and cassavabase. While the script is being reviewed, I'll test it on devel for cassava. There are ~1050 trials without design and layout, so this will probably take about a day to complete, judging from what I observed running it on my VM. For yambase and sweetpotatobase, I'll test the script on the test databases.

aco46 commented 7 years ago

@nmenda: the files containing the trial names for yambase, sweetpotatobase and cassavabase have been added to the Phenome repo via 7f877d7976e718e04d710e685109a63c436bc43a

aco46 commented 7 years ago

@nmenda: see the link to the files: https://github.com/solgenomics/Phenome/commit/7f877d7976e718e04d710e685109a63c436bc43a

aco46 commented 7 years ago

The script written to fix trial design and layout for trials uploaded using a loading script was moved to the data_fixes folder in the Phenome repo.