Open StevenCannon-USDA opened 1 month ago
@StevenCannon-USDA can you clarify? It looks like we already have hap1 represented in the main datastore (as well as in downstream apps) from quite some time ago. AFAICT the hap1 genome sequences and the corresponding annotations have not changed (I haven't checked this exhaustively, though). Can you confirm that hap1 needs no updating and that this issue is just about pulling in hap2?
@adf-ncgr You are right. I prepared hap1 in April 2023. Comparing those annotation files to the new C. canadensis hap1 files, they appear to be the same -- at least at the level of word counts etc.
I made a more careful record of notes this time around, and of course made the hap2 files (which I hadn't done in 2023). But apart from that, I guess we can leave the hap1 files as-is (though I'll probably check the old and new READMEs and may update the hap1 file).
Sounds good. My main goal in asking was to avoid having to redo any of the AHRD + downstream work on hap1 if possible, so I think we are on the same page.
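For a stricter check than word counts, the old and new hap1 files could be compared by hashing their decompressed contents. The following is a minimal sketch, not the check that was actually run; the helper names `content_digest` and `same_content` are hypothetical, and it assumes files are either plain text or gzip-compressed with a `.gz` suffix.

```python
import gzip
import hashlib


def content_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's (decompressed) contents.

    Files ending in .gz are read through gzip so that differences in
    compression metadata do not count as content differences.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    digest = hashlib.sha256()
    with opener(path, "rb") as handle:
        # Read in chunks so large genome files are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(old_path, new_path):
    """True if both files have byte-identical decompressed contents."""
    return content_digest(old_path) == content_digest(new_path)
```

Note that this reports a difference whenever any byte differs, so files that differ only in an ID prefix or header line would still be flagged as changed.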
Main steps for adding new genome and annotation collections
Genus/species/collection names:
Haplotype 1:
Haplotype 2:
Cercis/canadensis/genomes/ISC453364.gnm3_hap2.2S7P
Cercis/canadensis/annotations/ISC453364.gnm3_hap2.ann1.G88H
[x] Add collection(s) to the Data Store, including commits to datastore-metadata
[x] Validate the README(s)
[x] Update about_this_collection.yml
[x] Calculate AHRD functional annotations
[x] Calculate gene family assignments (.gfa)
[ ] Add to pan-gene set
[x] Load relevant mine
[ ] Add BLAST targets
[x] Incorporate into GCV
[ ] Update the jekyll collections listing
[x] Update browser configs
[x] Run BUSCO
[x] Update DSCensor
[ ] Add LINKOUTS to datastore, refresh linkout service
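
Several of the steps above key off the collection names, so it can help to split a collection path into its parts. The sketch below assumes only the pattern visible in the two Haplotype 2 paths listed in this issue (genus/species/type/accession.gnmN[_hapN][.annN].KEY); the regular expression and the function name `parse_collection` are hypothetical and are not taken from any datastore tooling.

```python
import re

# Pattern inferred from the two collection paths in this issue; other
# collections may use name components that this sketch does not handle.
COLLECTION_RE = re.compile(
    r"^(?P<genus>[^/]+)/(?P<species>[^/]+)/(?P<kind>genomes|annotations)/"
    r"(?P<accession>[^.]+)\.gnm(?P<gnm>\d+)(?:_(?P<haplotype>hap\d+))?"
    r"(?:\.ann(?P<ann>\d+))?\.(?P<key>[A-Z0-9]{4})$"
)


def parse_collection(path):
    """Split a collection path into named parts; raise ValueError if it does not match."""
    match = COLLECTION_RE.match(path)
    if match is None:
        raise ValueError(f"unrecognized collection path: {path!r}")
    return match.groupdict()


if __name__ == "__main__":
    for path in (
        "Cercis/canadensis/genomes/ISC453364.gnm3_hap2.2S7P",
        "Cercis/canadensis/annotations/ISC453364.gnm3_hap2.ann1.G88H",
    ):
        print(parse_collection(path))
```

For the annotation path, the parsed fields include `haplotype` as `hap2`, `ann` as `1`, and `key` as `G88H`; for the genome path, `ann` is `None`.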