Open StevenCannon-USDA opened 10 months ago
@StevenCannon-USDA this is probably just a little glitch in processing, but FYI the file glyma.Wm82_NJAU.gnm1.ann1.KM71.protein_primary.faa.gz appears not to actually be compressed despite the suffix. Looks like samtools faidx in this case will just treat it as regular fasta and produce a fai but not a gzi file. I'll fix it, just wanted to let you know in case there's some aspect of processing that needs to be revisited (doesn't appear to have happened in the other annotation sets or with any of the other fasta in this one, though, so probably just a quirk of fate).
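For anyone wanting to catch this kind of mismatch before indexing, here is a minimal sketch (not part of the Data Store tooling) that inspects a file's leading bytes rather than trusting the suffix. The function name `compression_kind` is hypothetical. It relies only on the gzip magic bytes and on the `BC` extra subfield that the BGZF format places in each block header.

```python
import struct


def compression_kind(path):
    """Return 'bgzf', 'gzip', or 'plain' based on the file's leading bytes.

    Hypothetical helper for illustration; it reads only the first header.
    """
    with open(path, "rb") as handle:
        header = handle.read(12)
        # Every gzip member starts with the magic bytes 0x1f 0x8b.
        if len(header) < 12 or header[:2] != b"\x1f\x8b":
            return "plain"
        # Bit 2 of the FLG byte (FEXTRA) signals an extra field, which BGZF requires.
        if not header[3] & 0x04:
            return "gzip"
        # XLEN is a little-endian 16-bit length of the extra field.
        xlen = struct.unpack("<H", header[10:12])[0]
        extra = handle.read(xlen)

    # Walk the extra subfields looking for the BGZF marker: SI1='B', SI2='C', SLEN=2.
    pos = 0
    while pos + 4 <= len(extra):
        subfield_id = extra[pos:pos + 2]
        subfield_len = struct.unpack("<H", extra[pos + 2:pos + 4])[0]
        if subfield_id == b"BC" and subfield_len == 2:
            return "bgzf"
        pos += 4 + subfield_len
    return "gzip"
```

A file that reports `plain` despite a `.gz` suffix matches the situation described above, where indexing yields a fai but no gzi. A result of `gzip` (rather than `bgzf`) would also be worth flagging, since as far as I know samtools faidx expects BGZF for compressed input.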
@adf-ncgr Thanks; will fix this upstream.
Steven: I am not sure what you want me to review. Looking at the check boxes I have the following questions
@maxglycine - Not looking for a review actually. Just wanted you to be aware that this collection was underway. This is the generic template for getting genome+accession collections loaded: to the Data Store, the Mines, to GCV, to SequenceServer, etc. But your questions about protocols are valid. A work in progress. The protocols are being collected here: https://github.com/legumeinfo/datastore-specifications/tree/main/PROTOCOLS
@StevenCannon-USDA the AHRD/BUSCO/GFA files are now in the annex folders. Shall we move these into v2 and proceed with downstream steps?
> Shall we move these into v2 and proceed with downstream steps?
Thank you - and yes please!
Main steps for adding new genome and annotation collections
Genus/species/collection names:
Glycine/max/genomes/Wm82_NJAU.gnm1.N4GV
Glycine/max/annotations/Wm82_NJAU.gnm1.ann1.KM71
[x] Add collection(s) to the Data Store (annex)
[x] Validate the README(s)
[x] Update about_this_collection.yml
[x] Calculate AHRD functional annotations
[x] Calculate gene family assignments (.gfa)
[ ] Add to pan-gene set
[ ] Load relevant mine
[ ] Add BLAST targets
[x] Incorporate into GCV
[ ] Update the jekyll collections listing
[ ] Update browser configs
[x] Run BUSCO