statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

Standardize analysis names #345

Closed mestato closed 6 years ago

mestato commented 6 years ago

We need a standard, consistent pattern for all analysis names.

Can we add these patterns to loader pages? Where else can we put it so it stays consistent?

The gene expression analyses need to be updated not only to fit the above pattern but also to be accurate (many say ozone when actually many abiotic stresses are present).

Black Walnut Ozone Treatments -> "Juglans nigra Gene Expression - Abiotic Stress"
Northern Red Oak Ozone Treatments -> "Quercus rubra Gene Expression - Abiotic Stress"
Green Ash Ozone Treatments -> "Fraxinus pennsylvanica Gene Expression - Abiotic Stress"
American Sweetgum Ozone Treatments -> "Liquidambar styraciflua Gene Expression - Abiotic Stress"
Sugar Maple Ozone Treatments -> "Acer saccharum Gene Expression - Abiotic Stress"
Honey Locust Ozone Treatments -> "Gleditsia triacanthos Gene Expression - Ozone Stress"
Tulip Poplar Ozone Treatments -> "Liriodendron tulipifera Gene Expression - Ozone Stress"
Blackgum Ozone Treatments -> "Nyssa sylvatica Gene Expression - Ozone Stress"

bradfordcondon commented 6 years ago

> Can we add these patterns to loader pages? Where else can we put it so it stays consistent?

We can add them to the "name" field by editing the field settings on the bundle. The pattern we want should definitely go there, and I think putting it there will be sufficient.

bradfordcondon commented 6 years ago

367 and views for other analysis types make it very easy (and glaring) to see the consistency of naming.

bradfordcondon commented 6 years ago

As we can see, previous genome assembly names are all over the map.

We'll change them all to follow what the instructions will say:

Names should follow the form:

[Full Latin name] - Reference Genome [additional version info]

For example:

Fraxinus pennsylvanica - Reference Genome
[Screenshot: 2018-09-04 at 2:53:25 PM]
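
A naming form like the one above can also be checked mechanically. The following is only a rough sketch: the regular expression and the function name `is_standard_reference_genome_name` are illustrative assumptions, not something the site enforces, and it only accepts a plain two-word Latin binomial, so multi-species analyses (or hybrid names) would be reported as non-matching.

```python
import re

# Assumed reading of the form: "<Genus species> - Reference Genome[ <version info>]".
# This regex is a hypothetical illustration, not the rule used by the site.
REFERENCE_GENOME_NAME = re.compile(
    r"^[A-Z][a-z]+ [a-z]+ - Reference Genome(?: \S.*)?$"
)


def is_standard_reference_genome_name(name: str) -> bool:
    """Return True if `name` matches the assumed reference-genome naming form."""
    return REFERENCE_GENOME_NAME.match(name) is not None


if __name__ == "__main__":
    for candidate in (
        "Fraxinus pennsylvanica - Reference Genome",
        "TulipPoplar_v1 BLAST to Populus peptides",
    ):
        print(candidate, "->", is_standard_reference_genome_name(candidate))
```

Names that fail the check are not necessarily wrong; they are just candidates for a manual look.
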
bradfordcondon commented 6 years ago

Here are the updated instructions:

[Screenshot: 2018-09-04 at 2:59:55 PM]
bradfordcondon commented 6 years ago

Here's how this looks now.

I'm going to double-check at our meeting tomorrow that this is what we want before I edit all the analyses.

[Screenshot: 2018-09-04 at 3:30:19 PM]
bradfordcondon commented 6 years ago

Approved. I can go ahead.

bradfordcondon commented 6 years ago

Draft reference genomes for six Juglans species and Pterocarya stenoptera

I'm leaving this name as is for now. I'd say we should just expect to have some oddball names for analyses that aren't for a single organism.

The alternative would be to list all 7 species.

bradfordcondon commented 6 years ago

Almost done. Edge cases:

TulipPoplar_v1 BLAST to Populus peptides: note it's not against TrEMBL or Swiss-Prot. Is anything from this loaded into Chado?

Chinese Chestnut QTL and regular each have two analyses, blastp and blastx. For example: https://www.hardwoodgenomics.org/BLAST-annotation/1962957

bradfordcondon commented 6 years ago

select * from chado.analysis where name='TulipPoplar_v1 BLAST to Populus peptides';
 analysis_id | name | description | program | programversion | algorithm | sourcename | sourceversion | sourceuri | timeexecuted
 13 | TulipPoplar_v1 BLAST to Populus peptides | | blastx | 2.2.19 | blast | TulipPoplar_v1_454Isotigs-vs-Pt156pep.blastx10.xml | | | 2011-11-03 00:00:00
(1 row)

hardwoods_06112018=> select * from chado.analysisfeature where analysis_id = 13;
 analysisfeature_id | feature_id | analysis_id | rawscore | normscore | significance | identity
(0 rows)

So I think the tulip poplar BLAST analyses can be safely deleted, since they aren't linked to anything. Chinese Chestnut QTL I'm just going to leave as is, because that organism is a rule-breaker.
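
The same check (find an analysis, then see whether any analysisfeature rows point at it) could be run for every analysis at once with a LEFT JOIN. The sketch below is an illustration only: it uses an in-memory SQLite database attached under the name `chado` as a stand-in for the real PostgreSQL database, with heavily simplified tables, and the second analysis row is made-up demo data. The helper names `find_unlinked_analyses` and `demo_connection` are hypothetical. Note that having no analysisfeature rows is only one signal; an analysis may still be referenced from other tables.

```python
import sqlite3


def find_unlinked_analyses(conn):
    """Return (analysis_id, name) for analyses with no analysisfeature rows."""
    query = """
        SELECT a.analysis_id, a.name
        FROM chado.analysis AS a
        LEFT JOIN chado.analysisfeature AS af
               ON af.analysis_id = a.analysis_id
        WHERE af.analysisfeature_id IS NULL
        ORDER BY a.analysis_id
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build a tiny stand-in database; the schema is simplified for the demo."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so "chado.<table>" resolves as in the thread.
    conn.execute("ATTACH DATABASE ':memory:' AS chado")
    conn.execute(
        "CREATE TABLE chado.analysis (analysis_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE chado.analysisfeature ("
        "analysisfeature_id INTEGER PRIMARY KEY, feature_id INTEGER, analysis_id INTEGER)"
    )
    conn.execute(
        "INSERT INTO chado.analysis VALUES (13, 'TulipPoplar_v1 BLAST to Populus peptides')"
    )
    # Hypothetical second analysis that does have a linked feature.
    conn.execute("INSERT INTO chado.analysis VALUES (14, 'Hypothetical linked analysis')")
    conn.execute("INSERT INTO chado.analysisfeature VALUES (1, 100, 14)")
    return conn


if __name__ == "__main__":
    for analysis_id, name in find_unlinked_analyses(demo_connection()):
        print(analysis_id, name)
```

Against the real database, the SELECT statement on its own would be the part to run; the rest is scaffolding for the demo.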