pombase / pombase-chado

PomBase code for accessing Chado
MIT License

normalize evidence codes #620

Closed ValWood closed 6 years ago

ValWood commented 7 years ago

Merge case differences first

 1 | Inferred from Key Residues
 1 | Inferred from Genomic Context
 2 | sodium dodecyl sulfate polyacrylamide gel electrophoresis evidence
 4 | inferred from Reviewed Computational Analysis
 6 | immunofluorescence evidence
 9 | Other direct assay
 9 | Epitope-tagged protein immunolocalization experiment data
 9 | Immunolocalization experiment data
 9 | Particle size and count assay
10 | fusion protein localization evidence
11 | transcript expression level evidence
14 | in situ hybridization assay evidence
22 | Sodium dodecyl sulfate polyacrylamide gel electrophoresis
25 | RNA protection assay evidence
41 | Chromatography evidence
51 | reverse transcription polymerase chain reaction transcription evidence
51 | Inferred from Expression Pattern
75 | Plasmid maintenance assay evidence

103 | Electrophoretic mobility shift assay data
117 | Substance quantification
143 | reporter gene assay evidence
180 | cell growth assay evidence
206 | high throughput nucleotide sequencing assay evidence
266 | Inferred from Experiment
377 | Western blot evidence
387 | Traceable Author Statement
403 | quantitative PCR
416 | Co-immunoprecipitation experiment
493 | experimental phenotypic evidence
713 | competitive growth assay evidence
715 | Substance quantification evidence
859 | Chromatin immunoprecipitation experiment
868 | Inferred from Sequence Model
870 | Flow cytometry data
874 | Inferred from Genetic Interaction
887 | Northern assay evidence
915 | Non-traceable Author Statement
1144 | Other
1438 | Enzyme assay data
1438 | expression microarray evidence
1548 | Reporter gene assay
1664 | gel electrophoresis evidence
1693 | Inferred by Curator
1896 | Inferred from Physical Interaction
2374 | Transcript expression level evidence
2431 | Western blot assay
2449 | Inferred from Sequence or Structural Similarity
4094 | Inferred from Electronic Annotation
5802 | Inferred from Mutant Phenotype
6844 | Inferred from Sequence Orthology
14855 | mass spectrometry evidence
22704 | Cell growth assay
27795 | Microscopy
28481 | experimental evidence
29938 | Inferred from Direct Assay

ValWood commented 7 years ago

duplicate of https://github.com/pombase/curation/issues/1385

mah11 commented 7 years ago

This is only a duplicate if you really think it's worth curators' time to go through and change case on evidence text in the bulk files (phafs etc.). I don't, so I'd be inclined to reopen it.

ValWood commented 7 years ago

I was thinking that when we went through the list we would provide Kim with a list of the case differences which need merging...

I guess the task here is to provide a list to Kim...

ValWood commented 7 years ago

Oh I thought this was on the curation tracker. I wasn't paying attention when I followed the link.

kimrutherford commented 7 years ago

We already have a list of evidence codes with the correct capitalisation in the configuration file. The loaders should be able to check the correct capitalisation before storing in Chado. So there's no need to provide a list of case differences; I just need to change the code.
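A minimal sketch of that check, in Python for illustration only; it is not the actual loader code. The list `CONFIGURED_EVIDENCE_CODES` and the function `canonical_case` are hypothetical names, and the three codes are just examples standing in for whatever the configuration file really contains.

```python
# Hypothetical stand-in for the evidence codes in the configuration file,
# each written with the capitalisation we want stored in Chado.
CONFIGURED_EVIDENCE_CODES = [
    "Inferred from Direct Assay",
    "Inferred from Reviewed Computational Analysis",
    "Western blot evidence",
]

# Index the configured codes by a case-insensitive key.
_CANONICAL_BY_KEY = {code.casefold(): code for code in CONFIGURED_EVIDENCE_CODES}


def canonical_case(evidence_code):
    """Return the configured spelling of evidence_code, or None if unknown."""
    return _CANONICAL_BY_KEY.get(evidence_code.strip().casefold())
```

With something like this, "inferred from Reviewed Computational Analysis" in a bulk file would be stored under the configured spelling, and anything not in the list could be logged rather than loaded as a new variant.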

kimrutherford commented 6 years ago

We already have a list of evidence codes with the correct capitalisation in the configuration file

It's more complicated than I thought. The code should already normalise the case of the evidence codes in Chado.

But we have some evidence codes configured twice, once with "evidence" at the end and once without, e.g. "Cell growth assay" versus "cell growth assay evidence".

Most are because we had an existing evidence code (like "Cell growth assay") then we added the ECO version "cell growth assay evidence". Some annotations in the flat files use one, some use the other.

Should I remove the non-ECO one? I'll make sure that if a file has an evidence code like "Cell growth assay" it checks to see if there is a legal evidence code with "evidence" at the end (like "cell growth assay evidence") and uses that instead.

I think that would remove a lot of the duplicates.
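Roughly, the fallback would look like the sketch below. This is Python for illustration, not the real loader code; `LEGAL_EVIDENCE_CODES` and `resolve_evidence_code` are made-up names, and the short list is an assumption standing in for the configured codes once the non-ECO duplicates are removed.

```python
# Hypothetical list of legal evidence codes after the non-ECO duplicates
# have been removed from the configuration.
LEGAL_EVIDENCE_CODES = [
    "cell growth assay evidence",
    "Western blot evidence",
    "Microscopy",
]

# Case-insensitive index of the legal codes.
_LEGAL_BY_KEY = {code.casefold(): code for code in LEGAL_EVIDENCE_CODES}


def resolve_evidence_code(name):
    """Map a code from a flat file to a legal code, or None if there is none.

    If a legal code exists with "evidence" appended to the name, use that;
    otherwise fall back to a case-insensitive match on the name itself.
    """
    key = name.strip().casefold()
    with_suffix = key if key.endswith(" evidence") else key + " evidence"
    if with_suffix in _LEGAL_BY_KEY:
        return _LEGAL_BY_KEY[with_suffix]
    return _LEGAL_BY_KEY.get(key)
```

Here "Cell growth assay" resolves to "cell growth assay evidence" and "Microscopy" is left alone. Names that differ by more than the suffix are not caught by this rule and would need fixing in the files themselves.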

ValWood commented 6 years ago

I think we can remove the non-ECO one, but we would like the display labels to be the shorter version if easily possible (although it doesn't really matter).

kimrutherford commented 6 years ago

I think we can remove the non-ECO one

OK, I'll do that and see what breaks.

but we would like the display labels to be the shorter version if easily possible

Is that for the display in Canto or on the web site?

I noticed we have "Western blot assay" and "Western blot evidence". The second of those is the ECO name. Can I remove "Western blot assay"? I can try to fix the places it's used in the flat files.

ValWood commented 6 years ago

I have no pref which we use, but there should only be one ;)

in case we want to filter on evidence anywhere...

kimrutherford commented 6 years ago

OK, thanks. I'll go with the ECO one.

mah11 commented 6 years ago

I updated some entries in the legacy Chado yaml, which I hope will squash some log messages.

kimrutherford commented 6 years ago

I updated some entries in the legacy Chado yaml, which I hope will squash some log messages.

Thanks.

I think this is fixed now.