Freymaurer closed this issue 6 months ago
So the solution is adding an accession number to every tag? Another issue is tags that are not identical but similar. E.g. there is Plant, plant and Plants. I think adding accession numbers could also help here. I had been planning to discuss this in our upcoming meeting.
(For me the same tags are shown for ER and normal tags, but I guess that has been fixed already.)
> Another issue is tags that are not identical but similar. E.g. there is Plant, plant and Plants
I think this is a valid point. I am thinking about adding a quality control CI for pull requests which runs the code I used for my two issues today plus a similarity test for similar words. Then, before merging any PR, one could see whether these points are handled reasonably well.
What do you think about this? It would add another test to this:
Sounds good to me
I will start adding tag term accession numbers to the ambiguous terms from your check.
The first iteration of fixes went through, so I am going to update the current state here. Please note that we now also test for similar tags. If you find a combination to be a true difference (which is quite likely), please notify me below so I can either increase the similarity threshold or whitelist that specific combination. The current similarity threshold is 0.8.
Edit: I will try to improve the script so the output is less split.
plant growth protocol in:
  growth protocol [0.812500] Growth chamber by (Dominik Brilhaus)
  growth protocol [0.812500] MIAPPE observation unit and sample by (Hannah Dörpholz, Elisa Senger, Stella Eggels)
  growth protocol [0.812500] RPTU - MBS, growth by (Frederik Sommer, Martin Kuhl, Oliver Maus)
  growth protocol [0.812500] RPTU - MBS, growth TurboID by (Frederik Sommer, Martin Kuhl, Oliver Maus, David Zimmer)
  growth protocol [0.812500] GEO - Minimal information plant growth by (Martin Kuhl)

BioImageArchive in:
  BioImageArchive_Imaging [0.848485] Imaging assay by (Christine Rempfer)

extraction in:
  Extraction [1.000000] DNA extraction by (Angela Kranz, Dominik Brilhaus)
  Extraction [1.000000] Imaging extraction by (Chistine Rempfer)
  Extraction [1.000000] Imaging computation by (Chistine Rempfer)
  Extraction [1.000000] GEO - Minimal information RNA extraction by (Martin Kuhl)

RNA extraction protocol in:
  extraction protocol [0.900000] Metabolite Extraction by (Dominik Brilhaus, Martin Kuhl)
  extraction protocol [0.900000] RPTU - MBS, cell disruption by (Frederik Sommer, Martin Kuhl, Oliver Maus)
  extraction protocol [0.900000] RPTU - MBS, protein extraction by (Frederik Sommer, Martin Kuhl, Oliver Maus)
  extraction protocol [0.900000] GEO - Minimal information RNA extraction by (Martin Kuhl)

extraction protocol in:
  RNA extraction protocol [0.900000] RNA extraction by (Hajira Jabeen, Dominik Brilhaus)

Extraction in:
  extraction [1.000000] RNA extraction by (Hajira Jabeen, Dominik Brilhaus)
  extraction [1.000000] Protein extraction by (Oliver Maus, Dominik Brilhaus)

Assay in:
  assay [1.000000] Phenotyping protocol for Assay file MIAPPE by (Julie Jacquemin)
  assay [1.000000] Sampling protocol for Assay file MIAPPE by (Julie Jacquemin)

Mass Spectrometry in:
  Mass spectrometry [1.000000] Proteomics MassSpec Assay by (Oliver Maus)
  Mass spectrometry [1.000000] Data Processing (PRIDE minimal) by (Oliver Maus)
  Mass spectrometry [1.000000] Measurement (PRIDE minimal) by (Oliver Maus)
  Mass spectrometry [1.000000] Sample Preparation (PRIDE minimal) by (Oliver Maus)

observation unit in:
  Observation Unit [1.000000] MIAPPE observation unit and sample by (Hannah Dörpholz, Elisa Senger, Stella Eggels)

Measurement in:
  measurement [1.000000] MAdLand Nanodrop by (Fabian Haas)

data processing protocol in:
  Data processing [0.848485] Proteomics Computational Analyses by (Oliver Maus)

study in:
  study [0.888889] Aerial conditions protocol for Study file MIAPPE by (Julie Jacquemin)
  study [0.888889] Characteristics for Study file MIAPPE by (Julie Jacquemin)
  study [0.888889] Event protocol for Study file MIAPPE by (Julie Jacquemin)
  study [0.888889] Growth protocol for Study file MIAPPE by (Julie Jacquemin)
  study [0.888889] Nutrition protocol for Study file MIAPPE by (Julie Jacquemin)
  study [0.888889] Rooting protocol for Study file MIAPPE by (Julie Jacquemin)
  study [0.888889] Watering protocol for Study file MIAPPE by (Julie Jacquemin)

Data processing in:
  data processing protocol [0.848485] Metabolomics Computational Analysis by (Dominik Brilhaus, Oliver Maus, Martin Kuhl)
  data processing protocol [0.848485] RPTU - MBS, data processing by (Frederik Sommer, Martin Kuhl, Oliver Maus)
  data processing protocol [0.848485] GEO - Minimal information computational analysis by (Martin Kuhl)

Mass spectrometry in:
  Mass Spectrometry [1.000000] Metabolomics MassSpec Assay by (Dominik Brilhaus, Martin Kuhl)
  Mass Spectrometry [1.000000] Metabolomics Computational Analysis by (Dominik Brilhaus, Oliver Maus, Martin Kuhl)
  Mass Spectrometry [1.000000] MTH00029 by (Dominik Brilhaus)
  Mass Spectrometry [1.000000] MPIMP - Fernie, mass spectrometry by (Micha Wijesingha Ahchige)

growth protocol in:
  plant growth protocol [0.812500] Plant growth by (Hajira Jabeen, Dominik Brilhaus, Oliver Maus, Martin Kuhl, Xiaoran Zhou)
  plant growth protocol [0.812500] Study minimal MPIMP Fernie by (Micha Wijesingha Ahchige)

BioImageArchive_Imaging in:
  BioImageArchive [0.848485] Imaging extraction by (Chistine Rempfer)
  BioImageArchive [0.848485] Imaging computation by (Chistine Rempfer)

Observation Unit in:
  observation unit [1.000000] MIAPPE biological material by (Hannah Dörpholz, Elisa Senger, Stella Eggels)

measurement in:
  Measurement [1.000000] Proteomics MassSpec Assay by (Oliver Maus)
  Measurement [1.000000] Measurement (PRIDE minimal) by (Oliver Maus)

phenotyping in:
  phenotyping [0.952381] Phenotyping protocol for Assay file MIAPPE by (Julie Jacquemin)

assay in:
  Assay [1.000000] Proteomics MassSpec Assay by (Oliver Maus)
  Assay [1.000000] Genomics Assay by (Angela Kranz, Dominik Brilhaus)
  Assay [1.000000] Imaging assay by (Christine Rempfer)
  Assay [1.000000] Genome assembly by (Angela Kranz, Dominik Brilhaus, Oliver Maus)

phenotyping in:
  phenotyping [0.952381] Phenotyping protocol for Assay file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Sampling protocol for Assay file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Aerial conditions protocol for Study file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Characteristics for Study file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Event protocol for Study file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Growth protocol for Study file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Nutrition protocol for Study file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Rooting protocol for Study file MIAPPE by (Julie Jacquemin)
  phenotyping [0.952381] Watering protocol for Study file MIAPPE by (Julie Jacquemin)

study in:
  study [0.888889] MIAPPE metadata by (Hannah Dörpholz, Elisa Senger, Stella Eggels)
  study [0.888889] MIAPPE observation unit and sample by (Hannah Dörpholz, Elisa Senger, Stella Eggels)
  study [0.888889] Study minimal MPIMP Fernie by (Micha Wijesingha Ahchige)
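
For anyone who wants to reproduce the similarity test locally, here is a minimal Python sketch. The thread does not say which metric the actual script uses; this assumes a Sørensen-Dice coefficient over lowercased character-bigram sets, which is consistent with the reported scores (for example 0.8125 for `plant growth protocol` vs `growth protocol`). The function names, the `>=` comparison against the threshold, and the whitelist representation are all hypothetical.

```python
from itertools import combinations


def bigrams(text: str) -> set[str]:
    """Return the set of character bigrams of the lowercased text."""
    lowered = text.lower()
    return {lowered[i:i + 2] for i in range(len(lowered) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Sorensen-Dice coefficient over bigram sets (assumed metric, see above)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        # Strings shorter than two characters have no bigrams; compare directly.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def find_similar_tags(tags, threshold=0.8, whitelist=frozenset()):
    """Return (tag_a, tag_b, score) for distinct tags scoring >= threshold.

    `whitelist` holds frozensets of two tags accepted as truly different.
    """
    results = []
    for a, b in combinations(sorted(set(tags)), 2):
        if frozenset((a, b)) in whitelist:
            continue  # pair was reviewed and accepted, do not report it
        score = dice_similarity(a, b)
        if score >= threshold:
            results.append((a, b, score))
    return results


if __name__ == "__main__":
    accepted = {frozenset(("Plant", "Plants"))}  # hypothetical whitelist entry
    for a, b, score in find_similar_tags(["Plant", "plant", "Plants"], 0.8, accepted):
        print(f"[{score:.6f}] {a} ~ {b}")
```

Storing whitelisted pairs as `frozenset`s makes the lookup independent of the order in which the two tags are compared.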
This image led me to open this issue:
In this image you can see 3 different PRIDE tags: one as Tag, two as ER_Tag. One of the ER_Tags has an ID, the other does not.
To clean up these things I ran some very simple analytics (results below). It would be nice if someone could clean this up 😄
Found ambiguous tag growth in:
Found ambiguous tag Plant in:
Found ambiguous tag study in:
Found ambiguous tag Proteomics in:
Found ambiguous tag PRIDE in:
Found ambiguous tag Transcriptomics in:
Code
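
The script behind this check is not shown in the thread. As a rough illustration of the idea, the Python sketch below flags a tag name as ambiguous when it occurs in more than one variant, i.e. as both Tag and ER_Tag, or with and without an accession number. The `TagUse` record, its field names, the template names and the accession value are all hypothetical and only serve the example.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class TagUse(NamedTuple):
    """One occurrence of a tag in a template (hypothetical data model)."""
    name: str                 # tag label, e.g. "PRIDE"
    kind: str                 # "Tag" or "ER_Tag"
    accession: Optional[str]  # term accession number, None if missing
    template: str             # template the tag was found in


def find_ambiguous_tags(uses):
    """Map each tag name to its uses if it occurs in more than one variant.

    A variant is the (kind, accession) pair; a name with two or more
    distinct variants is reported as ambiguous.
    """
    by_name = defaultdict(list)
    for use in uses:
        by_name[use.name].append(use)
    return {
        name: group
        for name, group in by_name.items()
        if len({(u.kind, u.accession) for u in group}) > 1
    }


if __name__ == "__main__":
    uses = [
        TagUse("PRIDE", "Tag", None, "template A"),
        TagUse("PRIDE", "ER_Tag", None, "template B"),
        TagUse("PRIDE", "ER_Tag", "EX:0000001", "template C"),  # made-up accession
        TagUse("Imaging", "Tag", None, "template D"),
    ]
    for name, group in find_ambiguous_tags(uses).items():
        print(f"Found ambiguous tag {name} in:")
        for use in group:
            print(f"  {use.template} ({use.kind}, accession={use.accession})")
```

With the sample data, only `PRIDE` is reported, mirroring the three variants described above; `Imaging` has a single variant and passes.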