obophenotype / cell-ontology

An ontology of cell types
https://obophenotype.github.io/cell-ontology/
Creative Commons Attribution 4.0 International

[Typo/Bug] Multiple labels for cell and neuron #841

Open zoependlington opened 3 years ago

zoependlington commented 3 years ago

CL term: CL_0000000 (cell), CL_0000540 (neuron)

Description of typo, bug or error

As described in https://github.com/EBISPOT/efo/issues/871, both terms have multiple labels - one with a language tag and one without.

Discussed with @paolaroncaglia

addiehl commented 3 years ago

Oddly enough, CL:0000236 'B cell' also has two label annotations. One has type xsd:string.

CL:0000000 'cell' actually has four label annotations. One has type xsd:string, and two have language: en.

There may well be other terms with multiple labels. Finding 'B cell' was a bit random.

I think the correct approach for the moment is to have no type designation and no language designation.

paolaroncaglia commented 3 years ago

@addiehl thanks.

According to the current CL guidelines, there should be strictly 1 rdfs:label per term. Questions:

Worth running a check and listing all CL terms that have more than 1 rdfs:label? And then removing extra labels?

That should, in principle, sort out the issue of duplicated/multiple labels in ontologies that import from CL, such as EFO. However, it's not entirely clear to @zoependlington and myself why e.g. CL:0000236 'B cell', which has 2 labels in CL and is imported dynamically into EFO, only has 1 label in EFO. Anyway, could others please comment on the suggested check, and confirm that having no type designation and no language designation should be the default rule for CL?

Thanks, Paola

addiehl commented 3 years ago

It's possible that the EFO importing script ignores the label with type xsd:string, or combines it with the plain label.

I suspect these multiple labels are technical artifacts. I agree a more systematic approach to finding all duplicate labels is needed. Not my skill set, sadly.
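
A systematic check of the kind discussed above could be sketched roughly as follows. This is only a minimal sketch, assuming the ontology is available as RDF/XML and that each rdfs:label appears as a direct child of its owl:Class element. The function name `find_multi_label_terms` and the `SAMPLE` snippet are hypothetical and for illustration only; they are not part of any CL or EFO tooling, and the sample is not copied from the real CL file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XML = "http://www.w3.org/XML/1998/namespace"


def find_multi_label_terms(rdfxml_text):
    """Return {term IRI: [(text, datatype, language), ...]} for every
    named owl:Class that carries more than one rdfs:label.

    Hypothetical helper: assumes labels are direct children of owl:Class.
    """
    root = ET.fromstring(rdfxml_text)
    labels = defaultdict(list)
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        if iri is None:
            # Anonymous class expressions have no IRI; skip them.
            continue
        for label in cls.findall(f"{{{RDFS}}}label"):
            labels[iri].append((
                label.text,
                label.get(f"{{{RDF}}}datatype"),  # e.g. xsd:string, or None
                label.get(f"{{{XML}}}lang"),      # e.g. "en", or None
            ))
    return {iri: found for iri, found in labels.items() if len(found) > 1}


# Illustrative sample only (assumption): one term with a plain label plus an
# xsd:string-typed label, and one term with a single label.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000236">
    <rdfs:label>B cell</rdfs:label>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">B cell</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000084">
    <rdfs:label>T cell</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

if __name__ == "__main__":
    for iri, found in find_multi_label_terms(SAMPLE).items():
        print(iri)
        for text, datatype, lang in found:
            print(f"  label={text!r} datatype={datatype} lang={lang}")
```

To run it against a real file, one would read the file contents and pass them to the function. Listing the datatype and language next to each label makes it easier to see which of the extra labels are candidates for removal.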