Closed Freymaurer closed 1 year ago
This is a preview of the terms from highest to lowest:
and remove their existing is_a relationships.
What are those "is_a" relationships?
Sounds good to me, if that helps to speed up search. Organisms not listed in those 20,000 top hits could still be added by hand or to your ncbitaxonmin.obo. Just be careful with the ontology version. I'm not sure how frequently, but NCBITaxon supposedly changes whenever a species is moved to a new taxon. So the approach needs to be updated and reproduced whenever NCBITaxon changes.
The main issue I see with this is that while you take the top entries, you are not filtering for plant organisms (as I understood it). Since those would be the most important for DataPLANT, I don't think this approach is the best. If there is a way to filter for plants, that would be great. Also, for the microbiologists who are using Swate, microorganisms might also be important. However, I also see entries for viruses and several animals in there that (in my opinion) are not relevant for DataPLANT.
If you have a look here, it might be worth filtering for some of these branches, definitely for Viridiplantae. I am however no taxonomist, so there might be important organisms in other branches as well that we should be including.
Sounds good to me, if that helps to speed up search. Organisms not listed in those 20,000 top hits could still be added by hand or to your ncbitaxonmin.obo.
Correct!
Just be careful with the ontology version. I'm not sure how frequently, but NCBITaxon supposedly changes whenever a species is moved to a new taxon. So the approach needs to be updated and reproduced whenever NCBITaxon changes.
I can add the scripts I used to the repo.
The main issue I see with this is that while you take the top entries, you are not filtering for plant organisms (as I understood it). Since those would be the most important for DataPLANT, I don't think this approach is the best. If there is a way to filter for plants, that would be great. Also, for the microbiologists who are using Swate, microorganisms might also be important. However, I also see entries for viruses and several animals in there that (in my opinion) are not relevant for DataPLANT.
Is this information we can get from ncbitaxon, maybe by running step 2 of my workflow against the subset of the ncbitaxon ontology whose terms are children of plants and microorganisms? Is this something you can tell me?
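One way to notice when the reduced file needs regenerating is to compare the version recorded at build time with the version in the current upstream file. The following is only a minimal sketch: it assumes the upstream OBO header carries a `data-version:` tag, and the function names `read_data_version` and `needs_rebuild` are made up for illustration, not taken from the scripts mentioned above.

```python
from typing import Optional


def read_data_version(obo_text: str) -> Optional[str]:
    """Return the value of the `data-version:` header tag, or None if absent."""
    for line in obo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # The header ends at the first stanza such as [Term].
            break
        if stripped.startswith("data-version:"):
            return stripped.split(":", 1)[1].strip()
    return None


def needs_rebuild(upstream_obo_text: str, recorded_version: Optional[str]) -> bool:
    """True when the upstream version differs from the one the min file was built from."""
    return read_data_version(upstream_obo_text) != recorded_version


# Toy header, not real NCBITaxon content.
UPSTREAM = """format-version: 1.2
data-version: example-release-2

[Term]
id: EX:1
name: example term
"""

print(read_data_version(UPSTREAM))
print(needs_rebuild(UPSTREAM, "example-release-1"))
```

Storing the recorded version next to the reduced file would let the same check run before each import.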
Is this information we can get from ncbitaxon, maybe by running step 2 of my workflow against the subset of the ncbitaxon ontology whose terms are children of plants and microorganisms? Is this something you can tell me?
I have attached an image above, hopefully it was sent correctly. My internet is a bit unstable right now. For bacteria I'm not sure what should be included. Sabrina or Angela probably know more about this
In addition to plants and bacteria, you might also want to include algae (including Cryptophyceae, Rhodophyta, Glaucocystophyceae) and fungi (Opisthokonta > Fungi). Maybe it is easier and sufficient to exclude animals (Opisthokonta > Metazoa)?
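Both ideas from this thread, keeping only selected branches or excluding a branch such as Metazoa, come down to walking each term's is_a chain upward. Here is a rough sketch of that filter. It assumes every term has at most one parent, and the `EX:` identifiers are placeholders, not real NCBITaxon IDs.

```python
from typing import Dict, List, Optional, Set


def descends_from(term: str, roots: Set[str], parent_of: Dict[str, str]) -> bool:
    """Follow is_a links upward from `term`; True if a root is reached (or term is a root)."""
    seen: Set[str] = set()
    current: Optional[str] = term
    while current is not None and current not in seen:
        if current in roots:
            return True
        seen.add(current)  # guards against accidental cycles in the input
        current = parent_of.get(current)
    return False


def filter_terms(
    terms: List[str],
    include: Optional[Set[str]],
    exclude: Set[str],
    parent_of: Dict[str, str],
) -> List[str]:
    """Keep terms under any `include` root (all terms if None) and under no `exclude` root."""
    kept = []
    for term in terms:
        if include is not None and not descends_from(term, include, parent_of):
            continue
        if descends_from(term, exclude, parent_of):
            continue
        kept.append(term)
    return kept


# Toy hierarchy with placeholder IDs (child -> parent).
PARENT_OF = {
    "EX:arabidopsis": "EX:viridiplantae",
    "EX:viridiplantae": "EX:eukaryota",
    "EX:yeast": "EX:fungi",
    "EX:fungi": "EX:opisthokonta",
    "EX:mouse": "EX:metazoa",
    "EX:metazoa": "EX:opisthokonta",
    "EX:opisthokonta": "EX:eukaryota",
    "EX:ecoli": "EX:bacteria",
}
CANDIDATES = ["EX:arabidopsis", "EX:yeast", "EX:mouse", "EX:ecoli"]

# Variant 1: only exclude animals.
without_animals = filter_terms(CANDIDATES, None, {"EX:metazoa"}, PARENT_OF)
# Variant 2: keep only the plant branch.
plants_only = filter_terms(CANDIDATES, {"EX:viridiplantae"}, set(), PARENT_OF)
print(without_animals)
print(plants_only)
```

The exclude-only variant keeps plants, algae, fungi and bacteria without having to list each branch, which matches the suggestion that excluding Metazoa may be sufficient.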
This approach just threw out my favorite species "Talinum fruticosum" that I always use in trainings :(
Don't Panic! I'll add it for you manually.
I just checked and it seems to have worked out fine! I can already find Talinum fruticosum in the search again.
Yes, I've added it to the ncbitaxon.min_plus.obo file you made, Kevin. Unfortunately, it's without the full annotation, as Protege keeps crashing when I try to open the ncbitaxon file to import the entire term, but it's at least there to be used for annotating metadata sheets. I'll try to sort that out when I have a spare minute.
Thanks
Ontology
ncbitaxon
Please state the reason to import this ontology into the SwateDB
We currently feature the full ncbitaxon ontology, which makes it considerably harder to design performant parent-child search functions. I therefore propose replacing ncbitaxon with a light/min version. For an ontology of this sort I suggest the names:
I am currently creating this ontology along the following lines of thought:
has_rank NCBITaxon:species
and remove their existing is_a relationships. Then I add one is_a relationship to Organism.
Let me know what you think of this approach! Ideally, I would get some quick replies so I can finish this tomorrow.
@muehlhaus @Brilator @AngelaKranz @Hannah-Doerpholz @kdumschott @StellaEggels
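The flattening steps described in this issue (keep species-rank terms, drop their existing is_a relationships, attach each to a single Organism parent) might look roughly like the sketch below. It is not the script actually used. It matches species terms by looking for the text `has_rank NCBITaxon:species` anywhere in a stanza, uses the placeholder `EX:organism` for the Organism term, and leaves out the top-hits selection because the ranking criterion is not spelled out here.

```python
from typing import List

SPECIES_MARKER = "has_rank NCBITaxon:species"


def split_term_stanzas(obo_text: str) -> List[List[str]]:
    """Split OBO text into [Term] stanzas (lists of lines); header and other stanzas are skipped."""
    stanzas: List[List[str]] = []
    current = None
    for line in obo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Any stanza header closes the stanza collected so far.
            if current is not None:
                stanzas.append(current)
            current = [stripped] if stripped == "[Term]" else None
        elif current is not None and stripped:
            current.append(stripped)
    if current is not None:
        stanzas.append(current)
    return stanzas


def flatten_species(obo_text: str, organism_id: str) -> str:
    """Keep species-rank terms, remove their is_a lines, add one is_a to `organism_id`."""
    out = []
    for stanza in split_term_stanzas(obo_text):
        if not any(SPECIES_MARKER in line for line in stanza):
            continue  # not a species-rank term
        kept = [line for line in stanza if not line.startswith("is_a:")]
        kept.append(f"is_a: {organism_id}")
        out.append("\n".join(kept))
    return "\n\n".join(out)


# Toy input with placeholder IDs, not real NCBITaxon content.
SAMPLE = """format-version: 1.2

[Term]
id: EX:1
name: example genus
property_value: has_rank NCBITaxon:genus

[Term]
id: EX:2
name: example species
is_a: EX:1
property_value: has_rank NCBITaxon:species
"""

result = flatten_species(SAMPLE, "EX:organism")
print(result)
```

Because every kept term ends up with exactly one parent, a parent-child search only has to look one level deep, which is the performance gain the proposal is after.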