kshefchek closed this issue 5 years ago.
It is live; see phenotype.hpoa at http://compbio.charite.de/jenkins/job/hpo.annotations.2018/
Note that there is also an improved parser for this format in phenol.
It would be great to reuse phenol here. Is it too much to ask for the parser to output the RDF model needed for our pipeline? I could send over a sample that approximates the modeling, but it might not cover every edge case. The full file is here: https://data.monarchinitiative.org/ttl/hpoa.ttl cc @yy20716
There is a lot of framework in place for doing this in ontobio, but the advantage of doing it in phenol is that it keeps the RDF modeling closely coupled to the reference Java object model.
@yy20716 @cmungall Is this something we want to do within phenol? Should there be a phenol module that uses Jena (or something similar) as an adapter? Or should this be a separate app? It should not be hard to do. Note also, @kshefchek, that the improved annotation format carries additional information, so we should also update the schema.
I think a separate module makes sense. This could be phenol-rdf in the main phenol repo, or an entirely separate library. There may be advantages to integrating it more closely. For example, it would be possible to synchronize the in-memory associations with a Jena model and then run useful SPARQL queries. It might also be useful to embed a small SPARQL form in tools like PhenoteFX for power users.
@kshefchek Sorry for the late response. I plan to extend phenol's io package as requested, so that it reads the phenotype.hpoa file and produces hpoa.ttl (like the one in https://data.monarchinitiative.org/ttl). The problem is that it is still not clear to me how the internal graph in phenol should be mapped to hpoa.ttl. I asked this question on Gitter, and Tom suggested that I check
but I guess the problem is that I am not very familiar with Dipper's internals and overall flow. In the README, I see a link to the "best-practices documentation for details on writing new Source parsers using Dipper code ...", but the link seems to be dead. If you don't mind, could you point me to some documents I can check for this task? Sharing any sample files would be helpful as well. Thank you.
To be clear, Matt's ingest-artifacts repo is https://github.com/monarch-initiative/ingest-artifacts and contains concept maps of ontological intent.
The Graphviz reports I generated at https://data.monarchinitiative.org/*/dot/*gv are derived from the Dipper RDF output and represent a slice of observed reality.
Tom helped me understand the overall flow of Dipper. Since then, I have been reading the HPO annotation file format documentation and HPOAnnotations.py for this issue. I have some doubts and would greatly appreciate it if you could clarify them.
Indirect conversion: we add a middle layer so that the instances are first mapped into Phenol's internal graph model. We then build another transformer that converts the internal graph into a set of triples that follows OBAN's model (and returns them). In addition, we could add a graph store that holds the triples so that other people can query them later.
So I guess we want to follow the second option, but I would like to ask your opinions (@pnrobinson @cmungall) on how HpoDisease instances should be mapped into Phenol's internal graph model. I would also like to ask whether there are any Java libraries that can produce triples based on OBAN's model, like the set of functions under Dipper/Model.
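To make the second step of the indirect conversion concrete, here is a minimal sketch that reifies one disease-phenotype annotation as an OBAN association and emits N-Triples strings without any RDF library. The `ObanSketch` class, its `Annotation` holder, and the example.org IRIs are hypothetical and stand in for phenol's real `HpoDisease` model; the OBAN property names under `http://purl.org/oban/` and the use of `RO_0002200` ("has phenotype") are assumptions that should be checked against the modeling in hpoa.ttl.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: maps one disease-phenotype annotation to OBAN-style
// N-Triples lines. Not phenol's API; class names and IRIs are illustrative.
final class ObanSketch {
    static final String OBAN = "http://purl.org/oban/";
    static final String OBO = "http://purl.obolibrary.org/obo/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Simplified stand-in for one entry of the internal graph model (assumed fields).
    static final class Annotation {
        final String diseaseIri;
        final String predicateIri;
        final String phenotypeIri;

        Annotation(String diseaseIri, String predicateIri, String phenotypeIri) {
            this.diseaseIri = diseaseIri;
            this.predicateIri = predicateIri;
            this.phenotypeIri = phenotypeIri;
        }
    }

    // Formats a single N-Triples statement whose three terms are all IRIs.
    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    // Emits the association node (type, subject, predicate, object)
    // followed by the direct disease-to-phenotype edge.
    static List<String> toTriples(Annotation a, String associationIri) {
        List<String> out = new ArrayList<>();
        out.add(triple(associationIri, RDF_TYPE, OBAN + "association"));
        out.add(triple(associationIri, OBAN + "association_has_subject", a.diseaseIri));
        out.add(triple(associationIri, OBAN + "association_has_predicate", a.predicateIri));
        out.add(triple(associationIri, OBAN + "association_has_object", a.phenotypeIri));
        out.add(triple(a.diseaseIri, a.predicateIri, a.phenotypeIri));
        return out;
    }

    public static void main(String[] args) {
        Annotation a = new Annotation(
                "https://example.org/disease/EXAMPLE_1", // hypothetical disease IRI
                OBO + "RO_0002200",                      // assumed "has phenotype" predicate
                OBO + "HP_0001250");
        toTriples(a, "https://example.org/association/1").forEach(System.out::println);
    }
}
```

Emitting both the reified association and the direct edge is a design choice in this sketch; a Jena-backed adapter would replace `triple(...)` with statements added to a model, which could then serve as the queryable graph store.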
Looking at the HPO annotation file formats, there are four possible aspect values, but only the description of M differs: M means Mortality/Aging in the previous version, while it means the Clinical Modifier subontology in the current version. In Dipper's code, each aspect is currently mapped differently, so I wonder whether these two Ms are different things. @pnrobinson, @TomConlin, @kshefchek, if they are different, could you tell me how the new M should be handled (or mapped into RDF graphs)?
I would also like to know how to map the newly added field 'Sex'. phenol does not seem to have code that handles this field in HpoDiseaseAnnotationParser.java yet. Is it okay to ignore this field for now?
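If the two Ms do turn out to be different, one option is to resolve the aspect code together with the file-format version before any RDF mapping, so the legacy and current meanings can never land on the same target. The sketch below assumes the parser knows which format it is reading; the `HpoaFormat` enum, the `AspectResolver` class, and the key strings are hypothetical names chosen for illustration.

```java
// Hypothetical sketch: turns an aspect code into a mapping key that is
// version-specific for "M" and unchanged for every other code.
enum HpoaFormat { LEGACY, CURRENT }

final class AspectResolver {
    // Per the thread, only M's description differs between versions, so only
    // M gets a version-specific key; other codes are returned as-is.
    static String key(String aspectCode, HpoaFormat format) {
        if (aspectCode == null || aspectCode.length() != 1) {
            throw new IllegalArgumentException("Aspect must be a single letter: " + aspectCode);
        }
        if (aspectCode.equals("M")) {
            return format == HpoaFormat.LEGACY ? "M_MORTALITY_AGING" : "M_CLINICAL_MODIFIER";
        }
        return aspectCode;
    }

    public static void main(String[] args) {
        System.out.println(key("M", HpoaFormat.LEGACY));  // prints M_MORTALITY_AGING
        System.out.println(key("M", HpoaFormat.CURRENT)); // prints M_CLINICAL_MODIFIER
    }
}
```

Downstream code (Dipper's aspect mapping or a phenol RDF module) would then look up the RDF target by key rather than by the bare letter.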
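Ignoring Sex for now does not have to mean dropping it: a parser can read the column into an optional value so it is available once a mapping is agreed. The sketch below is not phenol's `HpoDiseaseAnnotationParser`; the `HpoaLine` class is hypothetical, and the 12-column order it assumes (DatabaseID, DiseaseName, Qualifier, HPO_ID, Reference, Evidence, Onset, Frequency, Sex, Modifier, Aspect, Biocuration) should be checked against the header of the file actually loaded.

```java
import java.util.Optional;

// Hypothetical sketch: reads one data line of phenotype.hpoa and keeps the
// Sex column as an optional value instead of discarding it.
// ASSUMPTION: 12 tab-separated columns with Sex at index 8; verify against the file header.
final class HpoaLine {
    static final int DATABASE_ID = 0, HPO_ID = 3, SEX = 8, ASPECT = 10, COLUMNS = 12;

    final String databaseId;
    final String hpoId;
    final String aspect;
    private final String sexOrNull; // null when the Sex column is blank

    private HpoaLine(String databaseId, String hpoId, String sexOrNull, String aspect) {
        this.databaseId = databaseId;
        this.hpoId = hpoId;
        this.sexOrNull = sexOrNull;
        this.aspect = aspect;
    }

    // Empty when no sex restriction was given for this annotation.
    Optional<String> sex() {
        return Optional.ofNullable(sexOrNull);
    }

    static HpoaLine parse(String line) {
        // limit -1 keeps trailing empty columns instead of silently dropping them
        String[] f = line.split("\t", -1);
        if (f.length != COLUMNS) {
            throw new IllegalArgumentException("Expected " + COLUMNS + " columns, found " + f.length);
        }
        String rawSex = f[SEX].trim();
        return new HpoaLine(f[DATABASE_ID], f[HPO_ID], rawSex.isEmpty() ? null : rawSex, f[ASPECT]);
    }
}
```

Keeping the raw value (rather than normalizing it) leaves the decision about how Sex maps into RDF open until the data model is discussed.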
Thank you.
@yy20716 AFAIK the Dipper code is out of date with respect to the new annotation model. It would be good to make things consistent across Monarch, so maybe we can meet with the Dipper team and go through the new data model. We will need to implement some additional items in the Java code to take the improvements from the recent switch into account, so unfortunately even the Java code is out of date...
When the new HPO file format goes live we will need to update our parser.