tgbugs / ontquery

a framework querying ontology terms
MIT License
3 stars 3 forks source link

QueryResult type #5

Closed bendichter closed 6 years ago

bendichter commented 6 years ago

I noticed that a query returns None when there is more than one result and aQueryResult object, not an OntTerm object when there is one result. Is there a reason to distinguish between an OntTerm lookup and a query search? Sorry if you've gone over this previously. Would it be possible for OntTerm to simply take a QueryResult object as an input argument for the constructor?

tgbugs commented 6 years ago

Short version. You can call OntTerm(**result) but we need to validate at some point, and for that reason OntQuery.__call__ has been written primarily to be called inside OntTerm.__new__. The other place OntQuery.__call__ is called is directly by the user in the repl. Having OntQuery.__call__ return values in the repl case makes the return repr and obsure the printed output which the user can review and paste in directly (users have to look at results). I may be overloading functionality on OntQuery and could dissociate it into two different classes, one for serving OntTerm and one

Longer version.

My current workflow has informed some of the design decisions and is definitely open to change/extension/adaption to other workflows.

  1. I use OntTerm(term='thing') directly to insert an ontology identifier into a python file.
  2. I then run the file and get all the errors which have the different choices that I can choose from.
  3. I then paste the term I want into my code to replace the underspecified term. See for example https://github.com/tgbugs/pyontutils/blob/master/pyontutils/methods.py#L165 as the endpoint of this workflow.

In a variant of this workflow I do not use query('thing') directly when I want to enter a term into a python file. I use query('thing') for doing exploratory work, and then will paste a result in to the code. Because copy/paste errors are a major problem when working with ontology terms, I want the thing that I paste in (i.e. OntTerm) to be self validating/verifying.

Let me discuss the fundamental issue that OntTerm is trying to solve, which I think will help clarify the other reasons for the behavior you are observing.

In order for ontology integration to be useful, validation needs to happen at some point, and OntTerm seems like the natural place to do it for both usability and practical reasons. Therefore query has to be called after args have been passed to __new__. We could choose not to do this, but then we would have to have a ValidatedOntTerm class, which seems redundant to me.

OntTerm can indeed take a QueryResult as a constructor OntTerm(**result). However, if OntTerm cannot query to validate those arguments, then it will mark the term as unverified. For example I wrote OntTerm('CHEBI:33697', label='RNA') manually, and it turns out (to my horror) that we don't actually have that term in the ontology at the moment. For a usability standpoint, I find it extremely useful that OntTerm tells me this.

The fact that the primary usage for query is inside of the OntTerm constructor is the reason why it returns None. When query is called from OntTerm that return of None becomes a ValueError which I find is a good way to communicate to the user that they need to refine their query or pick an identifier for their term. This approach is totally open to change. I just find that it works for me.

I'm still working out the best way to connect QueryResult to OntTerm. I am not at all happy with the current implementation (e.g. there are a number of bad design decisions that are driven by issues I have encountered with classes derived from OntTerm.)

bendichter commented 6 years ago

I'm not a huge fan of relying on a print output for the workflow and not having the results in the output because it seems like it would prevent some type of script built on top of these tools. Would it be possible for an underspecified query to return a list of QueryResults and have the list type trigger a ValueError in OntTerm?

tgbugs commented 6 years ago

Yes. What I will do is have the base OntQuery class yield all query results and have OntTerm raise the error. I'll break out the repl workflow into its own child class that suppresses output.