iobis / pyobis

OBIS Python client
https://iobis.github.io/pyobis
MIT License
14 stars 10 forks source link

Add capability to process DNADerivedData to `occurrences` module #98

Open ayushanand18 opened 1 year ago

ayushanand18 commented 1 year ago

Overview

The pyobis package in its current state does not support fetching DNADerivedData using the occurrences module.

I believe there can be two ways to implement it:

7yl4r commented 1 year ago

DNADerivedData is a Darwin Core extension in the same way MeasurementOrFacts is, so I think they should be treated the same. I agree that occurrences.search is already too heavy.


Let's come at the problem from the other direction and think about the ideal user interface without technical considerations. This is what felt intuitive to me:

occurrences = pyobis.search.occurrences(
    scientificname='Mola mola',
    required_extensions=["MoF","DNA"]
)

However, we have already decided to separate the query-building from the query-executing, and I think that was good choice. Maybe we could aim something like this instead:

occurrences = pyobis.OccQuery(
    scientificname='Mola mola',
    required_extensions=["MoF","DNA"]
).execute()

This is nice because it positions the code to potentially generalize to handle other DwC providers as well:

occurrences = pyobis.OccQuery(
    scientificname='Mola mola',
    required_extensions=["MoF","DNA"]
).execute(
    data_providers=["OBIS", "GBIF"]
)

So, with this in mind, I am thinking we begin adding new features to the OccQuery class rather than into the search function. Assuming we implement DNA in this way and save the other changes for later this will look like:

occurrences = pyobis.OccQuery(
    required_extensions=["DNA"]
).search(
    scientificname='Mola mola',
    MeasurementOrFact=True
)

Thoughts?

ayushanand18 commented 1 year ago

I agree that the ways you listed are much intuitive and user-friendly than the current project.

But I had two questions to ask:

7yl4r commented 1 year ago

I think the object-oriented response to this is that we have multiple subclasses of OccQuery:

Both have shared attributes and methods under the parent class OccQuery


The object-oriented approach can be a bit pedantic, but it would work here. However, our focus should remain on keeping the intuitiveness of the user-interface first.. Is the object-oriented approach starting to over-complicate the UI?

Maybe a good compromise is to use OO programming under the hood and provide convenience methods to wrap that complexity for casual users. By this I mean little functions like this example:

# this function available as pyobis.occurrences.search()
define search(scientific_name):
    return pyobis.OccSearchQuery(
        scientificname=scientific_name
    ).execute()
ayushanand18 commented 1 year ago

This is interesting, building on your suggestion I think we might need to refactor how we handle functions and classes in each module.


I ran an experiment with pyobis.taxa and found that a better way to implement a functional plus OOPS approach would be to mix them, by letting the user play with functions on abstract level and play with classes and objects in the background.

Sample Usage

# intuitive way of doing things
from pyobis import taxa
query1 = taxa.search(scientificname="mola mola")
query1.execute()
# set query1.data to the fetched data from source, before this we only built the query not the response
query1.data
# show the data
query1.api_url
# show the api url
query1.mapper_url
# show the mapper url
query1.to_pandas()
# convert the data to a pandas dataframe

Under the hood (inside the module)

from ..obisutils import build_api_url, handle_arrstr, obis_baseurl, obis_GET
import pandas as pd

def search(scientificname=None, **kwargs):
    """
    Get taxon records.

    :param scientificname: [String,Array] One or more scientific names from the
        OBIS backbone. All included and synonym taxa are included in the search

    :return: A TaxaResponse object

    """

    scientificname = handle_arrstr(scientificname)
    url = obis_baseurl + "taxon/" + scientificname
    args = {"scientificname": scientificname}
    # return a taxa response class

    return TaxaResponse(url, args)

class TaxaResponse():
    """
    Taxa Response Class
    """
    def __init__(self, url, args):
        # public members
        self.data = None
        self.api_url = build_api_url(url, args)
        self.mapper_url = None

        # private members
        self.__args = args
        self.__url = url

    def execute(self, **kwargs):
        out = obis_GET(
            self.__url,
            self.__args,
            "application/json; charset=utf-8",
            **kwargs
            )
        self.data = out

    def to_pandas(self):
        return pd.DataFrame(self.data["results"])

in the obisutils

def build_api_url(url, args):
    return (
        url
        + "?"
        + urlencode({k: v for k, v in args.items() if v is not None})
    )

how does this sound?

7yl4r commented 1 year ago

I think this looks good, but I have a tendency to overengineer things. @ocefpaf (and|or) @MathewBiddle : what do you think? The object-oriented programming approach is familiar to me, and wrapping things with convenience methods for casual users seems reasonable to me too. But maybe there is a more elegant approach?

ocefpaf commented 1 year ago

I think this looks good, but I have a tendency to overengineer things. @ocefpaf (and|or) @MathewBiddle : what do you think?

Looks good to me. The user interface is not too complex and the internals are easy to follow. +1