Standardize interface for pulling data

Background

Because our datasets change over time, we need a way to pull in updates. With Gmail as the only source this is manageable, but we should experiment with the required interface before we add web scraping.

Proposal

Standardize the data retrieval from a source behind an interface class.

import abc


class Sourcerer(abc.ABC):
    """Sourcerers can pull data from a data source."""

    #: A one-word descriptive name for this source of data.
    name = None

    @abc.abstractmethod
    def pull(self, df=None, **kwargs):
        """Retrieve updates from the data source.

        Args:
            df (:class:`pandas.DataFrame`): DataFrame to use as a base of reference.
                Not providing one indicates a full copy from the source is to be
                performed, if possible.

        Returns:
            A new :class:`pandas.DataFrame`, of only this source's type.
        """

Currently, only the Gmail source exists, so implementing the interface should be straightforward.

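class GmailSourcer(Sourcerer):
    name = 'gmail'

    def __init__(self, client=None):
        # Named ``client`` so the argument does not shadow the ``gmail`` module,
        # which is assumed to be imported already.
        self.gmail = client or gmail.Gmail()  # Run initialization.

    def pull(self, df=None, only_newer=True, **kwargs):
        # Do a diff against the data frame
        # Grab new emails
        raise NotImplementedError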
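
To make the "diff against the data frame" step concrete, here is a minimal sketch using a hypothetical in-memory source in place of Gmail. The `ListSourcer` class, the `id`, `text`, and `source` columns, and the rule "a row is new if its id is not already in the base frame" are all assumptions made for illustration; none of them is part of the proposal. The interface is repeated so the sketch runs on its own.

```python
import abc

import pandas as pd


class Sourcerer(abc.ABC):
    """Copy of the proposed interface, repeated so this sketch is self-contained."""

    name = None

    @abc.abstractmethod
    def pull(self, df=None, **kwargs):
        """Return a DataFrame of updates relative to ``df``."""


class ListSourcer(Sourcerer):
    """Hypothetical in-memory source, standing in for Gmail."""

    name = 'list'

    def __init__(self, records):
        # Each record is a dict with assumed 'id' and 'text' keys.
        self.records = list(records)

    def pull(self, df=None, only_newer=True, **kwargs):
        fresh = pd.DataFrame(self.records, columns=['id', 'text'])
        fresh['source'] = self.name
        if df is None or not only_newer:
            return fresh  # No base frame: perform a full copy.
        # Diff against the base frame: keep only ids this source has not delivered yet.
        seen = set(df.loc[df['source'] == self.name, 'id'])
        return fresh[~fresh['id'].isin(seen)].reset_index(drop=True)
```

Calling `pull()` with no argument returns everything the source holds, while `pull(df)` returns only rows whose id is missing from `df`. A real Gmail implementation would more likely filter on the server side (by date or message id) than fetch everything and diff locally.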

Usage:

>>> import pandas as pd
>>> gs = GmailSourcer()
>>> print(gs.name)
gmail
>>> gmail_df = gs.pull(df)
>>> df = pd.concat([df, gmail_df])
>>> # Voila!
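
One possible way to fold a pulled frame back into the base frame is to concatenate and then de-duplicate on a key. The sketch below assumes every row carries `source` and `id` columns that together identify a record; the helper name `merge_pull` and those column names are hypothetical and not part of the proposal.

```python
import pandas as pd


def merge_pull(base, pulled, key=('source', 'id')):
    """Append pulled rows to base, keeping the most recent row for each key."""
    combined = pd.concat([base, pulled], ignore_index=True)
    # keep='last' lets a freshly pulled row replace an older copy with the same key.
    deduped = combined.drop_duplicates(subset=list(key), keep='last')
    return deduped.reset_index(drop=True)
```

Keeping the last duplicate means a re-pulled record overwrites the stale one, which matters if a source can report edits as well as new rows.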
