Global configuration for explainaboard client

neubig commented 2 years ago

This makes the configuration for explainaboard_client global and stored in the base class. New usage would look like either:

$ export EB_USERNAME=xxx
$ export EB_API_KEY=yyy
$ python
>>> from explainaboard_client.client import ExplainaboardClient
>>> client = ExplainaboardClient()
>>> client.evaluate_system(...)

or

$ python
>>> from explainaboard_client.client import ExplainaboardClient
>>> explainaboard_client.username = "xxx"
>>> explainaboard_client.api_key = "yyy"
>>> client = ExplainaboardClient()
>>> client.evaluate_system(...)

Asking @pfliu-nlp for review, but also pinging @OscarWang114 in case it's useful for https://github.com/neulab/explainaboard_client/issues/18

pfliu-nlp commented 2 years ago

Looks good. Two comments:

Have you seen the two comments in this PR: https://github.com/neulab/explainaboard_client/pull/39
is the user_name (email) necessary? Is it the possibility to remove it in the future?

neubig commented 2 years ago

Hey @pfliu-nlp :

I didn't see any and I actually don't see them now. Did you make comments?
Maybe not, we could remove it later. But I also don't think it's a huge problem, the amount of extra work caused for users by including it is pretty minimal.

odashi commented 2 years ago

@neubig @pfliu-nlp Hey, thanks for proposing this.

First, library-specific environment variables involve the same problems as exposing global variables from the library. Since our client code should be initialized by hand, we can prepare __init__ arguments to pass these values:

# Users can use environment variables defined by themselves.
ExplainaboardClient(username=os.environ["EB_USER"], api_key=os.environ["EB_KEY"])

odashi commented 2 years ago

The second option:

from explainaboard_client.client import ExplainaboardClient
explainaboard_client.username = "xxx"
explainaboard_client.api_key = "yyy"
client = ExplainaboardClient()
client.evaluate_system(...)

is also exposing global variables.

neubig commented 2 years ago

Thanks @odashi ! Just to clarify, what's the main concern about doing it this way? Something related to security?

For reference, we followed the example of the OpenAI library, which does the following and seems like a relatively clean interface. Code below and here: https://github.com/openai/openai-python/blob/34a12097548ae07dbfe1363bb683fe98646aa723/openai/__init__.py#L26

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="text-davinci-002",
  prompt="",
  temperature=0.7,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)

odashi commented 2 years ago

@neubig The concern mostly comes from immutability: exposing global variables may be used in unintended order. The second example (The OpenAPI's one as you suggested) is still applicable. The first example (using pre-defined environment variables) should be avoided from the library because the underlying interaction through environment variables can't be controlled.

neubig commented 2 years ago

Hmm, thanks! I do feel that there's a balance between any potential adverse affects due to environmental and global variables and convenience though. Using a very limited number of environmental and global variables saves the cognitive load of having to declare them in various places in the code.

It also seems to be pretty common to use environmental variables in widely-used libraries for ML development (somewhat analogous to the explainaboard client here). For example:

From my point of view perhaps the gains in convenience would outweigh the risks? If there's a way to use a library in a similarly convenient way without the risks that'd be nice though.

odashi commented 2 years ago

@neubig Especially, you don't expect that Hf's interface is good: it is rather really bad as far as I know.

It is always recommended to use arguments for variables that are always modified. Environment variables involve many issues because the value is not controllable and it actually makes debugging much harder, so it needs to be avoided as well as possible. Only the cases that environment variables are feasible I think is:

In the main.py (the most outer interface of the program; envvar is used only in this file)
settings that are not intended to be controlled frequently (e.g., logging level of a specific library)

odashi commented 2 years ago

One reason that environment variable is used much time is that it is somewhat mentioned in the 12 Factor App guideline. This is designed to develop distributed webapps running on the closed environment, in which all environment is almost controllable by the developer. But Libraries are used basically on uncontrollable environments, and the design needs to become more conservative.

odashi commented 2 years ago

Note that using environment variables is not the only accepted design. For example, Tweepy (one of the widely used Twitter client) expects that all credentials are passed by arguments and there is no option to pass them as environment variables. I think this is more clean design.

https://github.com/tweepy/tweepy/blob/33e444a9d13d53ea024ddb3c9da30158a39ea4f6/tweepy/client.py#L205-L214

neubig commented 2 years ago

OK sure, this isn't a hill I'm willing to die on, so we could make a change to something else.

Do you have a suggestion for an interface with minimal cognitive load that still fits your requirements?

ExplainaboardClient(username=os.environ["EB_USER"], api_key=os.environ["EB_KEY"])

is still a little bit high because the users needs to remember the names of the environmental variables that they defined every time they call the explainaboard client.

We probably also want to, at the very least, recommend a canonical naming of the environmental variables (e.g. EB_USERNAME and EB_API_KEY) to have consistency across multiple downstream programs that use the client, even if we don't enforce it explicitly. For example, if explainaboard user A uses open-source programs X, Y, and Z, it'd be nice if they didn't set a different environmental variable for each of the three programs.

odashi commented 2 years ago

My recommendation is "don't involve any reserved environment vaeiables in the library". CLI is allowed to use it

neubig commented 2 years ago

OK, that's a good guiding principle. I'll make the revisions to that effect (although perhaps not till tomorrow).

neubig commented 1 year ago

@odashi and @pfliu-nlp : just a quick reminder about this PR, thanks!

neulab / explainaboard_client

Global configuration for explainaboard client #40