Closed neubig closed 1 year ago
Looks good. Two comments:
Hey @pfliu-nlp :
@neubig @pfliu-nlp Hey, thanks for proposing this.
First, library-specific environment variables involve the same problems as exposing global variables from the library. Since our client code should be initialized by hand, we can prepare __init__
arguments to pass these values:
# Users can use environment variables defined by themselves.
ExplainaboardClient(username=os.environ["EB_USER"], api_key=os.environ["EB_KEY"])
The second option:
from explainaboard_client.client import ExplainaboardClient
explainaboard_client.username = "xxx"
explainaboard_client.api_key = "yyy"
client = ExplainaboardClient()
client.evaluate_system(...)
is also exposing global variables.
Thanks @odashi ! Just to clarify, what's the main concern about doing it this way? Something related to security?
For reference, we followed the example of the OpenAI library, which does the following and seems like a relatively clean interface. Code below and here: https://github.com/openai/openai-python/blob/34a12097548ae07dbfe1363bb683fe98646aa723/openai/__init__.py#L26
import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
response = openai.Completion.create(
model="text-davinci-002",
prompt="",
temperature=0.7,
max_tokens=256,
top_p=1,
frequency_penalty=0,
presence_penalty=0
)
@neubig The concern mostly comes from immutability: exposing global variables may be used in unintended order. The second example (The OpenAPI's one as you suggested) is still applicable. The first example (using pre-defined environment variables) should be avoided from the library because the underlying interaction through environment variables can't be controlled.
Hmm, thanks! I do feel that there's a balance between any potential adverse affects due to environmental and global variables and convenience though. Using a very limited number of environmental and global variables saves the cognitive load of having to declare them in various places in the code.
It also seems to be pretty common to use environmental variables in widely-used libraries for ML development (somewhat analogous to the explainaboard client here). For example:
From my point of view perhaps the gains in convenience would outweigh the risks? If there's a way to use a library in a similarly convenient way without the risks that'd be nice though.
@neubig Especially, you don't expect that Hf's interface is good: it is rather really bad as far as I know.
It is always recommended to use arguments for variables that are always modified. Environment variables involve many issues because the value is not controllable and it actually makes debugging much harder, so it needs to be avoided as well as possible. Only the cases that environment variables are feasible I think is:
main.py
(the most outer interface of the program; envvar is used only in this file)One reason that environment variable is used much time is that it is somewhat mentioned in the 12 Factor App guideline. This is designed to develop distributed webapps running on the closed environment, in which all environment is almost controllable by the developer. But Libraries are used basically on uncontrollable environments, and the design needs to become more conservative.
Note that using environment variables is not the only accepted design. For example, Tweepy (one of the widely used Twitter client) expects that all credentials are passed by arguments and there is no option to pass them as environment variables. I think this is more clean design.
OK sure, this isn't a hill I'm willing to die on, so we could make a change to something else.
Do you have a suggestion for an interface with minimal cognitive load that still fits your requirements?
ExplainaboardClient(username=os.environ["EB_USER"], api_key=os.environ["EB_KEY"])
is still a little bit high because the users needs to remember the names of the environmental variables that they defined every time they call the explainaboard client.
We probably also want to, at the very least, recommend a canonical naming of the environmental variables (e.g. EB_USERNAME
and EB_API_KEY
) to have consistency across multiple downstream programs that use the client, even if we don't enforce it explicitly. For example, if explainaboard user A uses open-source programs X, Y, and Z, it'd be nice if they didn't set a different environmental variable for each of the three programs.
My recommendation is "don't involve any reserved environment vaeiables in the library". CLI is allowed to use it
OK, that's a good guiding principle. I'll make the revisions to that effect (although perhaps not till tomorrow).
@odashi and @pfliu-nlp : just a quick reminder about this PR, thanks!
This makes the configuration for explainaboard_client global and stored in the base class. New usage would look like either:
or
Asking @pfliu-nlp for review, but also pinging @OscarWang114 in case it's useful for https://github.com/neulab/explainaboard_client/issues/18