langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

getEnvironmentVariable utility always return undefined on Cloudflare Workers #5303

Closed davkorss closed 1 month ago

davkorss commented 5 months ago


Example Code

https://github.com/langchain-ai/langchainjs/blob/main/langchain-core/src/utils/env.ts#L77

export function getEnvironmentVariable(name: string): string | undefined {
  // Certain Deno setups will throw an error if you try to access environment variables
  // https://github.com/langchain-ai/langchainjs/issues/1412
  try {
    return typeof process !== "undefined"
      ? // eslint-disable-next-line no-process-env
        process.env?.[name]
      : undefined;
  } catch (e) {
    console.error(e);
    return undefined;
  }
}

Error Message and Stack Trace (if applicable)

No response

Description

Greetings!

In Cloudflare Workers, using the `process` API requires importing it explicitly, and the `nodejs_compat` compatibility flag must be enabled.

https://developers.cloudflare.com/workers/runtime-apis/nodejs/process/

However, the utilities never import the `node:process` module, so referencing `process` throws, and `undefined` is returned from the catch block. This prevents LangSmith from being used from Cloudflare Workers. I've been trying several solutions, and I don't think importing `node:process` is a good fix, since that would force everyone to enable the `nodejs_compat` compatibility flag in their Workers. Maybe the model constructor could accept tracing props instead, in the same way as the `verbose` property (which can also be read from environment variables), something like this:

const llm = new ChatOpenAI({
  tracing: true,
  tracingProjectName: this.env.LANGCHAIN_PROJECT,
  tracingApiKey: this.env.LANGCHAIN_API_KEY,
  tracingApiUrl: this.env.LANGCHAIN_ENDPOINT,
  streaming: false,
  openAIApiKey: this.env.OPENAI_API_KEY,
  modelName: model,
  callbacks: [callbackHandlers],
  verbose: true,
  temperature: 0,
  configuration: {
    baseURL: this.env.CLOUDFLARE_AI_GATEWAY_OPENAI_ENDPOINT,
  },
});

If you have any idea how we can solve this, I can create the feature!

System Info

platform Cloudflare Workers

dosubot[bot] commented 5 months ago

To address the issue with getEnvironmentVariable in Cloudflare Workers without requiring the nodejs_compat flag, you can adapt the utility to leverage the globalThis object for environment variable access. This approach avoids the need for Node.js compatibility and directly utilizes Cloudflare Workers' global scope. Here's how you can modify the function:

export function getEnvironmentVariable(name: string): string | undefined {
  try {
    // Attempt to read the variable from the global scope first
    // (the cast is needed because globalThis has no string index signature)
    const globalValue = (globalThis as Record<string, unknown>)[name];
    if (typeof globalValue === "string") {
      return globalValue;
    }
    // Fall back to process.env in Node.js environments
    return typeof process !== "undefined" ? process.env?.[name] : undefined;
  } catch {
    return undefined;
  }
}

This code first attempts to retrieve the environment variable from globalThis, suitable for Cloudflare Workers. If the variable isn't found and the environment is Node.js (where process is defined), it falls back to using process.env. This ensures compatibility across different environments without special flags.
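To make the lookup order concrete, here is a compact, self-contained restatement of that function with a runnable demo. The variable name `MY_FAKE_VAR` is invented purely for this demo; it is not a real LangChain setting.

```typescript
// Re-stating the proposed lookup as a compact, self-contained demo:
// check globalThis first, then fall back to process.env.
function getEnvironmentVariable(name: string): string | undefined {
  try {
    const globalValue = (globalThis as Record<string, unknown>)[name];
    if (typeof globalValue === "string") {
      return globalValue;
    }
    return typeof process !== "undefined" ? process.env?.[name] : undefined;
  } catch {
    return undefined;
  }
}

// A value on globalThis wins over process.env.
(globalThis as Record<string, unknown>)["MY_FAKE_VAR"] = "from-global";
process.env.MY_FAKE_VAR = "from-process-env";
console.log(getEnvironmentVariable("MY_FAKE_VAR")); // "from-global"

// Remove the global; the Node.js fallback now applies.
delete (globalThis as Record<string, unknown>)["MY_FAKE_VAR"];
console.log(getEnvironmentVariable("MY_FAKE_VAR")); // "from-process-env"
```

Note that in modules-format Workers, bindings arrive via the `env` argument of the handler rather than as globals, so this globalThis lookup only helps if the values are placed there explicitly.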

Regarding your suggestion to modify the model constructor for tracing properties, it's indeed a viable approach for passing configuration directly, enhancing flexibility and security by decoupling API keys and other sensitive information from your codebase. Implementing such a feature would involve extending the constructor's parameters to accept these additional properties, ensuring they are properly handled within the library.

Before deploying these changes, ensure thorough testing in both Cloudflare Workers and Node.js environments to confirm that the modified getEnvironmentVariable function works as expected and that any changes to the constructor do not introduce regressions.

#### Details

**Sources**

- [langchain-core/src/utils/env.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain-core/src/utils/env.ts)
- [environment_tests/test-exports-tsc/main.ts](https://github.com/langchain-ai/langchainjs/blob/main/environment_tests/test-exports-tsc/main.ts)
- [libs/langchain-cloudflare/package.json](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-cloudflare/package.json)
- [libs/langchain-cloudflare/src/llms.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-cloudflare/src/llms.ts)
- [libs/langchain-community/src/llms/cloudflare_workersai.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/llms/cloudflare_workersai.ts)
- [libs/langchain-cloudflare/src/chat_models.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-cloudflare/src/chat_models.ts)
- [examples/src/agents/chat_convo_with_tracing.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/agents/chat_convo_with_tracing.ts)
- [examples/src/models/chat/integration_openai_custom_base.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/models/chat/integration_openai_custom_base.ts)
- [examples/src/models/chat/integration_openai.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/models/chat/integration_openai.ts)
- [CONTRIBUTING.md](https://github.com/langchain-ai/langchainjs/blob/main/CONTRIBUTING.md)
- [.github/workflows/unit-tests-langchain.yml](https://github.com/langchain-ai/langchainjs/blob/main/.github/workflows/unit-tests-langchain.yml)

jacoblee93 commented 4 months ago

Hmm yeah that's not great - as a stopgap you can manually pass in a tracer like this:

https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain#trace-selectively

Will have a think to see how to do this more seamlessly CC @craigsdennis
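For reference, a minimal sketch of that stopgap, based on the linked LangSmith guide: build the client and tracer explicitly from the Worker's `env` bindings instead of relying on `process.env`, and pass the tracer per call. This is an untested illustration (it needs real credentials and a Workers runtime); the `LANGCHAIN_*` binding names mirror the standard environment variables, but any names work.

```typescript
import { Client } from "langsmith";
import { LangChainTracer } from "@langchain/core/tracers/tracer_langchain";
import { ChatOpenAI } from "@langchain/openai";

export default {
  async fetch(request: Request, env: Record<string, string>): Promise<Response> {
    // Build a LangSmith client explicitly from Worker bindings,
    // since getEnvironmentVariable can't see them here.
    const client = new Client({
      apiKey: env.LANGCHAIN_API_KEY,
      apiUrl: env.LANGCHAIN_ENDPOINT,
    });
    const tracer = new LangChainTracer({
      client,
      projectName: env.LANGCHAIN_PROJECT,
    });

    const llm = new ChatOpenAI({ openAIApiKey: env.OPENAI_API_KEY });
    // Passing the tracer in the call options sends this run to LangSmith.
    const result = await llm.invoke("Hello!", { callbacks: [tracer] });
    return new Response(result.content as string);
  },
};
```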

jacoblee93 commented 4 months ago

The other thing you could try (suggested by @efriis) is to manually set process.env from the passed in environment:

https://developers.cloudflare.com/workers/runtime-apis/nodejs/process/#processenv
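The idea is to copy the Worker's `env` bindings onto `process.env` early in the handler, so later `process.env` lookups succeed. A minimal sketch of just the copying step, runnable in Node for illustration (the binding names and values are invented; in a real Worker, `env` is the handler's second argument):

```typescript
import * as process from "node:process";

// Stand-in for the `env` bindings a Worker's fetch handler receives.
const env: Record<string, string> = {
  LANGCHAIN_API_KEY: "ls-demo-key",
  LANGCHAIN_PROJECT: "demo-project",
};

// Copy the bindings onto process.env so process.env-based lookups work.
// (Merging with Object.assign, rather than overwriting process.env wholesale,
// preserves any entries that already exist.)
Object.assign(process.env, env);

console.log(process.env.LANGCHAIN_PROJECT); // "demo-project"
```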

davkorss commented 4 months ago

I understand that manually defining process.env won't change the fact that the getEnvironmentVariable utility keeps returning undefined, since in Cloudflare Workers the process API must be explicitly imported (import * as process from 'node:process';), as shown in the docs. I will try passing the tracer manually. Thank you very much for your time!

jacoblee93 commented 4 months ago

Would this work?

import * as process from 'node:process';
// `env` is the bindings object passed to the Worker's handler
process.env = env;
// ...tracing code...

davkorss commented 4 months ago

Implementing that snippet inside the utility itself (https://github.com/langchain-ai/langchainjs/blob/main/langchain-core/src/utils/env.ts) would solve the problem. If I only do it in my own project, however, it doesn't help: LangChain calls getEnvironmentVariable internally, and that function would still return undefined. In short, defining process.env in my project doesn't matter, since LangChain internally won't be able to retrieve its values when running in Cloudflare Workers.

jeasonnow commented 4 months ago

To address this issue, we cannot simply adjust getEnvironmentVariable. Instead, we need to follow these steps:

(The externalEnvs parameter is one that I randomly named, but if possible, it could be replaced with a better name.)
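The concrete steps are not shown above, but one hedged guess at such a mechanism is a module-level store that callers fill from their Worker bindings, which getEnvironmentVariable consults before process.env. Everything here (the `setExternalEnvs` name, the store shape) is hypothetical; only `externalEnvs` is taken from the comment above.

```typescript
// Hypothetical module-level store for externally supplied env vars.
let externalEnvs: Record<string, string> = {};

// Hypothetical setter a Worker would call once, e.g. at the top of fetch(),
// before invoking any LangChain code.
function setExternalEnvs(envs: Record<string, string>): void {
  externalEnvs = { ...envs };
}

function getEnvironmentVariable(name: string): string | undefined {
  // Externally supplied values take precedence.
  if (name in externalEnvs) {
    return externalEnvs[name];
  }
  try {
    // Existing behavior: fall back to process.env where it exists.
    return typeof process !== "undefined" ? process.env?.[name] : undefined;
  } catch {
    return undefined;
  }
}

// Usage sketch: setExternalEnvs(env) inside the Worker's fetch handler.
setExternalEnvs({ LANGCHAIN_API_KEY: "demo-key" });
console.log(getEnvironmentVariable("LANGCHAIN_API_KEY")); // "demo-key"
```

This keeps the existing Node.js behavior untouched while giving Workers (and other non-Node runtimes) an explicit injection point.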

davkorss commented 4 months ago

I like it, it looks much cleaner than what I suggested before, thank you very much!