oracle / graalpython

A Python 3 implementation built on GraalVM

Requests library works when called from `graalpython`, but fails when called from java #244

Closed Galunid closed 2 years ago

Galunid commented 2 years ago
Dockerfile

```
FROM ghcr.io/graalvm/graalvm-ce:ol7-21.3
RUN yum install -y unzip patch git maven
WORKDIR /app
RUN gu install python
RUN git clone 'https://github.com/paulvi/java-python-graalvm-template'
WORKDIR /app/java-python-graalvm-template
RUN mvn package
CMD ["/opt/graalvm-ce-java17-21.3.0/bin/java", "-jar", "/app/java-python-graalvm-template/target/java-python-graalvm.jar"]
```
Java code

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;

import java.io.*;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RunGraalPython3 {

    public static String PYTHON = "python";
    private static String SOURCE_FILE_NAME = "health.py";
    private static InputStream SOURCE_FILE_INPUT =
            RunGraalPython3.class.getClassLoader().getResourceAsStream(SOURCE_FILE_NAME);

    public static void log(String s) {
        System.out.println(s);
    }

    public static void main(String[] args) {
        log("Hello Java!");
        log(System.getProperty("java.version"));
        log(System.getProperty("java.runtime.version"));

        String pyFilename = "./health.py";
        Path path = Paths.get("venv", "bin", "graalpython");
        if (path == null) {
            log("venv/ is not yet copied under target/classes/, run `mvn process-resources` or any next maven phase,");
        }
        String VENV_EXECUTABLE = path.toString();
        log(VENV_EXECUTABLE);

        try (Context context = Context.newBuilder("python")
                .allowAllAccess(true)
                .option("python.ForceImportSite", "true")
                .option("python.Executable", VENV_EXECUTABLE)
                .build()) {
            context.eval(PYTHON, "print('Hello Python!')");
            context.eval(PYTHON, "import sys; print(sys.version)");

            InputStreamReader reader = new InputStreamReader(SOURCE_FILE_INPUT);
            Source source;
            try {
                source = Source.newBuilder(PYTHON, reader, SOURCE_FILE_NAME).build();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            context.eval(source);
        }
    }
}
```
Python code

```python
# example 02
import requests
from pprint import pprint

THEGRAPH_URL = 'https://api.thegraph.com/index-node/graphql'
graph = THEGRAPH_URL

query = """
{
  indexingStatusForCurrentVersion(subgraphName: "org/example") {
    synced
    health
    fatalError {
      message
      block {
        number
        hash
      }
      handler
    }
    chains {
      network
      chainHeadBlock {
        number
      }
      latestBlock {
        number
      }
    }
  }
}
"""


def run_query(query):
    request = requests.post(graph, json={'query': query})
    if request.status_code == 200:
        return request.json()
    else:
        raise Exception("Query failed to run by returning code of {}. {}".format(request.status_code, query))


result = run_query(query)  # Execute the query
pprint(result)
```

Issue: No matter which website you try to connect to, `requests.get` fails with the following error when the script is called from Java:

```
Exception in thread "main" ConnectionError: <class 'MaxRetryError'>
    at org.graalvm.sdk/org.graalvm.polyglot.Context.eval(Context.java:379)
    at RunGraalPython3.main(RunGraalPython3.java:66)
```

When you add more explicit logging to the Python script, like this:

```python
import traceback
import requests

try:
    res = requests.get("http://google.com")  # could be https://google.com, doesn't work either way
    print(res)
    print(res.text)
except requests.exceptions.ConnectionError as err:
    traceback.print_exc()
    print(err)
```

the traceback looks like this:

```
Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/connection.py", line 156, in _new_conn
    conn = connection.create_connection(
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/util/connection.py", line 68, in create_connection
    _set_socket_options(sock, socket_options)
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/util/connection.py", line 94, in _set_socket_options
    sock.setsockopt(*opt)
OSError: [Errno 92] Protocol not available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/graalvm-ce-java17-21.3.0/languages/python/lib-python/3/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/graalvm-ce-java17-21.3.0/languages/python/lib-python/3/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/graalvm-ce-java17-21.3.0/languages/python/lib-python/3/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/graalvm-ce-java17-21.3.0/languages/python/lib-python/3/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/opt/graalvm-ce-java17-21.3.0/languages/python/lib-python/3/http/client.py", line 950, in send
    self.connect()
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/connection.py", line 184, in connect
    conn = self._new_conn()
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/connection.py", line 168, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno 92] Protocol not available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/requests-2.22.0-py3.8.egg/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/root/.local/lib/python3.8/site-packages/urllib3-1.25.6-py3.8.egg/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 92] Protocol not available'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/health.py", line 5, in <module>
    res = requests.get("http://google.com")
  File "/root/.local/lib/python3.8/site-packages/requests-2.22.0-py3.8.egg/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/requests-2.22.0-py3.8.egg/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/requests-2.22.0-py3.8.egg/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/.local/lib/python3.8/site-packages/requests-2.22.0-py3.8.egg/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/requests-2.22.0-py3.8.egg/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 92] Protocol not available'))
HTTPConnectionPool(host='google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 92] Protocol not available'))
```

When the script is launched as `graalpython <script>`, it works correctly. I get similar results when I run those scripts in a Manjaro Linux virtual machine.
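The innermost frame of the traceback is `sock.setsockopt(*opt)` inside urllib3, failing with errno 92 (`ENOPROTOOPT` on Linux). The failing call can probably be isolated without `requests` or any network access. The sketch below is only an illustration: it assumes the option being set is urllib3 1.25.x's default `(IPPROTO_TCP, TCP_NODELAY, 1)`, and the helper name `probe_socket_options` is made up for this example.

```python
import errno
import socket

# Assumption: urllib3 1.25.x enables TCP_NODELAY on every new connection by default.
DEFAULT_OPTS = [(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)]


def probe_socket_options(options=DEFAULT_OPTS):
    """Try each socket option on a fresh, unconnected TCP socket.

    Returns a list of (option_tuple, error) pairs; error is None on success.
    """
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        for opt in options:
            try:
                sock.setsockopt(*opt)
                results.append((opt, None))
            except OSError as err:
                results.append((opt, err))
    return results


if __name__ == "__main__":
    for opt, err in probe_socket_options():
        if err is None:
            print(f"{opt}: ok")
        else:
            # errno.errorcode maps the number to its symbolic name, e.g. ENOPROTOOPT
            name = errno.errorcode.get(err.errno, "unknown")
            print(f"{opt}: failed with {name} ({err})")
```

If this probe reports a failure when evaluated through the Java `Context` but succeeds under standalone `graalpython`, that would point at the socket layer rather than at `requests` itself.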

msimacek commented 2 years ago

Hi, thank you for the report. This bug is already fixed and the fix will be part of the next release. You can get a nightly build from here.

You may be asking why it worked in standalone graalpython and not when invoked via Java. The answer is that system interfaces like file IO or socket IO have multiple backend implementations in graalpython:

1) The `java` backend uses Truffle/JVM-provided interfaces for IO. It respects Truffle/JVM settings (like security restrictions or stdout redirection), but doesn't always behave exactly the same as CPython would.
2) The `native` backend uses system glibc APIs for IO, bypassing Truffle/JDK APIs. Its behavior is much closer to CPython, but it also bypasses any safeguards or settings of Truffle and the JVM.

The `native` backend is the default when invoking graalpython directly; `java` is the default when creating a context from Java. That's where the difference comes from. So you can also change the backend as a workaround for the problem. From Java, you can do that using `.option("python.PosixModuleBackend", "native")`. If you want to try running standalone graalpython with the `java` backend, you can do that using the `--python.PosixModuleBackend=java` CLI option.
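To compare the two backends without going through Java, the CLI option mentioned above can be scripted. This is a minimal sketch, not part of graalpython: the helper names `backend_command` and `run_with_backends` are hypothetical, it assumes a `graalpython` launcher on `PATH`, and it assumes the `native` value is accepted on the command line in the same way as `java`.

```python
import shutil
import subprocess
import sys


def backend_command(script, backend, launcher="graalpython"):
    """Build the launcher command line for one POSIX backend."""
    if backend not in ("java", "native"):
        raise ValueError(f"unknown backend: {backend!r}")
    return [launcher, f"--python.PosixModuleBackend={backend}", script]


def run_with_backends(script, launcher="graalpython"):
    """Run the script once per backend and collect the exit codes.

    Returns an empty dict when the launcher is not installed.
    """
    if shutil.which(launcher) is None:
        return {}
    results = {}
    for backend in ("native", "java"):
        proc = subprocess.run(
            backend_command(script, backend, launcher),
            capture_output=True,
            text=True,
        )
        results[backend] = proc.returncode
        if proc.returncode != 0:
            # Show the error output of the failing run
            print(f"[{backend}] stderr:\n{proc.stderr}", file=sys.stderr)
    return results


if __name__ == "__main__":
    script = sys.argv[1] if len(sys.argv) > 1 else "health.py"
    print(run_with_backends(script))
```

On an affected release, one would expect the `java` run of the original `health.py` to exit with a non-zero code while the `native` run succeeds. A script that catches the exception itself, like the logging variant above, exits with 0 either way, so its output has to be inspected instead.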