pentaho-labs / pentaho-cpython-plugin

This is a PDI plugin that allows execution of Python code.
Apache License 2.0

CPython Script Executor not recognizing globally installed libraries on Ubuntu #36

Status: Open. Opened by AlefRP 1 year ago

AlefRP commented 1 year ago

Environment:

Issue:

I am using Pentaho Data Integration (PDI) version 9.2 on Ubuntu and trying to run a Python script with the CPython Script Executor step. However, the logs report that some libraries, such as pandas, scikit-learn, and matplotlib, are not installed. I installed these libraries in the global Python environment on Ubuntu rather than in a virtual environment, because I couldn't find a way to configure the plugin to use a virtual environment.

The same setup works fine on Windows, but I am struggling to get it to work on Ubuntu.

Steps to reproduce:

  1. Install Pentaho Data Integration 9.2 on Ubuntu
  2. Install pandas, scikit-learn, and matplotlib in the global Python environment on Ubuntu
  3. Create a transformation with the CPython Script Executor step
  4. Configure the CPython Script Executor step to use a script that imports pandas, scikit-learn, and matplotlib
  5. Run the transformation

Expected behavior:

The CPython Script Executor step should be able to recognize the installed libraries in the global Python environment on Ubuntu and execute the script without issues.

Actual behavior:

The logs indicate that the required libraries (pandas, scikit-learn, and matplotlib) are not installed, even though they are installed in the global Python environment on Ubuntu.

Additional information:

I couldn't find any configuration options in the plugin to specify the Python environment or virtual environment. This issue does not occur when using the same setup on Windows.

Any help or guidance to resolve this issue would be appreciated.

grayver commented 3 months ago

I had the same issue, but then realized that I had two Python versions installed (3.8 and 3.9). When I run `pip install pandas` globally, it installs the package for python3.9. But when I run `python` from the shell, it executes python3.8, which does not see that package.
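A quick way to check for this kind of mismatch is to run the same small script twice, once from the shell and once inside the CPython Script Executor step, and compare the output. The sketch below is only an illustration: the function name `interpreter_report` and the module list are my own choices and are not part of the plugin. Note that it uses import names, so scikit-learn appears as `sklearn`.

```python
import importlib.util
import sys

# Import names, not pip package names: scikit-learn is imported as "sklearn".
MODULES = ["pandas", "sklearn", "matplotlib"]


def interpreter_report(modules):
    """Describe the interpreter running this script and which modules it can find.

    Hypothetical helper for illustration; it only uses the standard library.
    """
    return {
        # Full path of the Python binary that is executing this script.
        "executable": sys.executable,
        # Major.minor.micro version of that binary.
        "version": "%d.%d.%d" % sys.version_info[:3],
        # True if the module can be located on this interpreter's import path.
        "modules": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
    }


if __name__ == "__main__":
    report = interpreter_report(MODULES)
    print("interpreter:", report["executable"])
    print("version:", report["version"])
    for name, found in report["modules"].items():
        print(name, "found" if found else "MISSING")
```

If the two runs print different interpreter paths or versions, the libraries were probably installed for a different Python than the one being executed. Installing with `python -m pip install pandas` (using the exact interpreter that reported the modules as missing) ties the installation to that interpreter instead of whichever `pip` comes first on the PATH.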