lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0

Bespoke code not transmitted to the worker #1182

Closed: foster999 closed 11 months ago

foster999 commented 11 months ago

Sorry for raising yet another issue! This one might be down to incorrect use on my part.

Following on from #1179, the error I get back from the workers suggests that my local package isn't being transferred to them:

ModuleNotFoundError: No module named 'engineering'

Engineering is a locally developed package that is used in my sklearn pipeline.

I've tried configuring lithops to include dependencies using:

lithops:
    ...
    include_modules: [engineering, polars]

But in the debug logs I see:

2023-10-26 17:20:44,996 [DEBUG] serialize.py:90 -- Tentative modules to transmit: None
2023-10-26 17:20:44,996 [DEBUG] serialize.py:92 -- Include modules: polars, engineering
2023-10-26 17:20:44,996 [DEBUG] serialize.py:101 -- Modules to transmit: None

This suggests that the modules are neither transferred to nor installed on the workers.

Is this the right approach? Or would I need to create a custom container with the dependencies installed to do this?
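For reference, the same setting can be written as a Python config dict rather than a YAML file. This is a minimal sketch: `build_lithops_config` is a hypothetical helper, and only the `lithops.include_modules` key comes from the config above. A working config would also need backend and storage sections, which are omitted here.

```python
def build_lithops_config(include_modules):
    """Build a partial config dict with the `lithops.include_modules` key set.

    Hypothetical helper for illustration; it is not part of Lithops.
    """
    return {"lithops": {"include_modules": list(include_modules)}}


config = build_lithops_config(["engineering", "polars"])
print(config)

# Assumed usage (requires Lithops plus backend/storage sections in `config`):
# import lithops
# fexec = lithops.FunctionExecutor(config=config)
```

Running with debug logging enabled should then show whether the listed modules appear in the "Modules to transmit" line.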

JosepSampe commented 11 months ago

Did you try without setting any package in include_modules? I mean without setting include_modules at all. By default, Lithops includes every module it detects as missing from the container.

In any case, I will check whether there is an issue with the include_modules config param.

In my experience, it is always a good idea to include all the required packages in the runtime itself, even if Lithops is able to transfer them. By including all the required packages, you will improve invocation and execution times.
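One way to check whether a runtime already contains the required packages is to run a small function inside it that reports which modules cannot be found. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, not a Lithops function, and the module names are the ones mentioned in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be located by this interpreter."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent, or odd module state.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Locally this reports what is absent from the current environment.
    print(missing_modules(["engineering", "polars", "sklearn"]))
```

Run inside the runtime (for example, by submitting it as the function to execute), an empty list would indicate that nothing needs to be transferred at invocation time.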

foster999 commented 11 months ago

Yep, with the default I still see: 2023-10-26 17:48:59,002 [DEBUG] serialize.py:101 -- Modules to transmit: None

That makes sense. Are there any examples of extending the default runtime that I could follow?

Edit: Ignore me, found the docs here

foster999 commented 11 months ago

I've got a custom runtime ready, but do you know how I can authorise IBM Cloud Functions to pull the Docker image from a private container registry?

Edit: I've just spotted that IBM are deprecating Cloud Functions, so I'll look to switch to Code Engine.

JosepSampe commented 11 months ago

In #1199 I fixed the include_modules config parameter, so it should now always include all the modules set in it.