ryabhmd closed this issue 1 month ago.
Do you need requests for anything in the pipeline? My best guess is that you can simply ignore this error message.
You can also use one of my images: /netscratch/mostendorff/enroot/malteos_eulm_podman.sqsh
It has datatrove==0.2.0 installed.
Thanks! Your image works. :) However, when the pipeline reaches the Slurm execution step, where the script itself launches a job, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'
srun: error: serv-3317: task 0: Exited with exit code 1
I looked at similar issues (e.g. this one), but their suggestions didn't solve the problem.
Any ideas?
Slurm commands are not available within a containerized compute job. See https://github.com/scilons/datatrove/blob/main/src/datatrove/executor/slurm.py#L35
You need to start the Slurm pipeline from a login node or rewrite it to use a local execution pipeline.
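Since the failure is simply that the `sbatch` binary is missing inside the container, one option is to check for it before choosing how to launch. The sketch below is a minimal, stdlib-only illustration, assuming that "sbatch is on PATH" is a good enough signal that Slurm submission will work; the helpers `slurm_available` and `pick_executor_kind` are hypothetical and not part of datatrove. In a real script the result would decide between datatrove's Slurm executor and its `LocalPipelineExecutor`.

```python
import shutil
from typing import Optional


def slurm_available(path: Optional[str] = None) -> bool:
    """Return True if the `sbatch` launcher can be found.

    `path` overrides the search path (defaults to the PATH environment
    variable). Inside a containerized compute job the Slurm client tools are
    usually absent, which is what triggers
    FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'.
    """
    return shutil.which("sbatch", path=path) is not None


def pick_executor_kind(path: Optional[str] = None) -> str:
    # Hypothetical helper: "slurm" when jobs can be submitted from here,
    # otherwise "local" (run the pipeline with a local executor instead).
    return "slurm" if slurm_available(path) else "local"


if __name__ == "__main__":
    if pick_executor_kind() == "local":
        print("sbatch not found: start from a login node or use a local executor")
    else:
        print("sbatch found: Slurm submission should work from here")
```

Running this inside the container versus on a login node should show the difference directly, without waiting for the pipeline to fail at the submission step.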
The local pipeline works fine and can be run within an interactive job. If we want to use Slurm, should I create the environment directly on the login node and run the pipeline there?
I installed mamba, set up a virtual environment, and ran the pipeline from there. I'm closing this issue; feel free to let me know if you have further questions.
To run scilons_pipeline.py, I've been trying to build an image on Slurm and install the datatrove[all] package (as described in the README). I've tried reusing several images from /netscratch/enroot (e.g. python+3.10.4-bullseye.sqsh, ubuntu20+conda.sqsh) and installing the packages on top of them, but I always end up with version conflicts between the installed packages, which leave the image unusable.
For example, when I build on ubuntu20+conda.sqsh and install the datatrove library, I get:
However, the required version of requests is incompatible with the datasets package. Once I save the image and use it to run the code, none of the modules can be found.
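The pip output itself isn't preserved here, but the conflict can be inspected from inside the image. The sketch below is a minimal, stdlib-only illustration: it prints the installed requests version and the requirement strings that datasets and datatrove declare on requests, using `importlib.metadata`. The helper names (`requirement_name`, `constraints_on`, `report`) are hypothetical, and the name parsing is a simplification of full PEP 508 parsing.

```python
import re
from importlib import metadata
from typing import List, Optional


def _normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_", "." become "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req: str) -> str:
    # A requirement string starts with the project name, e.g. "requests>=2.0".
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return _normalize(match.group(1)) if match else ""


def constraints_on(dependency: str, requirements: List[str]) -> List[str]:
    """Return the requirement strings that constrain `dependency`.

    Strings may carry markers such as "; extra == 'x'", which only apply
    when that extra is installed.
    """
    target = _normalize(dependency)
    return [r for r in requirements if requirement_name(r) == target]


def report(dist: str, dependency: str) -> Optional[List[str]]:
    """Constraints that the installed `dist` places on `dependency`.

    Returns None if `dist` is not installed in the current environment.
    """
    try:
        reqs = metadata.requires(dist) or []
    except metadata.PackageNotFoundError:
        return None
    return constraints_on(dependency, reqs)


if __name__ == "__main__":
    try:
        print("installed requests:", metadata.version("requests"))
    except metadata.PackageNotFoundError:
        print("requests is not installed")
    for dist in ("datasets", "datatrove"):
        print(dist, "->", report(dist, "requests"))
```

If a `report(...)` call returns None even though the package was installed during the build, the interpreter running the code is probably not the one the packages were installed into, which would also be consistent with the "cannot find any of the modules" symptom.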
Any ideas on how to build an image that can run the code? Or do I need to install the package in a different base image?