ploomber / soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
https://soopervisor.readthedocs.io
Apache License 2.0

multiple requirements.txt files #86

Closed neelasha23 closed 2 years ago

neelasha23 commented 2 years ago

Describe your changes

The changes allow users to declare multiple task-specific requirements.*.lock.txt / environment.*.lock.txt files. Multiple images are built with different dependency files for different task patterns, such as fit-*. Scripts that do not need a separate requirements file use the image corresponding to the default requirements.lock.txt file. For now, scheduling jobs according to this change happens only in the AWS Batch exporter.
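
For reference, this is roughly what such a project could look like (the file names are illustrative, except the default lock file and the clean-* pattern that appears in the test assets):

```
requirements.lock.txt            # default, used by tasks without a pattern-specific file
requirements.fit-*.lock.txt      # used by tasks matching the fit-* pattern
requirements.clean-*.lock.txt    # used by tasks matching the clean-* pattern
```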

Issue ticket number and link

Closes #77

Checklist before requesting a review

neelasha23 commented 2 years ago

I see this error in the Windows CI:

Checking out the ref
  "C:\Program Files\Git\bin\git.exe" checkout --progress --force refs/remotes/pull/86/merge
  Error: error: invalid path 'tests/assets/multiple_requirements_project/requirements.clean-*.lock.txt'
  Error: The process 'C:\Program Files\Git\bin\git.exe' failed with exit code 128

The repo checkout itself is failing because of this file: tests/assets/multiple_requirements_project/requirements.clean-*.lock.txt. Probably because we can't have * in a filename on Windows?

idomic commented 2 years ago

@neelasha23 Maybe we can try a double underscore (__) for Windows as well?

neelasha23 commented 2 years ago

It might work for the requirements file names, but __ will not work for Docker images. That's why we had decided to use ploomber instead of any wildcard character (for image names).

neelasha23 commented 2 years ago

There is an issue I'm facing with the Airflow tests. The tests/airflow/test_airflow_export.py cases kept failing because the mock was expecting the pkg-name.json file inside the env-name folder for non-bash presets (I'm not sure why this happened, because I have hardly changed anything in the conftest). So, just to see if the test case would pass, I changed this line:

https://github.com/ploomber/soopervisor/blob/81d0bda8879980b86c75aafcd4d68facf2130423/src/soopervisor/airflow/export.py#L98

Although this fixed the Airflow tests, now tests/test_export.py fails, which is the current CI error: FAILED tests/test_export.py::test_checks_the_right_spec_pkg[AirflowExporter] - FileNotFoundError: [Errno 2] No such file or directory: 'serve/my_project.json'.

I would need some help in understanding this functionality.

idomic commented 2 years ago

@neelasha23 this code takes the final Airflow DAG and saves it into the file users can deploy with. We're opening and writing into the JSON file; I don't think removing this line will suffice, since the whole test is checking this export functionality. @edublancas thoughts?

neelasha23 commented 2 years ago

Just to clarify, I haven't removed that line. Previously it was writing to pkgname.json; I have only modified it to write to envname/pkgname.json, because that's what the test case is suddenly expecting. I'm not sure how this happened, because I haven't made any major changes in the Airflow pipeline.

edublancas commented 2 years ago

ok, I found the issue. I pushed a commit changing this line:

https://github.com/ploomber/soopervisor/blob/81d0bda8879980b86c75aafcd4d68facf2130423/src/soopervisor/airflow/export.py#L98

to:

path_dag_dict_out = Path(pkg_name + '.json')

since that's what I saw in the master branch.

then I saw a bunch of tests fail.

the problem is here:

https://github.com/ploomber/soopervisor/blob/81d0bda8879980b86c75aafcd4d68facf2130423/src/soopervisor/commons/docker.py#L245

this line will behave differently depending on what happens with the for loop: if lock_paths is empty, it will never execute; if there are 1, 2, or 3 items, it'll execute that number of times. This causes the function to end up in a different working directory, which is what's causing the issue. These internal functions should be deterministic. Ideally, they should not change the working directory, and if they do, they should always revert to the original working directory where they were executed.
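
One way to do that (a minimal sketch, not the actual soopervisor code; build_image is a hypothetical helper) is to wrap each iteration so the original working directory is always restored:

```python
import os
from contextlib import contextmanager


@contextmanager
def keep_cwd():
    """Run a block of code and always return to the original working directory."""
    original = os.getcwd()
    try:
        yield
    finally:
        os.chdir(original)


# usage sketch: each iteration may chdir internally,
# but the caller's working directory is restored afterwards
# for lock_path in lock_paths:
#     with keep_cwd():
#         build_image(lock_path)  # hypothetical helper
```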

neelasha23 commented 2 years ago

Got it. The control needs to go back to the parent directory of dist before the start of every loop, otherwise it fails. I will try to think of a way around it. What about the issue with Windows? It doesn't look like filenames can have * in them on Windows. So should I change the naming convention of the task-specific requirements lock files? Maybe use __ like Ido suggested? For images we had decided to use ploomber instead of the wildcard character.

edublancas commented 2 years ago

re windows:

yeah, I think double underscore makes sense, we already have something similar for the SLURM exporter: https://github.com/ploomber/soopervisor/blob/566d7773a3af408046d0efbdbd80ad10f77bb620/src/soopervisor/shell/export.py#L33

neelasha23 commented 2 years ago

The CI is now passing on all platforms. I have also submitted a pipeline to AWS Batch and verified that the tasks are scheduled with the corresponding image. I had to use args = ['docker', 'buildx', 'build', '--platform=linux/amd64', '.', '--tag', image_local] for the jobs to run successfully.
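
For context, this is roughly how such a command would be invoked from Python (an illustration only, not the exporter's actual code; the image tag is made up):

```python
import subprocess

image_local = 'my_project:latest'  # hypothetical tag
args = ['docker', 'buildx', 'build', '--platform=linux/amd64', '.',
        '--tag', image_local]
# equivalent to: docker buildx build --platform=linux/amd64 . --tag my_project:latest
subprocess.run(args, check=True)
```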

idomic commented 2 years ago

@neelasha23 I ran it and looked at the code build for this user: 95de7987, with the sample pipeline multiple_requirements_project. Where can I see the creation of the 2 images? I didn't see it in the build. Also, I saw the Batch tasks are failing (sent you the links).

neelasha23 commented 2 years ago

> @neelasha23 I ran it and looked at the code build for this user: 95de7987, with the sample pipeline multiple_requirements_project. Where can I see the creation of the 2 images? I didn't see it in the build. Also, I saw the Batch tasks are failing (sent you the links).

I have sent you some links to job runs in us-west-2 (that's where I set up AWS Batch initially). Let me know if I should repeat the same in prod.

neelasha23 commented 2 years ago

I have added a seaborn dependency for the plot task in multiple_requirements_project in tests/assets. While testing it, I found a bug: the docker build command was installing dependencies from the env folder, whereas the correct task-specific file was present in the /project folder. So I have created a Dockerfile specific to aws-batch (https://github.com/ploomber/soopervisor/blob/789f6087a8ec375c79ea97bc8037d8c38636513a/src/soopervisor/assets/aws-batch/Dockerfile) and moved the installation command to run after RUN tar --strip-components=1 -zxvf *.tar.gz.
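
The ordering change is essentially this (a rough sketch, not the actual Dockerfile contents; only the tar line comes from the comment above):

```dockerfile
# illustrative only: extract the packaged source first...
RUN tar --strip-components=1 -zxvf *.tar.gz
# ...then install from the (task-specific) lock file shipped with the project
RUN pip install -r requirements.lock.txt
```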

idomic commented 2 years ago

@neelasha23 Is it now Batch-specific? That wasn't the initial intention.

neelasha23 commented 2 years ago

No, the change itself is not AWS Batch-specific. Since AWS Batch is the priority right now, I have only modified the Dockerfile for it; to be on the safe side, I haven't modified the common Dockerfile. I did test with Argo and it seemed to be fine, but once all the exporters are properly tested with this Dockerfile change, maybe we can have just one Dockerfile.

idomic commented 2 years ago

Got it! Let's finish testing on the rest of the backends, generalize this Dockerfile, and merge.

neelasha23 commented 2 years ago

So as of now we just have changes for aws-batch, as per the discussion with Eduardo here: https://github.com/ploomber/soopervisor/issues/77#issuecomment-1147681581

Changes are pending for Argo, Kubeflow, and Airflow. As of now they just take the default image and run as before. Should I make changes for these exporters as well?

idomic commented 2 years ago

@neelasha23 how much time do you think it'll take you to include the rest of them? @edublancas, thoughts on including the rest of the backends? If it's ~2-3 days of extra dev work, I think we should add it; there's no pressing deadline currently.

neelasha23 commented 2 years ago

I had initially started with Airflow and Kubeflow, and I remember the environment was breaking. So I would need to figure those out first and then work on the code changes. All the exporters have different logic for the export.

idomic commented 2 years ago

@neelasha23 I talked to Eduardo, let's keep it separate to avoid big PRs (this is already pretty huge).

neelasha23 commented 2 years ago

Sure. Any other reviews for this PR?

idomic commented 2 years ago

No, lgtm - nice job!!