windmill-labs / windmill

Open-source developer platform to turn scripts into workflows and UIs. Fastest workflow engine (5x vs Airflow). Open-source alternative to Airplane and Retool.
https://windmill.dev

bug: [python] subdeps breaking wmill import #3596

Open erickvneri opened 1 month ago

erickvneri commented 1 month ago

Describe the bug

Note: I raise this as an issue as I don't have enough Rust expertise to fix it and submit it

Resources

  1. Pinning dependencies and Requirements

Summary

Yesterday we submitted a PR that fixes import parsing so that haystack imports resolve to haystack-ai instead of haystack. The release worked perfectly and allowed us to test a few flows.

However, some of Haystack's modules have sub-dependencies that require other packages. We followed the official documentation referenced above, and it seems we broke our wmill pip installation.

This was reproduced several times on a localhost environment.

To reproduce

  1. Create a flow and an Inline python action
  2. Copy/Paste the following code into the action
    
    # requirements:
    # dependency
    # transformers==4.40.0
    # accelerate==0.29.3
    # torch==2.2.2
    import wmill
    import haystack

    def main():
        pass

  3. At some point, you should see the following error message:

    ExecutionErr: ExitCode: 1, last log lines:
    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/tmp/windmill/wk-default-77e24f9bbede-VtJBM/018f0c3d-6288-a8e3-0320-d3952c60bfbe/wrapper.py", line 9, in <module>
        from u.talentgenius.job_recommendations.branchone_2 import step_0 as inner_script
      File "/tmp/windmill/wk-default-77e24f9bbede-VtJBM/018f0c3d-6288-a8e3-0320-d3952c60bfbe/u/talentgenius/job_recommendations/branchone_2/step_0.py", line 7, in <module>
        import wmill
    ModuleNotFoundError: No module named 'wmill'



### Expected behavior

ExecutionErr: ExitCode: 1, last log lines:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/tmp/windmill/wk-default-77e24f9bbede-VtJBM/018f0c3d-6288-a8e3-0320-d3952c60bfbe/wrapper.py", line 9, in <module>
    from u.talentgenius.job_recommendations.branchone_2 import step_0 as inner_script
  File "/tmp/windmill/wk-default-77e24f9bbede-VtJBM/018f0c3d-6288-a8e3-0320-d3952c60bfbe/u/talentgenius/job_recommendations/branchone_2/step_0.py", line 7, in <module>
    import wmill
ModuleNotFoundError: No module named 'wmill'

### Screenshots

![image](https://github.com/windmill-labs/windmill/assets/54951763/6425172f-f543-4ded-a465-786a578a3fe8)

### Browser information

Version 1.65.114 Chromium: 124.0.6367.60 (Official Build) (64-bit)

### Application version

1.313.0 (2024-04-23)

### Additional Context

It seems `wmill` is broken per action, i.e. other actions with different import configurations are not affected and can use the `wmill` module properly.

rubenfiszel commented 1 month ago

you're simply missing wmill in your list of requirements

erickvneri commented 1 month ago

I'll give you an update ASAP. I tried importing wmill in a different order, but somehow either haystack or wmill was missing again... it was a weird loop that in some cases did not even finish execution/installation.

erickvneri commented 1 month ago

Hi, @rubenfiszel

Here are some steps that you can follow to replicate the loop:

version: CE v1.317.1-7-g094f50cdc

This works great, and an explicit requirement for wmill wasn't necessary. Also, haystack-ai was properly installed.

    import wmill
    from haystack import Document

    hugging_face_token = wmill.get_variable("u/user/HUGGING_FACE_TOKEN")

    def main():
        pass

Errors begin to appear when I try to access haystack's submodules:

  1. This one raises an error because it needs additional deps:

    import wmill
    from haystack import Document
    from haystack.components.rankers import TransformersSimilarityRanker
    
    hugging_face_token = wmill.get_variable("u/user/HUGGING_FACE_TOKEN")
    
    def main():
        docs = [Document(content="Paris"), Document(content="Berlin")]
        ranker = TransformersSimilarityRanker()
        ranker.warm_up()
        ranker.run(query="City in France", documents=docs, top_k=1)
    
    {
        "error": {
            "name": "ImportError",
            "stack": "  File \"/tmp/windmill/wk-default-77e24f9bbede-vupvl/018f16cd-a026-3255-0d79-e81aa130865f/u/user/haystack_transformers_similarity_ranker.py\", line 11, in main\n    ranker = TransformersSimilarityRanker()\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n  File \"/tmp/windmill/cache/pip/haystack-ai==2.0.1/haystack/core/component/component.py\", line 132, in __call__\n    instance = super().__call__(*args, **kwargs)\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n  File \"/tmp/windmill/cache/pip/haystack-ai==2.0.1/haystack/components/rankers/transformers_similarity.py\", line 92, in __init__\n    torch_and_transformers_import.check()\n\n  File \"/tmp/windmill/cache/pip/lazy-imports==0.3.1/lazy_imports/try_import.py\", line 107, in check\n    raise ImportError(message) from exc_value\n",
            "message": "Failed to import 'accelerate'. Run 'pip install transformers[torch,sentencepiece]'. Original error: No module named 'accelerate'"
        }
    }
  2. Add the subdependencies. I switched from the transformers[torch,sentencepiece] format to the following three libraries, because that format raises the same error:

    # requirements:
    # transformers==4.40.0
    # accelerate==0.29.3
    # torch==2.2.2
    import wmill
    from haystack import Document
    from haystack.components.rankers import TransformersSimilarityRanker
    
    hugging_face_token = wmill.get_variable("u/user/HUGGING_FACE_TOKEN")
    
    def main():
        docs = [Document(content="Paris"), Document(content="Berlin")]
        ranker = TransformersSimilarityRanker()
        ranker.warm_up()
        ranker.run(query="City in France", documents=docs, top_k=1)
    
    job 018f16bd-bf74-496f-f78e-952f5f937a26 on worker wk-default-77e24f9bbede-vupvl (tag: python3)
    
    --- PYTHON CODE EXECUTION ---
    
    Traceback (most recent call last):
    File "<frozen runpy>", line 198, in _run_module_as_main
    File "<frozen runpy>", line 88, in _run_code
    File "/tmp/windmill/wk-default-77e24f9bbede-vupvl/018f16bd-bf74-496f-f78e-952f5f937a26/wrapper.py", line 9, in <module>
        from u.user import haystack_transformers_similarity_ranker as inner_script
    File "/tmp/windmill/wk-default-77e24f9bbede-vupvl/018f16bd-bf74-496f-f78e-952f5f937a26/u/user/haystack_transformers_similarity_ranker.py", line 6, in <module>
        import wmill
    ModuleNotFoundError: No module named 'wmill'
  3. Based on that, I try to install wmill explicitly:

    # requirements:
    # transformers==4.40.0
    # accelerate==0.29.3
    # torch==2.2.2
    # wmill
    import wmill
    from haystack import Document
    from haystack.components.rankers import TransformersSimilarityRanker
    
    hugging_face_token = wmill.get_variable("u/user/HUGGING_FACE_TOKEN")
    
    def main():
        docs = [Document(content="Paris"), Document(content="Berlin")]
        ranker = TransformersSimilarityRanker()
        ranker.warm_up()
        ranker.run(query="City in France", documents=docs, top_k=1)
    
    job 018f16d1-52fd-584d-033e-008f5ba8b0ea on worker wk-default-8d69dd904b5b-4vDkL (tag: python3)
    
    --- PYTHON CODE EXECUTION ---
    
    Traceback (most recent call last):
    File "<frozen runpy>", line 198, in _run_module_as_main
    File "<frozen runpy>", line 88, in _run_code
    File "/tmp/windmill/wk-default-8d69dd904b5b-4vDkL/018f16d1-52fd-584d-033e-008f5ba8b0ea/wrapper.py", line 9, in <module>
        from u.user import haystack_transformers_similarity_ranker as inner_script
    File "/tmp/windmill/wk-default-8d69dd904b5b-4vDkL/018f16d1-52fd-584d-033e-008f5ba8b0ea/u/user/haystack_transformers_similarity_ranker.py", line 7, in <module>
        from haystack import Document
    ModuleNotFoundError: No module named 'haystack'
  4. At this point, windmill requires me to list haystack-ai explicitly again:

    # requirements:
    # transformers==4.40.0
    # accelerate==0.29.3
    # torch==2.2.2
    # wmill
    # haystack-ai
    import wmill
    from haystack import Document
    from haystack.components.rankers import TransformersSimilarityRanker
    
    hugging_face_token = wmill.get_variable("u/user/HUGGING_FACE_TOKEN")
    
    def main():
        docs = [Document(content="Paris"), Document(content="Berlin")]
        ranker = TransformersSimilarityRanker()
        ranker.warm_up()
        ranker.run(query="City in France", documents=docs, top_k=1)

This is the point where windmill seems to loop forever. Here is the info I can get from the browser:

  1. Preview Request:

    POST http://localhost/api/w/localhost/jobs/run/preview -> 200
    
    Response
    {"path":"u/user/haystack_transformers_similarity_ranker","content":"# requirements:\n# transformers==4.40.0\n# accelerate==0.29.3\n# torch==2.2.2\n# wmill\n# haystack-ai\nimport wmill\nfrom haystack import Document\nfrom haystack.components.rankers import TransformersSimilarityRanker\n\nhugging_face_token = wmill.get_variable(\"u/user/HUGGING_FACE_TOKEN\")\n\n\ndef main():\n    docs = [Document(content=\"Paris\"), Document(content=\"Berlin\")]\n    ranker = TransformersSimilarityRanker()\n    ranker.warm_up()\n    ranker.run(query=\"City in France\", documents=docs, top_k=1)\n","args":{},"language":"python3","tag":null}
  2. Get Update Request:

    GET http://localhost/api/w/localhost/jobs_u/getupdate/018f16d8-5d88-8c03-f7cd-261453042728?running=true&log_offset=220 -> 200
    
    Response
    {
        "running": null,
        "completed": null,
        "new_logs": "",
        "log_offset": 220,
        "mem_peak": 5716,
        "flow_status": null
    }
  3. Cancel Request:

    POST http://localhost/api/w/localhost/jobs_u/queue/cancel/018f16d8-5d88-8c03-f7cd-261453042728 -> 200

This is as far as I got in debugging the installation.

Let me know if you need extra info to replicate the issue.

rubenfiszel commented 1 month ago

On your example 4, I do not think it's looping forever. It's just resolving all the dependencies, which can take up to a few minutes depending on your network and CPU. It works for me, and it works on app.windmill.dev, except that it exceeds the storage space there.
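
One way to tell a slow dependency resolution from a real hang is to keep polling the job with a generous deadline instead of cancelling early. The sketch below is a minimal illustration under assumptions: `fetch_update` is a hypothetical callable standing in for the `getupdate` request shown earlier (it performs no HTTP itself), the `completed`, `new_logs` and `log_offset` fields are taken from the response in this thread, and treating a truthy `completed` as "finished" is a guess.

```python
import time
from typing import Callable

def wait_for_job(
    fetch_update: Callable[[int], dict],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports completion or the deadline passes."""
    log_offset = 0
    logs: list[str] = []
    deadline = time.monotonic() + timeout_s
    while True:
        update = fetch_update(log_offset)
        if update.get("new_logs"):
            logs.append(update["new_logs"])
        # Keep the previous offset when the server omits it or sends null.
        log_offset = update.get("log_offset") or log_offset
        if update.get("completed"):
            return {"completed": True, "logs": "".join(logs)}
        if time.monotonic() >= deadline:
            return {"completed": False, "logs": "".join(logs)}
        sleep(interval_s)

# Canned responses standing in for the server; a real caller would issue
# the GET request from the thread inside fetch_update.
_responses = iter([
    {"completed": None, "new_logs": "", "log_offset": 220},
    {"completed": None, "new_logs": "installing\n", "log_offset": 231},
    {"completed": True, "new_logs": "done\n", "log_offset": 236},
])
seen_offsets: list[int] = []

def fake_fetch(offset: int) -> dict:
    seen_offsets.append(offset)
    return next(_responses)

result = wait_for_job(fake_fetch, sleep=lambda _: None)
print(result["completed"], seen_offsets)
```

A ten-minute default is arbitrary; the point is only that an empty `new_logs` for a while is not by itself evidence of a loop.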