worldcoin / open-iris

Open Iris Recognition Inference System (IRIS)
MIT License
238 stars 47 forks source link

Missing hf_transfer lib @ dev setup #17

Closed brgrp closed 5 months ago

brgrp commented 6 months ago

Missing package

PR description

I tried to run the pipeline in the conda dev setup, and noticed the hf_transfer package is missing.

wiktorlazarski commented 6 months ago

Hello @brgrp,

Thank you for your contribution.

I had a look at that and on my end installing and using open-iris in a development, conda environment works without any problems - none of packages are missing.

I'm able to download the model from HF repo and use it to make an e2e IRISPipeline inference call without getting any errors. I followed steps described in the Setup for development section of the README.md file.

Can you provide more information about machine that you are using when installing and working with open-iris package? Also, I'd appreciate if you could show some logs that confirm that indeed hf_transfer package is missing. All that I'll find useful to recreate that bug and confirm that your PR fixes it.

Thanks in advance, Wiktor

brgrp commented 6 months ago

Hi @wiktorlazarski,

Thank you for your swift response and thorough investigations into the issue.

I'm currently running Ubuntu 22.04 LTS with Conda version 23.7.4. Following the steps outlined in the readme file, I encountered the same issue repeatedly, even after attempting it again today.

Upon some more careful investigation, I believe I've managed to narrow down the problem (please refer to the attached screenshots for clarity).

It seems that the variable HF_HUB_ENABLE_HF_TRANSFER=1 is being set somewhere, but I'm unable to determine its origin. Interestingly, it's not set within my Zsh environment, nor is it defined as a Conda environment variable (please see the attached screenshots for confirmation), and I couldn't find it set somewhere in your code. Any suggestions appreciated.

However, when I manually set HF_HUB_ENABLE_HF_TRANSFER=0, everything functions as expected!?

Some screenshots for reference: Screenshot from 2024-04-10 11-05-05

Screenshot from 2024-04-10 11-07-06

Screenshot from 2024-04-10 11-07-39

Feel free to close the pull requests, if the HF_TRANSFER is not set on your end and should not be used for downloading. If I find some more time, I will check whether its a known issue @ huggingface-cli or conda.

Thanks

wiktorlazarski commented 6 months ago

Hey Patrick,

thank you for detailed explanations. I'm really sorry for my late response.

I read your reply and I think that we should tackle that issue. There are two ways I'd fix it. The first one is to introduce hf_transfer as a dependency, as you did, but I'd put that in the requirements/base.txt file. This will mean that hf_transfer will always be installed no matter if someone wants to use open-iris on the Orb, on local machine or to develop it. If we go for that solution let's pin a specific version of hf_transfer that works with Python's version 3.8, 3.9, 3.10 (GA CI will check that for us).

The second solution, I'm seeing here is to set the HF_HUB_ENABLE_HF_TRANSFER=0 when installing open-iris or just before downloading semantic segmentation model with os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = 0 line.

Let me know what you think. Personally, I'd prefer the second solution because this one doesn't introduce a new dependency into our stack.

Best regards, Wiktor

brgrp commented 6 months ago

Hi @wiktorlazarski,

Thank you for your response. Both options seem viable, though they may depend on individual preferences.

Upon reviewing the Hugging Face manual, I found the following guidance:

hf_transfer is a Rust-based package developed to maximize the bandwidth used by dividing large files into smaller parts and transferring them simultaneously using multiple threads. This approach can potentially double the transfer speed. To use hf_transfer:

To integrate hf_transfer: Install the huggingface_hub package with the hf_transfer extra by executing: _pip install huggingface_hub[hftransfer]. Enable hf_transfer functionality by setting the environment variable HF_HUB_ENABLE_HF_TRANSFER=1.

While it is tested and production-ready, it may lack some user-friendly features like advanced error handling or proxies. Further details can be found in the documentation section here: hf_transfer documentation.

To address the additional dependency, one approach could be to install the pip package huggingface_hub[hf_transfer] instead my initial proposal and activate it using the specified environment variable.

Additionally, regardless of the chosen method, ensure that HF_HUB_ENABLE_HF_TRANSFER=1/0 is set preferably within the conda environment to avoid overriding global user settings.

Given the fact, that hf_transfer was optimized for large transfers I would suggest to go with solution two for now. But I know you guys like rust so solution one could be interesting too :-)

Best regards, Patrick

wiktorlazarski commented 5 months ago

@brgrp, Thank you for detailed investigation and all the explanations. I think we both agree to disable the hf_transfer by setting presetting HF_HUB_ENABLE_HF_TRANSFER=0 env variable.

Quick question from my site: would you like me to take over that part or do you have a bandwidth to implement and test that change?

brgrp commented 5 months ago

I would love too, but have other things on my list atm.

wiktorlazarski commented 5 months ago

I'm closing this PR since I cannot checkout to this branch (it comes from the forked version of the repo). A new PR that introduces discussed modifications will be opened.

Edit: Changed in this PR https://github.com/worldcoin/open-iris/pull/20