microsoft / azuredatastudio

Azure Data Studio is a data management and development tool with connectivity to popular cloud and on-premises databases. Azure Data Studio supports Windows, macOS, and Linux, with immediate capability to connect to Azure SQL and SQL Server. Browse the extension library for more database support options including MySQL, PostgreSQL, and MongoDB.
https://learn.microsoft.com/sql/azure-data-studio
MIT License

Not able to install / configure the Notebook dependencies (timeout accessing pypi.org) #14874

Closed lcanastreiro closed 2 years ago

lcanastreiro commented 3 years ago

Steps to Reproduce:

I'm having problems setting up the notebook dependencies when I change the kernel of a notebook from the default SQL to PySpark:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x00000254CD381470>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pip/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x00000254CD381668>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pip/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x00000254CD3815F8>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pip/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x00000254CD3813C8>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pip/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x00000254CD381400>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pip/
ERROR: Could not find a version that satisfies the requirement pip==19.2.3 (from versions: none)
ERROR: No matching distribution found for pip==19.2.3

I don't have internet access to https://pypi.org/. Is there a way to install all the dependencies offline?

Best Regards, Luís.

VasuBhog commented 3 years ago

@corivera I know you were working in this area. Is there a way to get the dependencies offline?

corivera commented 3 years ago

You can download package dependencies to a local directory using the following command:

python -m pip download jupyter sparkmagic -d .\WheelDir

Then once you have that package directory, you can install the packages from it using this command:

python -m pip install --no-index jupyter sparkmagic --find-links .\WheelDir
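
As a rough sketch of the two commands above, the snippet below builds them as argument lists that could be passed to `subprocess.run`. The helper name `build_offline_commands` is hypothetical and not part of Azure Data Studio or pip; it assumes the download step runs on a machine that can reach PyPI and that the resulting directory is then copied to the offline machine.

```python
import sys
from typing import List, Sequence, Tuple


def build_offline_commands(
    packages: Sequence[str], wheel_dir: str
) -> Tuple[List[str], List[str]]:
    """Return (download_cmd, install_cmd) as argv lists (hypothetical helper)."""
    pip = [sys.executable, "-m", "pip"]
    # Step 1, on a machine with PyPI access: fetch the packages and
    # their dependencies into wheel_dir.
    download_cmd = pip + ["download", *packages, "-d", wheel_dir]
    # Step 2, on the offline machine: install using only wheel_dir,
    # never contacting the package index.
    install_cmd = pip + ["install", "--no-index", "--find-links", wheel_dir, *packages]
    return download_cmd, install_cmd


download_cmd, install_cmd = build_offline_commands(["jupyter", "sparkmagic"], "WheelDir")
print(" ".join(download_cmd))
print(" ".join(install_cmd))
```

By default `pip download` fetches files for the interpreter and platform it runs on, so the download machine should probably match the offline one in operating system and Python version.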

lcanastreiro commented 3 years ago

Thanks Cory and Vasu for your replies. But in this case, the process hasn't reached the point of installing the packages yet. It fails earlier, at pip itself and the specific version (19.2.3) that ADS is trying to install.

On another note, do you think it would be possible to document which URIs/URLs should be whitelisted, at least so that the default set of extensions and required packages can be fully installed?

corivera commented 3 years ago

You'd have to whitelist whatever endpoints pip requires, since that's what we're using to install the packages. There's the main PyPI URL, https://pypi.org, as well as the site that hosts the packages, https://files.pythonhosted.org. There may be others, but those are the only ones I'm aware of.
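
One way to check whether those two hosts are reachable from a given machine is a plain TCP connection test, sketched below. The function names are hypothetical, the host list contains only the two endpoints named above, and the 15-second default simply mirrors the connect timeout shown in the pip log.

```python
import socket
from typing import Dict, Iterable

# The two hosts named in this thread; pip may need others.
PIP_HOSTS = ("pypi.org", "files.pythonhosted.org")


def can_connect(host: str, port: int = 443, timeout: float = 15.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_hosts(
    hosts: Iterable[str] = PIP_HOSTS, port: int = 443, timeout: float = 15.0
) -> Dict[str, bool]:
    """Map each host to whether a direct TCP connection succeeded."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example call (performs real network connections):
# print(check_hosts())
```

This only tests direct connections. On a network where traffic goes through an HTTP proxy, a direct connection can fail even though pip would work once the proxy is configured, so a `False` result here is a hint rather than proof.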

At what point are you getting this install error? Was it after going through the python install dialog, or just from clicking PySpark in the dropdown?

lcanastreiro commented 3 years ago

Thanks Cory.

This happens when I create a new notebook and select the PySpark kernel. A form then appears to install the Python runtime, and I select the default option (New Python Installation). Python is downloaded and reported as installed, and then the process fails when it tries to install pip.

Thanks, Luís.

corivera commented 3 years ago

Sounds like it's a pip reachability issue, then. We pip install required packages after downloading our provided python package for the New Install option. Our python package is hosted on the Microsoft Download Center, so maybe that's already allowed on your machine while pypi.org is not.