microsoft / linux-package-repositories

Microsoft packaged Linux software (DEBs, RPMs, etc.) is hosted on packages.microsoft.com (PMC) and made available as native Linux repositories for use with package managers like APT, YUM, etc.
https://packages.microsoft.com
MIT License

Fail when installing mssql-tools18_18.2.1.1-1_amd64.deb #127

Closed: hannesrd closed this issue 5 months ago

hannesrd commented 5 months ago

Describe the issue

We are trying to install mssql-tools18 from different locations. Installing it in a Docker build fails.

When did the issue occur?

When building with a GitLab Runner on Kubernetes.

If applicable, what package did you attempt to install, and from which repo?

mssql-tools18, from:

```
deb [arch=amd64,armhf,arm64] https://packages.microsoft.com/debian/11/prod bullseye main
```

Steps to Reproduce

Dockerfile containing:

```dockerfile
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/debian/11/prod.list | tee /etc/apt/sources.list.d/msprod.list
RUN apt update
RUN ACCEPT_EULA=Y apt -y install mssql-tools18
RUN ACCEPT_EULA=Y apt -y install unixodbc-dev
RUN ACCEPT_EULA=Y apt -y install msodbcsql18
```

For `RUN ACCEPT_EULA=Y apt -y install mssql-tools18` we get something like:

```
[ 9/13] RUN ACCEPT_EULA=Y apt -y install mssql-tools18:
31.19 0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
31.19 Need to get 222 kB of archives.
31.19 After this operation, 0 B of additional disk space will be used.
31.19 Ign:1 https://packages.microsoft.com/debian/11/prod bullseye/main amd64 mssql-tools18 amd64 18.2.1.1-1
62.24 Ign:1 https://packages.microsoft.com/debian/11/prod bullseye/main amd64 mssql-tools18 amd64 18.2.1.1-1
94.31 Ign:1 https://packages.microsoft.com/debian/11/prod bullseye/main amd64 mssql-tools18 amd64 18.2.1.1-1
128.4 Err:1 https://packages.microsoft.com/debian/11/prod bullseye/main amd64 mssql-tools18 amd64 18.2.1.1-1
128.4   Could not wait for server fd - select (11: Resource temporarily unavailable) [IP: 137.117.241.158 443]
128.4 E: Failed to fetch https://pmc-geofence.trafficmanager.net/debian/11/prod/pool/main/m/mssql-tools18/mssql-tools18_18.2.1.1-1_amd64.deb?geofence=true  Could not wait for server fd - select (11: Resource temporarily unavailable) [IP: 137.117.241.158 443]
128.4 E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
```

For `RUN wget https://pmc-geofence.trafficmanager.net/` we get timeouts like:

```
0.040 --2024-04-03 13:43:18--  https://pmc-geofence.trafficmanager.net/
0.049 Resolving pmc-geofence.trafficmanager.net (pmc-geofence.trafficmanager.net)... 137.117.241.158
0.096 Connecting to pmc-geofence.trafficmanager.net (pmc-geofence.trafficmanager.net)|137.117.241.158|:443... connected.
```

Actual Result

The installation fails as shown above.

unixodbc-dev and msodbcsql18 are installed successfully.

Expected Result

The mssql-tools18 package is installed.

When I use another system I get a different IP for pmc-geofence.trafficmanager.net and everything works.

Screenshots

Additional context

daviddavis commented 5 months ago

Is this issue intermittent, or can you not connect to pmc-geofence.trafficmanager.net at all? Can you paste the output of `curl -v https://pmc-geofence.trafficmanager.net/`?

mbearup commented 5 months ago

@hannesrd note that mssql packages are delivered via geofence (which is required for legal/tax reasons). If you have network restrictions in place, they may block your access to the geofence infrastructure. The `curl -v` output would likely clarify that.
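curl's exit code already tells you roughly where a connection died, and the outputs later in this thread show exit 35 for HTTPS and 56 for HTTP. The sketch below is a small probe that translates the exit code into a short hint. The meanings follow curl's documented exit codes. The function names and the 30-second default timeout are illustrative assumptions, not part of any PMC tooling.

```shell
#!/bin/sh
# Sketch: probe a URL with curl and explain the resulting exit code.
# Meanings follow curl's documented exit codes; function names are illustrative.

describe_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "could not connect (TCP)" ;;
        28) echo "operation timed out" ;;
        35) echo "TLS handshake failed" ;;
        56) echo "connection dropped while receiving data" ;;
        *)  echo "other curl error ($1)" ;;
    esac
}

# Usage: probe_url URL [max-seconds]
# Prints "URL -> exit N: meaning" and returns curl's exit code.
probe_url() {
    rc=0
    curl -sS -o /dev/null --max-time "${2:-30}" "$1" || rc=$?
    printf '%s -> exit %s: %s\n' "$1" "$rc" "$(describe_curl_exit "$rc")"
    return "$rc"
}
```

For example, `probe_url https://pmc-geofence.trafficmanager.net/` could run as a `RUN` step in the failing build. A TCP-level failure (6, 7) points at DNS or firewalling. A failure after connecting (35, 56) means packets flow at first and then stop.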

hannesrd commented 5 months ago

```
[ 4/21] RUN curl -v https://pmc-geofence.trafficmanager.net/
0.073 % Total % Received % Xferd Average Speed Time Time Time Current
0.073 Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 Trying 137.117.241.158:443...
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 Connected to pmc-geofence.trafficmanager.net (137.117.241.158) port 443 (#0)
0.135 ALPN: offers h2,http/1.1
0.135 } [5 bytes data]
0.135 TLSv1.3 (OUT), TLS handshake, Client hello (1):
0.135 } [512 bytes data]
0.161 CAfile: /etc/ssl/certs/ca-certificates.crt
0.161 CApath: /etc/ssl/certs
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
.....
0 0 0 0 0 0 0 0 --:--:-- 0:02:00 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:01 --:--:-- 0
* Recv failure: Connection reset by peer
122.1 OpenSSL SSL_connect: Connection reset by peer in connection to pmc-geofence.trafficmanager.net:443
0 0 0 0 0 0 0 0 --:--:-- 0:02:02 --:--:-- 0
122.1 Closing connection 0
122.1 curl: (35) Recv failure: Connection reset by peer
ERROR: process "/bin/sh -c curl -v https://pmc-geofence.trafficmanager.net/" did not complete successfully: exit code: 35
```

I don't understand why it works from other locations in our network.

We are using GitLab Runner on Kubernetes.

hannesrd commented 5 months ago

@daviddavis @mbearup I forgot to tag you. The information is in my last comment.

mbearup commented 5 months ago

@hannesrd I'm at a bit of a loss here.

```
Index of /

Index of /

alma/
```
- HTTPS works too (have to use /etc/hosts trick to satisfy TLS)
```
$ ping pmc-geofence.trafficmanager.net
PING pmc-geofence.trafficmanager.net (137.117.241.158) 56(84) bytes of data.
64 bytes from pmc-geofence.trafficmanager.net (137.117.241.158): icmp_seq=1 ttl=101 time=152 ms
...
$ curl https://pmc-geofence.trafficmanager.net/


Index of /

Index of /


...
```
- We have a WAF enabled, but it would emit known error codes (403 or 429) if it was blocking your traffic. To my knowledge, there's no scenario where the WAF would drop a connection in this manner.
- Since this is a connection failure, it's unlikely to appear in our access or WAF logs. You could try requesting a unique URL (e.g. `/foo`), which would help us find your request in the logs. However, if the connection isn't successfully established, I suspect nothing will be logged.
- You could try hitting this endpoint via HTTP (i.e. `curl http://pmc-geofence.trafficmanager.net/`). It's possible the failure is centered around the TLS handshake, so using HTTP may reveal more information.
- You could try targeting a different App Gateway (i.e. `curl -v -H "Host: pmc-geofence.trafficmanager.net" http://168.63.54.159/`). I suspect this will fail the same way, but could rule out any issues with a specific AppGateway.            
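It helps to run the HTTPS, HTTP and alternate-gateway checks from the bullets above side by side, from the same place that fails (for example, as a `RUN` step in the docker-in-docker build). The sketch below assumes a POSIX shell with curl. `168.63.54.159` is the alternate App Gateway IP from the last bullet. `/ubuntu/probe-127` is a hypothetical unique path, chosen so the request is easy to find in server logs.

```shell
#!/bin/sh
# Sketch: run the HTTPS, HTTP and alternate-gateway checks from one place
# and print curl's exit code for each, so the results can be compared.

HOST=pmc-geofence.trafficmanager.net
ALT_GW=168.63.54.159            # alternate App Gateway IP quoted in the thread
PROBE_PATH=/ubuntu/probe-127    # hypothetical unique path, easy to find in logs

# check LABEL CURL-ARGS...: prints "LABEL: curl exit N"
check() {
    label=$1
    shift
    rc=0
    curl -sS -o /dev/null --max-time 30 "$@" || rc=$?
    echo "$label: curl exit $rc"
}

run_checks() {
    check "https via DNS" "https://$HOST$PROBE_PATH"
    check "http via DNS" "http://$HOST$PROBE_PATH"
    # Plain HTTP here, because a raw IP cannot satisfy the TLS certificate check.
    check "http via alternate gateway" -H "Host: $HOST" "http://$ALT_GW$PROBE_PATH"
}
```

Calling `run_checks` prints one line per check. If every check fails the same way, including the one against the alternate gateway, the problem is more likely on the local network path than at a specific App Gateway.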
hannesrd commented 5 months ago

@mbearup thanks for the hints!

Today I ran `curl -v -H "Host: pmc-geofence.trafficmanager.net" http://168.63.54.159/test1-linux-package-repositories-issues-127` (test1) from the Kubernetes worker node: OK.

`curl -v -H "Host: pmc-geofence.trafficmanager.net" http://168.63.54.159/test2-linux-package-repositories-issues-127` (test2) from the GitLab build container failed. It's Docker-in-Docker, and I don't understand why this URL fails there. Have other cloud or Docker users reported similar issues? For example:

https://developercommunity.visualstudio.com/t/Packages-for-mssql-tools-report-403/10613043?sort=newest
https://github.com/microsoft/linux-package-repositories/issues/119
https://github.com/microsoft/msphpsql/issues/1505

So I guess we have a problem with this container configuration.

curl to other websites, like GitHub, works.

HTTP (`curl http://pmc-geofence.trafficmanager.net/`):

```
[ 4/22] RUN curl http://pmc-geofence.trafficmanager.net/
0.073 % Total % Received % Xferd Average Speed Time Time Time Current
0.073 Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
...
0 0 0 0 0 0 0 0 --:--:-- 0:02:11 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:12 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:13 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:14 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:15 --:--:-- 0
135.2 curl: (56) Recv failure: Connection reset by peer
ERROR: process "/bin/sh -c curl http://pmc-geofence.trafficmanager.net/" did not complete successfully: exit code: 56

importing cache manifest from registry-gitlab.relaxdays.de/team-devops/gtiops/ci-cd-workshop/test-apt:c93305f8f2736caad936ea3b6ca5023b82c84b92:
```

Other host (`curl -v -H "Host: pmc-geofence.trafficmanager.net" http://168.63.54.159/`):

```
[ 4/22] RUN curl -v -H "Host: pmc-geofence.trafficmanager.net" http://168.63.54.159/
0.075 % Total % Received % Xferd Average Speed Time Time Time Current
0.075 Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 Trying 168.63.54.159:80...
0.107 Connected to 168.63.54.159 (168.63.54.159) port 80 (#0)
0.107 > GET / HTTP/1.1
0.107 > Host: pmc-geofence.trafficmanager.net
0.107 > User-Agent: curl/7.88.1
0.107 > Accept: */*
0.107 >
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
...
0 0 0 0 0 0 0 0 --:--:-- 0:02:13 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:14 --:--:-- 0
* Recv failure: Connection reset by peer
0 0 0 0 0 0 0 0 --:--:-- 0:02:15 --:--:-- 0
135.3 * Closing connection 0
135.3 curl: (56) Recv failure: Connection reset by peer
ERROR: process "/bin/sh -c curl -v -H \"Host: pmc-geofence.trafficmanager.net\" http://168.63.54.159/" did not complete successfully: exit code: 56

importing cache manifest from registry-gitlab.relaxdays.de/team-devops/gtiops/ci-cd-workshop/test-apt:d560b6d5f697025c6936ecd4b7a4f4d6c3ce132f:
```

hannesrd commented 5 months ago

@mbearup @daviddavis We ran tcpdump on the hosting node and are curious about the MTU warnings. Log attached: tcpdump.txt

mbearup commented 5 months ago

@hannesrd regarding the other issues you linked above, all of those are HTTP 403 errors. As I mentioned earlier, that's a symptom of the WAF blocking requests. Since you're getting TCP connection failure, it's a different symptom.

That is a relevant point: surprisingly, I see both of your test requests in our logs. However, we emitted 403s for these URLs (test1 and test2). This is because we have a rule that rejects unknown top-level folders (to filter out garbage requests, e.g. /admin.php). So perhaps this was a poor test case.

We could try again with a different test URL (e.g. /ubuntu/test1). However, this does provide useful information: your client seems to have experienced a connection failure, but the App Gateway did receive and respond to the request. Both the test1 and test2 requests came from the same IP, so I think we can rule out other requestors.

Looking at requests for mssql-tools18_18.2.1.1-1_amd64.deb from the same IP, I see two:

- One at 2024-04-09T07:43:10Z, which was successful (200).
- One at 2024-04-09T07:42:46Z, which was aborted prematurely by the client (ERRORINFO_CLIENT_CLOSED_REQUEST).

Also attaching the output of my test/tcpdump. My VM is set for MTU 1500 (which is fairly standard, in Azure and elsewhere) and I received no MTU fragmentation messages. tcpdump-1500.txt

mbearup commented 5 months ago

@hannesrd I can only conclude that this is a local networking issue (perhaps related to docker-in-docker). Per the above, we support an MTU of 1500, and our service is receiving your requests. Something else seems to be interfering with the connection. Apologies that we can't provide more clarity here.

hannesrd commented 4 months ago

@mbearup thanks for the investigation on your side! This helped us a lot. I see the issue is closed, but let me write down our results for anyone who may find them:

MTU was the issue. The docker-in-docker container had an MTU of 1500, while the outer container had 1450. I'm not sure why this was only a problem with this endpoint. I found it by adding `RUN ifconfig` to my Dockerfile. You can also test different payload sizes, like:

```
RUN ping -M do -s 1401 -c 1 8.8.8.8
```
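The single `ping -M do` check can be extended into a search for the largest packet that still gets through. With `-M do` the Don't Fragment bit is set, so a packet larger than the path MTU gets no echo reply. The packet size is the ICMP payload plus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header). On a 1450-byte path the largest payload that fits is therefore 1422, while a container that believes its MTU is 1500 will send payloads up to 1472. The sketch below assumes Linux iputils `ping` (BusyBox and macOS use different flags) and a target that answers ICMP at all. The 1500-byte upper bound is an assumption matching the inner container in this thread.

```shell
#!/bin/sh
# Sketch: binary-search the largest unfragmented packet to a host, using
# iputils ping with the Don't Fragment bit set (-M do). Linux iputils only.

ICMP_OVERHEAD=28   # 20-byte IPv4 header + 8-byte ICMP echo header
MAX_MTU=1500       # upper bound to test (assumed inner container MTU)

# payload_fits HOST SIZE: succeeds if an echo with SIZE payload bytes gets a reply.
payload_fits() {
    ping -M do -c 1 -W 2 -s "$2" "$1" >/dev/null 2>&1
}

# probe_path_mtu HOST: prints the largest packet size (payload + headers)
# that got a reply. If the host never answers ICMP, the search bottoms out at 28.
probe_path_mtu() {
    host=$1
    lo=0
    hi=$((MAX_MTU - ICMP_OVERHEAD))
    if payload_fits "$host" "$hi"; then
        echo "$MAX_MTU"
        return 0
    fi
    # Invariant: payload $lo is treated as fitting, payload $hi does not fit.
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if payload_fits "$host" "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo $((lo + ICMP_OVERHEAD))
}
```

Running `probe_path_mtu 8.8.8.8` inside the docker-in-docker build and on the outer node makes an MTU mismatch like the one described here visible as two different numbers.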

I tried several fixes by configuring the MTU in the gitlab-runner TOML config. None of them changed anything.

The solution was to set it in the gitlab-ci.yaml:

```yaml
build:
  services:
    - name: docker:20.10.12-dind
      command:
        - "--mtu=1450"
```
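A CI job (or a `RUN` step, like the `RUN ifconfig` used above) could check interface MTUs before any downloads, so this cannot regress silently. The sketch below assumes Linux sysfs (`/sys/class/net/<iface>/mtu`) is readable inside the container. The `SYSFS_NET` override is a hypothetical hook that exists only so the function can be tested against a fake directory. The limit of 1450 is the outer MTU from this thread, not a universal value.

```shell
#!/bin/sh
# Sketch: fail fast when any non-loopback interface has an MTU above MAX.
# Reads Linux sysfs; SYSFS_NET can be pointed at a fake tree for testing.

SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# list_mtus: prints "IFACE MTU" for each non-loopback interface.
list_mtus() {
    for dir in "$SYSFS_NET"/*; do
        iface=${dir##*/}
        [ "$iface" = lo ] && continue
        [ -r "$dir/mtu" ] || continue
        echo "$iface $(cat "$dir/mtu")"
    done
}

# check_mtu MAX: returns non-zero (and names the offenders) if any MTU > MAX.
check_mtu() {
    max=$1
    bad=$(list_mtus | awk -v max="$max" '$2 + 0 > max + 0')
    if [ -n "$bad" ]; then
        echo "MTU above $max: $bad" >&2
        return 1
    fi
    echo "all interface MTUs <= $max"
}
```

Calling `check_mtu 1450` early in a build step would have failed with the 1500-byte inner MTU described above, instead of hanging for two minutes on a download.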

Source (bug): https://www.civo.com/learn/fixing-networking-for-docker
Source (solution): https://gitlab.com/gitlab-org/gitlab/-/issues/27716#note_628181430

So: Fixed for me. We will include this in our template.