tcort / markdown-link-check

checks all of the hyperlinks in a markdown text to determine if they are alive or dead
ISC License

markdown-link-check hangs and never times out or exits with an error #336

Open wuwentao opened 2 months ago

wuwentao commented 2 months ago

I'm not sure how to debug this. There is no error, timeout, or exit; the command just hangs for more than 10-20 minutes. Not every md file fails. The log from a hang is below:


/workspace # markdown-link-check -p -v en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md

FILE: en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md
Checking... [=                        ] 4%^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C
/workspace # markdown-link-check -v en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md

FILE: en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md
^C
/workspace # vi en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md
/workspace #
/workspace #
/workspace # hostname
68f9aa12c26f
/workspace # uname -a
Linux 68f9aa12c26f 5.4.0-170-generic #188-Ubuntu SMP Wed Jan 10 09:51:01 UTC 2024 x86_64 Linux
/workspace # cat /etc/issue
Welcome to Alpine Linux 3.20
Kernel \r on an \m (\l)

/workspace # /usr/local/bin/markdown-link-check --version
3.12.2
/workspace # /usr/local/bin/markdown-link-check  -v -p en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md

FILE: en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md
Checking... [=                        ] 4%^@

Any suggestions or debug info for checking what's wrong? Not every document behaves this way; the tool works for some md files.

For example, this md file works fine, as shown below, after I used CTRL+C to force quit the failing md file and reran with another md file:

/workspace # uname -a
Linux 68f9aa12c26f 5.4.0-170-generic #188-Ubuntu SMP Wed Jan 10 09:51:01 UTC 2024 x86_64 Linux
/workspace # cat /etc/issue
Welcome to Alpine Linux 3.20
Kernel \r on an \m (\l)

/workspace # /usr/local/bin/markdown-link-check --version
3.12.2
/workspace # /usr/local/bin/markdown-link-check  -v -p en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md

FILE: en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md
Checking... [=                        ] 4%^@ ^@^C
/workspace # /usr/local/bin/markdown-link-check  -v -p en/01_software/pc/ISP_tuning/
K230_ISP_Image_Tuning_Guide.md     K230_ISP_Initial_Setting_Guide.md  images/
/workspace # /usr/local/bin/markdown-link-check  -v -p en/01_software/pc/ISP_tuning/K230_ISP_Initial_Setting_Guide.md

FILE: en/01_software/pc/ISP_tuning/K230_ISP_Initial_Setting_Guide.md
Checking... [=========================] 100%
  [✓] ../../../../zh/01_software/pc/ISP_tuning/images/canaan-cover.png → Status: 200
  [✓] ../../../../zh/01_software/pc/ISP_tuning/images/logo.png → Status: 200
  [✓] ../../../../zh/01_software/pc/ISP_tuning/images/xml_04.png → Status: 200
  [✓] ../../../../zh/01_software/pc/ISP_tuning/images/xml_01.png → Status: 200
  [✓] ../../../../zh/01_software/pc/ISP_tuning/images/xml_02.png → Status: 200
  [✓] ../../../../zh/01_software/pc/ISP_tuning/images/xml_03.png → Status: 200

  6 links checked.
/workspace #
wuwentao commented 2 months ago

Found the root cause: this markdown file has a 4+ GB download link for [MATLAB_Runtime_R2023a_win64.zip], so markdown-link-check downloads the full file. Our network is too slow, so the download needs at least 30 minutes to finish. The tool therefore appears to hang and does not return or print anything until the zip file has been downloaded.

I also checked manually by downloading the file directly with wget. The network is OK, but there is an HTTP 302 redirect, which wget follows. Result as below:

/workspace # time wget https://ssd.mathworks.com/supportfiles/downloads/R2023a/Release/0/deployment_files/installer/complete/win64/MATLAB_Runtime_R2023a_win64.zip
Connecting to ssd.mathworks.com (104.80.240.138:443)
Connecting to ssd.mathworks.cn (218.58.101.205:443)
saving to 'MATLAB_Runtime_R2023a_win64.zip'
MATLAB_Runtime_R2023   0% |                                                                                                                                     | 29.0M  0:29:18 ETA^CCommand terminated by signal 2
real    0m 12.68s
user    0m 0.00s
sys     0m 0.06s
/workspace #

I used CTRL+C to force quit, so wget was terminated by signal 2. The same URL checked with curl:

curl -vvv -X GET https://ssd.mathworks.com/supportfiles/downloads/R2023a/Release/0/deployment_files/installer/complete/win64/MATLAB_Runtime_R2023a_win64.zip
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 23.51.110.92:443...
* TCP_NODELAY set
* Connected to ssd.mathworks.com (23.51.110.92) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; ST=Massachusetts; L=Natick; O=The MathWorks Inc; CN=www.mathworks.com
*  start date: Sep  5 00:00:00 2023 GMT
*  expire date: Sep  5 23:59:59 2024 GMT
*  subjectAltName: host "ssd.mathworks.com" matched cert's "ssd.mathworks.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=GeoTrust RSA CA 2018
*  SSL certificate verify ok.
> GET /supportfiles/downloads/R2023a/Release/0/deployment_files/installer/complete/win64/MATLAB_Runtime_R2023a_win64.zip HTTP/1.1
> Host: ssd.mathworks.com
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Moved Temporarily
< Server: AkamaiGHost
< Content-Length: 0
< Location: https://ssd.mathworks.cn/supportfiles/downloads/R2023a/Release/0/deployment_files/installer/complete/win64/MATLAB_Runtime_R2023a_win64.zip
< Date: Thu, 13 Jun 2024 02:44:08 GMT
< Connection: keep-alive
< Access-Control-Allow-Origin: *
<
* Connection #0 to host ssd.mathworks.com left intact
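
For a manual check that does not pull the 4+ GB body, a HEAD request that follows the redirect and has a hard time limit may be enough. The sketch below is only an illustration: `link_status` is a hypothetical helper name, the 15-second limit is an arbitrary value, and some servers answer HEAD differently from GET, so the result is a hint rather than proof that the link is alive.

```shell
#!/bin/sh
# link_status URL: print the final HTTP status code without downloading the body.
#   -I          send HEAD instead of GET
#   -L          follow redirects, such as the 302 shown above
#   --max-time  cap the whole transfer in seconds (15 is an arbitrary choice)
#   -o /dev/null -w '%{http_code}\n'  discard the headers, print only the status code
link_status() {
  curl -sS -I -L --max-time 15 -o /dev/null -w '%{http_code}\n' "$1"
}

# Example call (needs network access), left commented out:
# link_status "https://ssd.mathworks.com/supportfiles/downloads/R2023a/Release/0/deployment_files/installer/complete/win64/MATLAB_Runtime_R2023a_win64.zip"
```

If the redirect target cannot be reached within the limit, curl exits with a non-zero status instead of hanging.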

So I added the -a and -r arguments, but that doesn't work either:

/usr/local/bin/markdown-link-check -a 301,302 -r -v -p xxx.md
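
As far as I can tell, `-a` only lists the status codes that count as alive and `-r` only retries on HTTP 429, so neither limits how much of a response gets downloaded. A possible workaround is to skip the large download link with the `ignorePatterns` key of the config file passed through `-c`. This is a minimal sketch, assuming it is acceptable to leave that link unchecked; the file name `mlc_config.json` and the exact regular expression are choices made for this example.

```shell
#!/bin/sh
# Write a config file that tells markdown-link-check to skip the large
# MathWorks download link. mlc_config.json is an arbitrary file name.
cat > mlc_config.json <<'EOF'
{
  "ignorePatterns": [
    { "pattern": "^https://ssd\\.mathworks\\.com/supportfiles/downloads/" }
  ]
}
EOF

doc="en/01_software/pc/ISP_tuning/K230_ISP_Image_Tuning_Guide.md"

# Run the check only where the tool and the file are actually present.
if command -v markdown-link-check >/dev/null 2>&1 && [ -f "$doc" ]; then
  markdown-link-check -c mlc_config.json -v -p "$doc"
fi
```

With this config the link is never requested, so the slow download cannot block the run. The trade-off is that a broken MATLAB Runtime link would go unnoticed.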