hyphacoop / api.distributed.press

https://distributed.press
GNU Affero General Public License v3.0

wget2 errors on site cloning #91

Open fauno opened 4 days ago

fauno commented 4 days ago

I'm getting a 500 error while cloning https://sutty.nl, even though running the same command locally with wget2 2.1.0 works correctly:

{
  "statusCode": 500,
  "code": "8",
  "error": "Internal Server Error",
  "message": "Command failed: wget2   --random-wait   --compression=identity,gzip,br   --user-agent=\"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0\"   --mirror   --page-requisites   --convert-links   --adjust-extension   --continue   --no-host-directories   --directory-prefix=sutty.nl   \"https://sutty.nl\"\nMissing host/domain in URI 'https:'\nCannot resolve URI 'https:'\ntoASCII(�y ni hablar del dinero que ganan <a href=\"https) failed (-200): string encoding error\ntoASCII(tml\">consejos piratas para la apostasía de redes sociales<) failed (-203): punycode encoded data will be too large\ntoASCII(go de convivencia está basado en los “<a href=\"https) failed (-203): punycode encoded data will be too large\ntoASCII(ión es almacenada por sólo un par de proveedores de servicios, o servidores,<a href=\"https) failed (-203): punycode encoded data will be too large\ntoASCII(da personal, de nuestra presencia online o de nuestros ingresos económicos a <a href=\"https) failed (-203): punycode encoded data will be too large\ntoASCII(s en orden de prioridad una lista de ideas que teníamos y otras que aportaron les participantes.<) failed (-203): punycode encoded data will be too large\ntoASCII(as, stickers en la compu y de fondo, la exposición del cc tierra violeta\" ) failed (-203): punycode encoded data will be too large\ntoASCII(amos un sitio específico que actúa como intermediario entre la página que queremos compartir y la instancia que aloja nuestre usuarie.<) failed (-203): punycode encoded data will be too large\ntoASCII(� que al final seguimos los <a href=\"https) failed (-200): string encoding error\n"
}
>_ wget2   --random-wait   --compression=identity,gzip,br   --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0"   --mirror   --page-requisites   --convert-links   --adjust-extension   --continue   --no-host-directories   --directory-prefix=sutty.nl   "https://sutty.nl"
31 files             100% [=====================================================================================>]    6.82M    1.28MB/s
70 files             100% [=====================================================================================>]   10.02M    2.15MB/s
34 files             100% [=====================================================================================>]   55.64M    5.40MB/s
36 files             100% [=====================================================================================>]   13.60M    2.60MB/s
73 files             100% [=====================================================================================>]    8.24M    1.18MB/s
                          [Files: 244  Bytes: 94.34M [3.68MB/s] Redirects: 2  Todo: 0  Errors: 19                ]
RangerMauve commented 1 day ago

Super weird. Unit tests for the clone API are passing.

RangerMauve commented 1 day ago

Interesting, it seems to be an issue with one of the URLs in the sutty site being too long when combined with the rest of the filesystem path.

RangerMauve commented 1 day ago

Try running the wget2 command in a deeply nested directory. On the DP server it's running in /home/press/.local/share/distributed-press-nodejs/sites/sites
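
A minimal sketch of that reproduction, assuming a throwaway temporary directory rather than the real server path. The directory names and the depth of 20 are arbitrary choices for illustration, and the wget2 invocation is left as a comment so the snippet has no network side effects:

```shell
#!/usr/bin/env bash
# Build a deeply nested working directory to mimic a long server-side prefix.
# The names and depth are illustrative, not the actual DP server layout.
base="$(mktemp -d)"
deep="$base"
for i in $(seq 1 20); do
  deep="$deep/nested-directory-$i"
done
mkdir -p "$deep"
echo "working directory is ${#deep} characters long"

# From here, rerun the failing command inside the deep directory, e.g.:
#   cd "$deep" && wget2 --mirror --page-requisites --directory-prefix=sutty.nl "https://sutty.nl"
```

If the error only shows up inside the deep directory, that would support the path-length theory; if it does not, the cause is probably elsewhere.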

RangerMauve commented 1 day ago

I think this has to do with the filesystem having a limit on file name length. https://superuser.com/a/790264
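
To illustrate the kind of limit in question: many Linux filesystems cap a single file name at 255 bytes, not characters, so accented text uses up the budget faster than its visible length suggests. A rough check, assuming a UTF-8 encoded script and using the word "apostasía" from the error output purely as sample input:

```shell
#!/usr/bin/env bash
# Ask the filesystem for its per-component file name limit (commonly 255).
name_max="$(getconf NAME_MAX .)"

# wc -c counts bytes, so multi-byte UTF-8 characters count more than once.
name="apostasía"
bytes="$(printf '%s' "$name" | wc -c | tr -d ' ')"

echo "limit: $name_max bytes, name uses $bytes bytes"
if [ "$bytes" -gt "$name_max" ]; then
  echo "name is too long for this filesystem"
fi
```

The same byte count can be applied to any file name wget2 would derive from a URL to see whether it approaches the limit.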

RangerMauve commented 1 day ago

@fauno try now, I pushed a potential fix that sets wget2's local-encoding to UTF-8.
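
For reference, a sketch of what such a change could look like on the command line, assuming the fix uses wget2's `--local-encoding` option. Where the project actually sets it is not shown in this thread, so the argument list below is an abbreviated illustration rather than the deployed command:

```shell
#!/usr/bin/env bash
# Abbreviated argument list; the real command has more flags (see the error above).
args=(
  --local-encoding=UTF-8   # assume local/system text is UTF-8 regardless of server locale
  --mirror
  --page-requisites
  --convert-links
  --directory-prefix=sutty.nl
  "https://sutty.nl"
)

# Print the command instead of running it, so the sketch has no side effects.
printf 'wget2'; printf ' %q' "${args[@]}"; printf '\n'
```

Forcing the encoding this way would matter if the server process runs under a non-UTF-8 locale (for example `C` or `POSIX`), which could explain why the same command works on a local machine.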