openzim / sotoki

StackExchange websites to ZIM scraper
https://library.kiwix.org/?category=stack_exchange
GNU General Public License v3.0

Issue setting up a new developer machine #288

Closed natamox closed 3 months ago

natamox commented 12 months ago

287

natamox commented 11 months ago

Hi, I have run into a new problem. When I run

pip install -r requirements.txt

the log shows "The libzim library cannot be found." How can I solve this? Thanks.

(env) ➜  sotoki git:(main) pip install -r requirements.txt 
Collecting kiwixstorage<1.0,>=0.8.1 (from -r requirements.txt (line 1))
  Downloading kiwixstorage-0.8.3-py3-none-any.whl (36 kB)
Collecting pif<0.9,>=0.8.2 (from -r requirements.txt (line 2))
  Downloading pif-0.8.2-py2.py3-none-any.whl (19 kB)
Collecting zimscraperlib<3.0,>=2.1 (from -r requirements.txt (line 3))
  Downloading zimscraperlib-2.1.0-py3-none-any.whl (154 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.5/154.5 kB 2.0 MB/s eta 0:00:00
Collecting xml_to_dict<0.2,>=0.1.6 (from -r requirements.txt (line 4))
  Downloading xml_to_dict-0.1.6-py3-none-any.whl (3.6 kB)
Collecting cli-formatter<1.3,>=1.2.0 (from -r requirements.txt (line 5))
  Downloading cli_formatter-1.2.0-py3-none-any.whl (6.6 kB)
Collecting py7zr<0.21,>=0.20.4 (from -r requirements.txt (line 6))
  Obtaining dependency information for py7zr<0.21,>=0.20.4 from https://files.pythonhosted.org/packages/2c/da/155bb1f692c067b9213c9c7b8c19a012a65027399606d623a25dfb1d3af1/py7zr-0.20.6-py3-none-any.whl.metadata
  Downloading py7zr-0.20.6-py3-none-any.whl.metadata (16 kB)
Collecting python-slugify<9.0.0,>=8.0.1 (from -r requirements.txt (line 7))
  Downloading python_slugify-8.0.1-py2.py3-none-any.whl (9.7 kB)
Collecting jinja2<3.2,>=3.1.0 (from -r requirements.txt (line 8))
  Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 6.0 MB/s eta 0:00:00
Collecting redis!=4.5.2,<5.0,>=4.5.1 (from -r requirements.txt (line 9))
  Obtaining dependency information for redis!=4.5.2,<5.0,>=4.5.1 from https://files.pythonhosted.org/packages/20/2e/409703d645363352a20c944f5d119bdae3eb3034051a53724a7c5fee12b8/redis-4.6.0-py3-none-any.whl.metadata
  Downloading redis-4.6.0-py3-none-any.whl.metadata (8.3 kB)
Collecting beautifulsoup4<5.0,>=4.9.3 (from -r requirements.txt (line 10))
  Downloading beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.0/143.0 kB 6.2 MB/s eta 0:00:00
Collecting lxml<4.10,>=4.9.1 (from -r requirements.txt (line 11))
  Downloading lxml-4.9.3.tar.gz (3.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 19.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting jinja2-pluralize<0.4,>=0.3.0 (from -r requirements.txt (line 12))
  Downloading jinja2_pluralize-0.3.0-py2.py3-none-any.whl (4.8 kB)
Collecting tld<0.14,>=0.13 (from -r requirements.txt (line 13))
  Downloading tld-0.13-py2.py3-none-any.whl (263 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 263.8/263.8 kB 29.4 MB/s eta 0:00:00
Collecting mistune<3.0.0,>=2.0.5 (from -r requirements.txt (line 14))
  Downloading mistune-2.0.5-py2.py3-none-any.whl (24 kB)
Collecting python-dateutil<2.9,>=2.8.2 (from -r requirements.txt (line 15))
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 kB 20.6 MB/s eta 0:00:00
Collecting psutil<6.0,>=5.9.4 (from -r requirements.txt (line 16))
  Downloading psutil-5.9.5-cp38-abi3-macosx_11_0_arm64.whl (246 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 246.1/246.1 kB 18.3 MB/s eta 0:00:00
Collecting python-snappy<1.0,>=0.6.0 (from -r requirements.txt (line 17))
  Downloading python_snappy-0.6.1-cp39-cp39-macosx_10_9_universal2.whl (73 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 73.5/73.5 kB 8.1 MB/s eta 0:00:00
Collecting bidict<0.23,>=0.22.1 (from -r requirements.txt (line 18))
  Downloading bidict-0.22.1-py3-none-any.whl (35 kB)
Collecting cchardet<2.2,>=2.1.7 (from -r requirements.txt (line 19))
  Downloading cchardet-2.1.7.tar.gz (653 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 653.6/653.6 kB 37.8 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting boto3<2,>=1.12.39 (from kiwixstorage<1.0,>=0.8.1->-r requirements.txt (line 1))
  Obtaining dependency information for boto3<2,>=1.12.39 from https://files.pythonhosted.org/packages/7c/f5/6b4302a6d34715d47f7d729e13cae7a2b896b7272561b1b1d4b714a0b76b/boto3-1.28.30-py3-none-any.whl.metadata
  Downloading boto3-1.28.30-py3-none-any.whl.metadata (6.7 kB)
Collecting requests<3.0,>=2.23 (from kiwixstorage<1.0,>=0.8.1->-r requirements.txt (line 1))
  Obtaining dependency information for requests<3.0,>=2.23 from https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl.metadata
  Downloading requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting aws-requests-auth<0.5,>=0.4.2 (from kiwixstorage<1.0,>=0.8.1->-r requirements.txt (line 1))
  Downloading aws_requests_auth-0.4.3-py2.py3-none-any.whl (6.8 kB)
Collecting argparse (from pif<0.9,>=0.8.2->-r requirements.txt (line 2))
  Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Collecting iso-639==0.4.5 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Downloading iso-639-0.4.5.tar.gz (167 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 167.4/167.4 kB 14.1 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting colorthief==0.2.1 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Downloading colorthief-0.2.1-py2.py3-none-any.whl (6.1 kB)
Collecting python-resize-image<1.2,>=1.1.19 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Downloading python_resize_image-1.1.20-py2.py3-none-any.whl (8.4 kB)
Collecting Babel<3.0,>=2.9 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Downloading Babel-2.12.1-py3-none-any.whl (10.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.1/10.1 MB 65.8 MB/s eta 0:00:00
Collecting file-magic<0.5,>=0.4.0 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Downloading file_magic-0.4.1-py3-none-any.whl (6.3 kB)
Collecting libzim<3.0,>=2.1.0 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Downloading libzim-2.1.0.tar.gz (8.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.3/8.3 MB 15.8 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [2 lines of output]
      [!] The libzim library cannot be found.
      Please verify it is correctly installed and can be found.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

natamox commented 11 months ago

I have encountered this problem on both x86 Windows and an M1 Mac. I tried Python 3.9 and 3.11, so it does not seem to be related to the Python version.

kelson42 commented 11 months ago

The system/architecture seems to be supported and download.kiwix.org is online... strange!

natamox commented 11 months ago

If I follow the instructions in the Developers section of the readme and run

python src/sotoki/dependencies.py

I then get an error

(env) ➜  sotoki git:(main) python src/sotoki/dependencies.py
Traceback (most recent call last):
  File "/Users/natamox/Desktop/OpenSource/sotoki/src/sotoki/dependencies.py", line 6, in <module>
    from zimscraperlib.download import stream_file
ModuleNotFoundError: No module named 'zimscraperlib'

Then I tried to install all dependencies from the requirements.txt file and encountered the error above.

But when I install zimscraperlib directly with pip, I get zimscraperlib-3.1.1, which does not match the constraint in requirements.txt. If I then run python src/sotoki/dependencies.py, the following log appears. It seems to be working fine, but I'm not sure.

(env) ➜  sotoki git:(main) python src/sotoki/dependencies.py
/Users/natamox/Desktop/OpenSource/SS/sotoki/env/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
Downloading https://cdn.sstatic.net/Shared/stacks.css?v=ca5319e49c63 into /Users/natamox/Desktop/OpenSource/SS/sotoki/src/sotoki/assets/static/css/stacks.css
Downloading https://cdn.jsdelivr.net/npm/mathjax@3.1.3/es5/tex-mml-chtml.js into /Users/natamox/Desktop/OpenSource/SS/sotoki/src/sotoki/assets/static/js/tex-mml-chtml.js
Downloading https://unpkg.com/@stackoverflow/stacks-icons@2.20.0/build/index.js into /Users/natamox/Desktop/OpenSource/SS/sotoki/src/sotoki/assets/static/js/stack-icons.js
Downloading https://momentjs.com/downloads/moment.min.js into /Users/natamox/Desktop/OpenSource/SS/sotoki/src/sotoki/assets/static/js/moment.min.js
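
The version mismatch described above can be checked before running anything. The following is only a minimal sketch: the range >=2.1,<3.0 is copied from the pip log in this thread, the helper names `satisfies_pin` and `installed_version` are made up for illustration, and the comparison is deliberately naive (major.minor only, plain integers).

```python
from importlib import metadata


def satisfies_pin(version, minimum=(2, 1), below=(3, 0)):
    """Return True if `version` falls in the range >=minimum, <below.

    Only the major.minor components are compared, and they are assumed to be
    plain integers (no pre-release suffixes such as "3.0rc1").
    """
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return minimum <= major_minor < below


def installed_version(dist_name="zimscraperlib"):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("zimscraperlib is not installed in this environment")
    elif satisfies_pin(found):
        print(f"zimscraperlib {found} matches the pin")
    else:
        print(f"zimscraperlib {found} is outside the pinned range >=2.1,<3.0")
```

With the 3.1.1 install mentioned above, `satisfies_pin` returns False, so dependencies.py running without error does not mean the rest of sotoki will work against that version.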

kelson42 commented 11 months ago

@natamox Thank you for all the details, we will get back to you in the next few days.

benoit74 commented 11 months ago

This is unfortunately an issue linked to your local setup, and it will be very hard for us to help. I'm afraid we do not have enough resources to help all volunteers master Python or their machine. Sorry about that. I've nevertheless gathered some questions/recommendations below.

Are you using a virtualenv to avoid dependency issues (quite common without it)?

FYI, this is what I've done to check everything is fine on my Mac (Intel) (you do not have to do this in the tmp folder, it is just for the test to ensure that you start from a fresh and clean situation):

cd /tmp
git clone git@github.com:openzim/sotoki.git
cd sotoki
python3.8 -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements.txt

Everything ran fine. I can for instance successfully run (I did not let it finish, but it started to download deps):

python src/sotoki/dependencies.py

Please follow exactly the same instructions and report back.

FYI, to clean up afterward, run:

deactivate
cd ..
rm -rf sotoki

benoit74 commented 11 months ago

BTW, this scraper uses Python 3.8 for now, so please use this version to avoid any other strange behavior linked to the Python version. We have probably never tested the scraper with 3.9 or 3.11, for instance.

natamox commented 11 months ago

Hello, the following is the log after I followed your instructions and downgraded Python to 3.8.17:

➜  OpenSource git clone git@github.com:openzim/sotoki.git
Cloning into 'sotoki'...
remote: Enumerating objects: 4192, done.
remote: Counting objects: 100% (1168/1168), done.
remote: Compressing objects: 100% (302/302), done.
remote: Total 4192 (delta 808), reused 1093 (delta 753), pack-reused 3024
Receiving objects: 100% (4192/4192), 2.61 MiB | 56.00 KiB/s, done.
Resolving deltas: 100% (2728/2728), done.
➜  OpenSource cd sotoki
➜  sotoki git:(main) python3.8 -m venv venv
➜  sotoki git:(main) ll
total 128
-rw-r--r--@ 1 natamox  staff   4.4K  8 20 23:36 CHANGELOG.md
-rw-r--r--@ 1 natamox  staff   2.4K  8 20 23:36 Dockerfile
-rw-r--r--@ 1 natamox  staff    34K  8 20 23:36 LICENSE
-rw-r--r--@ 1 natamox  staff    64B  8 20 23:36 MANIFEST.in
-rw-r--r--@ 1 natamox  staff   2.4K  8 20 23:36 README.md
-rw-r--r--@ 1 natamox  staff   436B  8 20 23:36 requirements.txt
-rw-r--r--@ 1 natamox  staff   1.6K  8 20 23:36 setup.py
drwxr-xr-x@ 3 natamox  staff    96B  8 20 23:36 src
drwxr-xr-x@ 6 natamox  staff   192B  8 20 23:36 venv
➜  sotoki git:(main) source venv/bin/activate
(venv) ➜  sotoki git:(main) pip install -U pip
Requirement already satisfied: pip in ./venv/lib/python3.8/site-packages (23.0.1)
Collecting pip
  Using cached pip-23.2.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-23.2.1
(venv) ➜  sotoki git:(main) pip install -r requirements.txt
Collecting kiwixstorage<1.0,>=0.8.1 (from -r requirements.txt (line 1))
  Using cached kiwixstorage-0.8.3-py3-none-any.whl (36 kB)
Collecting pif<0.9,>=0.8.2 (from -r requirements.txt (line 2))
  Using cached pif-0.8.2-py2.py3-none-any.whl (19 kB)
Collecting zimscraperlib<3.0,>=2.1 (from -r requirements.txt (line 3))
  Using cached zimscraperlib-2.1.0-py3-none-any.whl (154 kB)
Collecting xml_to_dict<0.2,>=0.1.6 (from -r requirements.txt (line 4))
  Using cached xml_to_dict-0.1.6-py3-none-any.whl (3.6 kB)
Collecting cli-formatter<1.3,>=1.2.0 (from -r requirements.txt (line 5))
  Using cached cli_formatter-1.2.0-py3-none-any.whl (6.6 kB)
Collecting py7zr<0.21,>=0.20.4 (from -r requirements.txt (line 6))
  Obtaining dependency information for py7zr<0.21,>=0.20.4 from https://files.pythonhosted.org/packages/2c/da/155bb1f692c067b9213c9c7b8c19a012a65027399606d623a25dfb1d3af1/py7zr-0.20.6-py3-none-any.whl.metadata
  Using cached py7zr-0.20.6-py3-none-any.whl.metadata (16 kB)
Collecting python-slugify<9.0.0,>=8.0.1 (from -r requirements.txt (line 7))
  Using cached python_slugify-8.0.1-py2.py3-none-any.whl (9.7 kB)
Collecting jinja2<3.2,>=3.1.0 (from -r requirements.txt (line 8))
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting redis!=4.5.2,<5.0,>=4.5.1 (from -r requirements.txt (line 9))
  Obtaining dependency information for redis!=4.5.2,<5.0,>=4.5.1 from https://files.pythonhosted.org/packages/20/2e/409703d645363352a20c944f5d119bdae3eb3034051a53724a7c5fee12b8/redis-4.6.0-py3-none-any.whl.metadata
  Using cached redis-4.6.0-py3-none-any.whl.metadata (8.3 kB)
Collecting beautifulsoup4<5.0,>=4.9.3 (from -r requirements.txt (line 10))
  Using cached beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting lxml<4.10,>=4.9.1 (from -r requirements.txt (line 11))
  Using cached lxml-4.9.3.tar.gz (3.6 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting jinja2-pluralize<0.4,>=0.3.0 (from -r requirements.txt (line 12))
  Using cached jinja2_pluralize-0.3.0-py2.py3-none-any.whl (4.8 kB)
Collecting tld<0.14,>=0.13 (from -r requirements.txt (line 13))
  Using cached tld-0.13-py2.py3-none-any.whl (263 kB)
Collecting mistune<3.0.0,>=2.0.5 (from -r requirements.txt (line 14))
  Using cached mistune-2.0.5-py2.py3-none-any.whl (24 kB)
Collecting python-dateutil<2.9,>=2.8.2 (from -r requirements.txt (line 15))
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting psutil<6.0,>=5.9.4 (from -r requirements.txt (line 16))
  Using cached psutil-5.9.5-cp38-abi3-macosx_11_0_arm64.whl (246 kB)
Collecting python-snappy<1.0,>=0.6.0 (from -r requirements.txt (line 17))
  Downloading python_snappy-0.6.1-cp38-cp38-macosx_10_9_universal2.whl (73 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 73.5/73.5 kB 797.5 kB/s eta 0:00:00
Collecting bidict<0.23,>=0.22.1 (from -r requirements.txt (line 18))
  Using cached bidict-0.22.1-py3-none-any.whl (35 kB)
Collecting cchardet<2.2,>=2.1.7 (from -r requirements.txt (line 19))
  Using cached cchardet-2.1.7.tar.gz (653 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting boto3<2,>=1.12.39 (from kiwixstorage<1.0,>=0.8.1->-r requirements.txt (line 1))
  Obtaining dependency information for boto3<2,>=1.12.39 from https://files.pythonhosted.org/packages/7c/f5/6b4302a6d34715d47f7d729e13cae7a2b896b7272561b1b1d4b714a0b76b/boto3-1.28.30-py3-none-any.whl.metadata
  Using cached boto3-1.28.30-py3-none-any.whl.metadata (6.7 kB)
Collecting requests<3.0,>=2.23 (from kiwixstorage<1.0,>=0.8.1->-r requirements.txt (line 1))
  Obtaining dependency information for requests<3.0,>=2.23 from https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl.metadata
  Using cached requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting aws-requests-auth<0.5,>=0.4.2 (from kiwixstorage<1.0,>=0.8.1->-r requirements.txt (line 1))
  Using cached aws_requests_auth-0.4.3-py2.py3-none-any.whl (6.8 kB)
Collecting argparse (from pif<0.9,>=0.8.2->-r requirements.txt (line 2))
  Using cached argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Collecting iso-639==0.4.5 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Using cached iso-639-0.4.5.tar.gz (167 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting colorthief==0.2.1 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Using cached colorthief-0.2.1-py2.py3-none-any.whl (6.1 kB)
Collecting python-resize-image<1.2,>=1.1.19 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Using cached python_resize_image-1.1.20-py2.py3-none-any.whl (8.4 kB)
Collecting Babel<3.0,>=2.9 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Using cached Babel-2.12.1-py3-none-any.whl (10.1 MB)
Collecting file-magic<0.5,>=0.4.0 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Using cached file_magic-0.4.1-py3-none-any.whl (6.3 kB)
Collecting libzim<3.0,>=2.1.0 (from zimscraperlib<3.0,>=2.1->-r requirements.txt (line 3))
  Using cached libzim-2.1.0.tar.gz (8.3 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [2 lines of output]
      [!] The libzim library cannot be found.
      Please verify it is correctly installed and can be found.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(venv) ➜  sotoki git:(main)
(venv) ➜  sotoki git:(main)
(venv) ➜  sotoki git:(main) python --version
Python 3.8.17
(venv) ➜  sotoki git:(main) pip --version
pip 23.2.1 from /Users/natamox/Desktop/OpenSource/sotoki/venv/lib/python3.8/site-packages/pip (python 3.8)
(venv) ➜  sotoki git:(main)
(venv) ➜  sotoki git:(main)
(venv) ➜  sotoki git:(main) python src/sotoki/dependencies.py
Traceback (most recent call last):
  File "src/sotoki/dependencies.py", line 6, in <module>
    from zimscraperlib.download import stream_file
ModuleNotFoundError: No module named 'zimscraperlib'

natamox commented 11 months ago

I think I have figured out the problem: when I run it again on my x86 Mac, everything seems to be fine. So this is probably down to the architecture. Intel x86 is supported, but Apple's ARM-based M1 chip is not.

benoit74 commented 11 months ago

@natamox this makes much more sense, thank you for testing again on x86. Looking more closely, sotoki uses zimscraperlib 2.1, which uses libzim 2.1; support for Mac arm64 was added in libzim 3.0 according to https://github.com/openzim/python-libzim/issues/164 (@rgaudin, can you confirm?). Sorry about that; this will be solved once we upgrade to 3.x (https://github.com/openzim/sotoki/issues/290).
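
To make the diagnosis above easy to repeat on another machine, here is a rough sketch of a pre-install check. The two rules it encodes (libzim 2.x lacking macOS arm64 support, and the scraper targeting Python 3.8) come only from this thread and are not read from PyPI; the function name `libzim2_platform_warnings` is hypothetical.

```python
import platform
import sys


def libzim2_platform_warnings(system, machine, python_version):
    """Return a list of warnings for a (system, machine, version) combination.

    The rules are assumptions taken from this issue thread, not from any
    package metadata.
    """
    warnings = []
    if system == "Darwin" and machine == "arm64":
        warnings.append(
            "macOS arm64: libzim 2.x is reported to lack support for this "
            "platform, so installing it is expected to fail"
        )
    if tuple(python_version[:2]) != (3, 8):
        warnings.append(
            "Python %d.%d: the scraper is said to target Python 3.8"
            % (python_version[0], python_version[1])
        )
    return warnings


if __name__ == "__main__":
    found = libzim2_platform_warnings(
        platform.system(), platform.machine(), sys.version_info
    )
    for line in found or ["no known issue detected for this platform"]:
        print(line)
```

Note that `platform.machine()` describes the running interpreter, so a Python executed under Rosetta on an M1 Mac reports x86_64 rather than arm64.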

natamox commented 11 months ago

@benoit74 Thank you for your help! I have another question: is the entry file for this project __main__.py? I tried to run the following command

python src/sotoki/__main__.py --help

It then throws an error, on a machine running on the x86 architecture.

Traceback (most recent call last):
  File "src/sotoki/__main__.py", line 19, in <module>
    main()
  File "src/sotoki/__main__.py", line 13, in main
    from sotoki.entrypoint import main as entry
  File "/Users/natamox/Desktop/OpenSource/sotoki/src/sotoki/entrypoint.py", line 13, in <module>
    from .utils.shared import Global, logger
  File "/Users/natamox/Desktop/OpenSource/sotoki/src/sotoki/utils/shared.py", line 11, in <module>
    from zimscraperlib.zim.creator import Creator
  File "/Users/natamox/Desktop/OpenSource/sotoki/venv/lib/python3.8/site-packages/zimscraperlib/zim/__init__.py", line 16, in <module>
    from .creator import Creator
  File "/Users/natamox/Desktop/OpenSource/sotoki/venv/lib/python3.8/site-packages/zimscraperlib/zim/creator.py", line 30, in <module>
    from ..filesystem import delete_callback, get_content_mimetype, get_file_mimetype
  File "/Users/natamox/Desktop/OpenSource/sotoki/venv/lib/python3.8/site-packages/zimscraperlib/filesystem.py", line 13, in <module>
    import magic
  File "/Users/natamox/Desktop/OpenSource/sotoki/venv/lib/python3.8/site-packages/magic.py", line 70, in <module>
    _open = _libraries['magic'].magic_open
  File "/usr/local/Cellar/python@3.8/3.8.17_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/usr/local/Cellar/python@3.8/3.8.17_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(RTLD_DEFAULT, magic_open): symbol not found

benoit74 commented 11 months ago

Yes, this is the entry point. Your error indicates that it cannot find libmagic. There are several system requirements for zimscraperlib to operate properly: https://github.com/openzim/python-scraperlib/#dependencies
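
A quick way to confirm this independently of sotoki is to ask ctypes whether a libmagic exposing `magic_open` (the symbol named in the traceback) can be loaded. This is only a sketch using the standard library; `find_libmagic` is a hypothetical helper name, and the lookup may not match exactly how the magic module locates the library.

```python
import ctypes
import ctypes.util


def find_libmagic():
    """Return the name/path of a loadable libmagic exposing magic_open, or None."""
    name = ctypes.util.find_library("magic")
    if name is None:
        # No library called "magic" is visible to the dynamic loader.
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        # The library was located but could not be loaded.
        return None
    # Attribute lookup on a CDLL resolves the symbol and fails if it is absent.
    return name if hasattr(lib, "magic_open") else None


if __name__ == "__main__":
    location = find_libmagic()
    if location is None:
        print("libmagic not found; install it with your system package manager")
    else:
        print(f"libmagic found: {location}")
```

If this prints "not found", installing libmagic through the system package manager (on macOS, for example, Homebrew provides a libmagic formula) is the usual fix.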

rgaudin commented 11 months ago

I'm a bit surprised @kelson42 did not recognize the pattern. It's not the first ticket of this kind. Indeed, sotoki doesn't support Apple M1 (because of its main dependency). Actually, very few scrapers do at the moment.

I advise you to take a look at our Dockerfile. It will show you everything that is needed, and it's also a good way to use sotoki on unsupported platforms. We provide images (ghcr.io/openzim/sotoki), but you can build and run from any platform running Docker.