huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Apache License 2.0

Dependency resolving issue installing from source #173

Open its5Q opened 6 months ago

its5Q commented 6 months ago

Hi, I have an issue installing this library from source on Windows. I cloned the repo, created a Python 3.11 venv and ran pip install -e ".[dev]", but pip gets stuck backtracking through different dependency versions. Here's the output after letting it run for about half an hour:

Obtaining file:///D:/Temp/datatrove
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting dill>=0.3.0
  Using cached dill-0.3.8-py3-none-any.whl (116 kB)
Collecting fsspec>=2023.12.2
  Using cached fsspec-2024.3.1-py3-none-any.whl (171 kB)
Collecting huggingface-hub>=0.17.0
  Using cached huggingface_hub-0.23.0-py3-none-any.whl (401 kB)
Collecting humanize
  Using cached humanize-4.9.0-py3-none-any.whl (126 kB)
Collecting loguru>=0.7.0
  Using cached loguru-0.7.2-py3-none-any.whl (62 kB)
Collecting multiprocess
  Using cached multiprocess-0.70.16-py311-none-any.whl (143 kB)
Collecting numpy>=1.25.0
  Using cached numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
Collecting tqdm
  Using cached tqdm-4.66.4-py3-none-any.whl (78 kB)
Collecting filelock
  Using cached filelock-3.14.0-py3-none-any.whl (12 kB)
Collecting packaging>=20.9
  Using cached packaging-24.0-py3-none-any.whl (53 kB)
Collecting pyyaml>=5.1
  Using cached PyYAML-6.0.1-cp311-cp311-win_amd64.whl (144 kB)
Collecting requests
  Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting typing-extensions>=3.7.4.3
  Using cached typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Collecting colorama>=0.3.4
  Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting win32-setctime>=1.0.0
  Using cached win32_setctime-1.1.0-py3-none-any.whl (3.6 kB)
Collecting ruff>=0.1.5
  Using cached ruff-0.4.3-py3-none-win_amd64.whl (8.4 MB)
Collecting pytest
  Using cached pytest-8.2.0-py3-none-any.whl (339 kB)
Collecting pytest-timeout
  Using cached pytest_timeout-2.3.1-py3-none-any.whl (14 kB)
Collecting pytest-xdist
  Using cached pytest_xdist-3.6.1-py3-none-any.whl (46 kB)
Collecting moto[s3,server]
  Using cached moto-5.0.6-py2.py3-none-any.whl (3.7 MB)
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.3.2-cp311-cp311-win_amd64.whl (99 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.7-py3-none-any.whl (66 kB)
Collecting urllib3<3,>=1.21.1
  Using cached urllib3-2.2.1-py3-none-any.whl (121 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2024.2.2-py3-none-any.whl (163 kB)
Collecting rich
  Using cached rich-13.7.1-py3-none-any.whl (240 kB)
Collecting lighteval>=0.3.0
  Using cached lighteval-0.3.0-py3-none-any.whl (227 kB)
Collecting faust-cchardet
  Using cached faust_cchardet-2.1.19-cp311-cp311-win_amd64.whl (119 kB)
Collecting pyarrow
  Using cached pyarrow-16.0.0-cp311-cp311-win_amd64.whl (25.9 MB)
Collecting python-magic
  Using cached python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Collecting warcio
  Using cached warcio-1.7.4-py2.py3-none-any.whl (40 kB)
Collecting datasets>=2.18.0
  Using cached datasets-2.19.0-py3-none-any.whl (542 kB)
Collecting fasttext-wheel
  Using cached fasttext_wheel-0.9.2-cp311-cp311-win_amd64.whl (232 kB)
Collecting nltk
  Using cached nltk-3.8.1-py3-none-any.whl (1.5 MB)
Collecting inscriptis
  Using cached inscriptis-2.5.0-py3-none-any.whl (45 kB)
Collecting tldextract
  Using cached tldextract-5.1.2-py3-none-any.whl (97 kB)
Collecting trafilatura>=1.8.0
  Using cached trafilatura-1.9.0-py3-none-any.whl (1.0 MB)
Collecting tokenizers
  Using cached tokenizers-0.19.1-cp311-none-win_amd64.whl (2.2 MB)
Collecting ftfy
  Using cached ftfy-6.2.0-py3-none-any.whl (54 kB)
Collecting fasteners
  Using cached fasteners-0.19-py3-none-any.whl (18 kB)
Collecting xxhash
  Using cached xxhash-3.4.1-cp311-cp311-win_amd64.whl (29 kB)
Collecting s3fs>=2023.12.2
  Using cached s3fs-2024.3.1-py3-none-any.whl (29 kB)
Collecting boto3>=1.9.201
  Using cached boto3-1.34.98-py3-none-any.whl (139 kB)
Collecting botocore>=1.14.0
  Using cached botocore-1.34.98-py3-none-any.whl (12.2 MB)
Collecting cryptography>=3.3.1
  Using cached cryptography-42.0.5-cp39-abi3-win_amd64.whl (2.9 MB)
Collecting xmltodict
  Using cached xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB)
Collecting werkzeug!=2.2.0,!=2.2.1,>=0.5
  Using cached werkzeug-3.0.2-py3-none-any.whl (226 kB)
Collecting python-dateutil<3.0.0,>=2.1
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting responses>=0.15.0
  Using cached responses-0.25.0-py3-none-any.whl (55 kB)
Collecting Jinja2>=2.10.1
  Using cached Jinja2-3.1.3-py3-none-any.whl (133 kB)
Collecting py-partiql-parser==0.5.4
  Using cached py_partiql_parser-0.5.4-py2.py3-none-any.whl (23 kB)
Collecting antlr4-python3-runtime
  Using cached antlr4_python3_runtime-4.13.1-py3-none-any.whl (144 kB)
Collecting joserfc>=0.9.0
  Using cached joserfc-0.9.0-py3-none-any.whl (60 kB)
Collecting jsonpath-ng
  Using cached jsonpath_ng-1.6.1-py3-none-any.whl (29 kB)
Collecting docker>=3.0.0
  Using cached docker-7.0.0-py3-none-any.whl (147 kB)
Collecting graphql-core
  Using cached graphql_core-3.2.3-py3-none-any.whl (202 kB)
Collecting cfn-lint>=0.40.0
  Using cached cfn_lint-0.87.1-py3-none-any.whl (3.8 MB)
Collecting openapi-spec-validator>=0.5.0
  Using cached openapi_spec_validator-0.7.1-py3-none-any.whl (38 kB)
Collecting pyparsing>=3.0.7
  Using cached pyparsing-3.1.2-py3-none-any.whl (103 kB)
Collecting jsondiff>=1.1.2
  Using cached jsondiff-2.0.0-py3-none-any.whl (6.6 kB)
Collecting aws-xray-sdk!=0.96,>=0.93
  Using cached aws_xray_sdk-2.13.0-py2.py3-none-any.whl (101 kB)
Requirement already satisfied: setuptools in d:\temp\datatrove\dev\lib\site-packages (from moto[s3,server]->datatrove==0.2.0) (65.5.0)
Collecting flask!=2.2.0,!=2.2.1
  Using cached flask-3.0.3-py3-none-any.whl (101 kB)
Collecting flask-cors
  Using cached Flask_Cors-4.0.0-py2.py3-none-any.whl (14 kB)
Collecting iniconfig
  Using cached iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
Collecting pluggy<2.0,>=1.5
  Using cached pluggy-1.5.0-py3-none-any.whl (20 kB)
Collecting execnet>=2.1
  Using cached execnet-2.1.1-py3-none-any.whl (40 kB)
Collecting wrapt
  Using cached wrapt-1.16.0-cp311-cp311-win_amd64.whl (37 kB)
Collecting jmespath<2.0.0,>=0.7.1
  Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.11.0,>=0.10.0
  Using cached s3transfer-0.10.1-py3-none-any.whl (82 kB)
Collecting aws-sam-translator>=1.87.0
  Using cached aws_sam_translator-1.87.0-py3-none-any.whl (382 kB)
Collecting jsonpatch
  Using cached jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting jsonschema<5,>=3.0
  Using cached jsonschema-4.22.0-py3-none-any.whl (88 kB)
Collecting networkx<4,>=2.4
  Using cached networkx-3.3-py3-none-any.whl (1.7 MB)
Collecting junit-xml~=1.9
  Using cached junit_xml-1.9-py2.py3-none-any.whl (7.1 kB)
Collecting jschema-to-python~=1.2.3
  Using cached jschema_to_python-1.2.3-py3-none-any.whl (10 kB)
Collecting sarif-om~=1.0.4
  Using cached sarif_om-1.0.4-py3-none-any.whl (30 kB)
Collecting sympy>=1.0.0
  Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting regex>=2021.7.1
  Using cached regex-2024.4.28-cp311-cp311-win_amd64.whl (268 kB)
Collecting cffi>=1.12
  Using cached cffi-1.16.0-cp311-cp311-win_amd64.whl (181 kB)
Collecting pyarrow-hotfix
  Using cached pyarrow_hotfix-0.6-py3-none-any.whl (7.9 kB)
Collecting pandas
  Using cached pandas-2.2.2-cp311-cp311-win_amd64.whl (11.6 MB)
Collecting aiohttp
  Using cached aiohttp-3.9.5-cp311-cp311-win_amd64.whl (370 kB)
Collecting pywin32>=304
  Using cached pywin32-306-cp311-cp311-win_amd64.whl (9.2 MB)
Collecting itsdangerous>=2.1.2
  Using cached itsdangerous-2.2.0-py3-none-any.whl (16 kB)
Collecting click>=8.1.3
  Using cached click-8.1.7-py3-none-any.whl (97 kB)
Collecting blinker>=1.6.2
  Using cached blinker-1.8.1-py3-none-any.whl (9.5 kB)
Collecting MarkupSafe>=2.0
  Using cached MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Collecting transformers>=4.38.0
  Using cached transformers-4.40.1-py3-none-any.whl (9.0 MB)
Collecting torch>=2.0
  Using cached torch-2.3.0-cp311-cp311-win_amd64.whl (159.8 MB)
Collecting GitPython>=3.1.41
  Using cached GitPython-3.1.43-py3-none-any.whl (207 kB)
Collecting termcolor==2.3.0
  Using cached termcolor-2.3.0-py3-none-any.whl (6.9 kB)
Collecting pytablewriter
  Using cached pytablewriter-1.2.0-py3-none-any.whl (111 kB)
Collecting aenum==3.1.15
  Using cached aenum-3.1.15-py3-none-any.whl (137 kB)
Collecting scikit-learn
  Using cached scikit_learn-1.4.2-cp311-cp311-win_amd64.whl (10.6 MB)
Collecting spacy==3.7.2
  Using cached spacy-3.7.2-cp311-cp311-win_amd64.whl (12.1 MB)
Collecting sacrebleu
  Using cached sacrebleu-2.4.2-py3-none-any.whl (106 kB)
Collecting rouge-score==0.1.2
  Using cached rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Collecting sentencepiece>=0.1.99
  Using cached sentencepiece-0.2.0-cp311-cp311-win_amd64.whl (991 kB)
Collecting protobuf==3.20.*
  Using cached protobuf-3.20.3-py2.py3-none-any.whl (162 kB)
Collecting pycountry
  Using cached pycountry-23.12.11-py3-none-any.whl (6.2 MB)
Collecting joblib
  Using cached joblib-1.4.2-py3-none-any.whl (301 kB)
Collecting absl-py
  Using cached absl_py-2.1.0-py3-none-any.whl (133 kB)
Collecting six>=1.14.0
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting spacy-legacy<3.1.0,>=3.0.11
  Using cached spacy_legacy-3.0.12-py2.py3-none-any.whl (29 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0
  Using cached spacy_loggers-1.0.5-py3-none-any.whl (22 kB)
Collecting murmurhash<1.1.0,>=0.28.0
  Using cached murmurhash-1.0.10-cp311-cp311-win_amd64.whl (25 kB)
Collecting cymem<2.1.0,>=2.0.2
  Using cached cymem-2.0.8-cp311-cp311-win_amd64.whl (39 kB)
Collecting preshed<3.1.0,>=3.0.2
  Using cached preshed-3.0.9-cp311-cp311-win_amd64.whl (122 kB)
Collecting thinc<8.3.0,>=8.1.8
  Using cached thinc-8.2.3-cp311-cp311-win_amd64.whl (1.5 MB)
Collecting wasabi<1.2.0,>=0.9.1
  Using cached wasabi-1.1.2-py3-none-any.whl (27 kB)
Collecting srsly<3.0.0,>=2.4.3
  Using cached srsly-2.4.8-cp311-cp311-win_amd64.whl (479 kB)
Collecting catalogue<2.1.0,>=2.0.6
  Using cached catalogue-2.0.10-py3-none-any.whl (17 kB)
Collecting weasel<0.4.0,>=0.1.0
  Using cached weasel-0.3.4-py3-none-any.whl (50 kB)
Collecting typer<0.10.0,>=0.3.0
  Using cached typer-0.9.4-py3-none-any.whl (45 kB)
Collecting smart-open<7.0.0,>=5.2.1
  Using cached smart_open-6.4.0-py3-none-any.whl (57 kB)
Collecting pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4
  Using cached pydantic-2.7.1-py3-none-any.whl (409 kB)
Collecting langcodes<4.0.0,>=3.2.0
  Using cached langcodes-3.4.0-py3-none-any.whl (182 kB)
Collecting jsonschema-path<0.4.0,>=0.3.1
  Using cached jsonschema_path-0.3.2-py3-none-any.whl (14 kB)
Collecting lazy-object-proxy<2.0.0,>=1.7.1
  Using cached lazy_object_proxy-1.10.0-cp311-cp311-win_amd64.whl (27 kB)
Collecting openapi-schema-validator<0.7.0,>=0.6.0
  Using cached openapi_schema_validator-0.6.2-py3-none-any.whl (8.8 kB)
Collecting aiobotocore<3.0.0,>=2.5.4
  Using cached aiobotocore-2.12.3-py3-none-any.whl (76 kB)
Collecting courlan>=1.1.0
  Using cached courlan-1.1.0-py3-none-any.whl (33 kB)
Collecting htmldate>=1.8.1
  Using cached htmldate-1.8.1-py3-none-any.whl (31 kB)
Collecting justext>=3.0.0
  Using cached jusText-3.0.0-py2.py3-none-any.whl (837 kB)
Collecting lxml<5.2.0,>=4.9.4
  Using cached lxml-5.1.1-cp311-cp311-win_amd64.whl (3.9 MB)
Collecting pybind11>=2.2
  Using cached pybind11-2.12.0-py3-none-any.whl (234 kB)
Collecting wcwidth<0.3.0,>=0.2.12
  Using cached wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Collecting ply
  Using cached ply-3.11-py2.py3-none-any.whl (49 kB)
Collecting markdown-it-py>=2.2.0
  Using cached markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
Collecting pygments<3.0.0,>=2.13.0
  Using cached pygments-2.18.0-py3-none-any.whl (1.2 MB)
Collecting requests-file>=1.4
  Using cached requests_file-2.0.0-py2.py3-none-any.whl (4.2 kB)
Collecting aiobotocore<3.0.0,>=2.5.4
  Using cached aiobotocore-2.12.2-py3-none-any.whl (76 kB)
  Using cached aiobotocore-2.12.1-py3-none-any.whl (76 kB)
  Using cached aiobotocore-2.12.0-py3-none-any.whl (76 kB)
  Using cached aiobotocore-2.11.2-py3-none-any.whl (76 kB)
  Using cached aiobotocore-2.11.1-py3-none-any.whl (76 kB)
  Using cached aiobotocore-2.11.0-py3-none-any.whl (76 kB)
  Using cached aiobotocore-2.10.0-py3-none-any.whl (75 kB)
  Using cached aiobotocore-2.9.1-py3-none-any.whl (75 kB)
  Using cached aiobotocore-2.9.0-py3-none-any.whl (75 kB)
  Using cached aiobotocore-2.8.0-py3-none-any.whl (75 kB)
  Using cached aiobotocore-2.7.0-py3-none-any.whl (73 kB)
  Using cached aiobotocore-2.6.0-py3-none-any.whl (73 kB)
  Using cached aiobotocore-2.5.4-py3-none-any.whl (73 kB)
INFO: pip is looking at multiple versions of xxhash to determine which version is compatible with other requirements. This could take a while.
Collecting xxhash
  Using cached xxhash-3.3.0-cp311-cp311-win_amd64.whl (29 kB)
INFO: pip is looking at multiple versions of xmltodict to determine which version is compatible with other requirements. This could take a while.
Collecting xmltodict
  Using cached xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
INFO: pip is looking at multiple versions of warcio to determine which version is compatible with other requirements. This could take a while.
Collecting warcio
  Using cached warcio-1.7.3-py2.py3-none-any.whl (40 kB)
INFO: pip is looking at multiple versions of tokenizers to determine which version is compatible with other requirements. This could take a while.
Collecting tokenizers
  Using cached tokenizers-0.19.0-cp311-none-win_amd64.whl (2.2 MB)
INFO: pip is looking at multiple versions of tldextract to determine which version is compatible with other requirements. This could take a while.
Collecting tldextract
  Using cached tldextract-5.1.1-py3-none-any.whl (97 kB)
INFO: pip is looking at multiple versions of rich to determine which version is compatible with other requirements. This could take a while.
Collecting rich
  Using cached rich-13.7.0-py3-none-any.whl (240 kB)
INFO: pip is looking at multiple versions of python-magic to determine which version is compatible with other requirements. This could take a while.
Collecting python-magic
  Using cached python_magic-0.4.26-py2.py3-none-any.whl (13 kB)
INFO: pip is looking at multiple versions of moto to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of jsonpath-ng to determine which version is compatible with other requirements. This could take a while.
Collecting jsonpath-ng
  Using cached jsonpath_ng-1.6.0-py3-none-any.whl (29 kB)
INFO: pip is looking at multiple versions of inscriptis to determine which version is compatible with other requirements. This could take a while.
Collecting inscriptis
  Using cached inscriptis-2.4.0.1-py3-none-any.whl (41 kB)
INFO: pip is looking at multiple versions of iniconfig to determine which version is compatible with other requirements. This could take a while.
Collecting iniconfig
  Using cached iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
INFO: pip is looking at multiple versions of graphql-core to determine which version is compatible with other requirements. This could take a while.
Collecting graphql-core
  Using cached graphql_core-3.2.2-py3-none-any.whl (202 kB)
INFO: pip is looking at multiple versions of ftfy to determine which version is compatible with other requirements. This could take a while.
Collecting ftfy
  Using cached ftfy-6.1.3-py3-none-any.whl (53 kB)
INFO: pip is looking at multiple versions of flask-cors to determine which version is compatible with other requirements. This could take a while.
Collecting flask-cors
  Using cached Flask_Cors-3.0.10-py2.py3-none-any.whl (14 kB)
INFO: pip is looking at multiple versions of faust-cchardet to determine which version is compatible with other requirements. This could take a while.
Collecting faust-cchardet
  Using cached faust_cchardet-2.1.18-cp311-cp311-win_amd64.whl (110 kB)
INFO: pip is looking at multiple versions of moto to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of fasttext-wheel to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of fasteners to determine which version is compatible with other requirements. This could take a while.
Collecting fasteners
  Using cached fasteners-0.18-py3-none-any.whl (18 kB)
INFO: pip is looking at multiple versions of antlr4-python3-runtime to determine which version is compatible with other requirements. This could take a while.
Collecting antlr4-python3-runtime
  Using cached antlr4_python3_runtime-4.13.0-py3-none-any.whl (144 kB)
INFO: pip is looking at multiple versions of werkzeug to determine which version is compatible with other requirements. This could take a while.
Collecting werkzeug!=2.2.0,!=2.2.1,>=0.5
  Using cached werkzeug-3.0.1-py3-none-any.whl (226 kB)
INFO: pip is looking at multiple versions of trafilatura to determine which version is compatible with other requirements. This could take a while.
Collecting trafilatura>=1.8.0
  Using cached trafilatura-1.8.1-py3-none-any.whl (1.0 MB)
INFO: pip is looking at multiple versions of s3fs to determine which version is compatible with other requirements. This could take a while.
Collecting s3fs>=2023.12.2
  Using cached s3fs-2024.3.0-py3-none-any.whl (29 kB)
Collecting fsspec>=2023.12.2
  Using cached fsspec-2024.3.0-py3-none-any.whl (171 kB)
INFO: pip is looking at multiple versions of fsspec to determine which version is compatible with other requirements. This could take a while.
Collecting s3fs>=2023.12.2
  Using cached s3fs-2024.2.0-py3-none-any.whl (28 kB)
Collecting fsspec>=2023.12.2
  Using cached fsspec-2024.2.0-py3-none-any.whl (170 kB)
Collecting s3fs>=2023.12.2
  Using cached s3fs-2023.12.2-py3-none-any.whl (28 kB)
Collecting fsspec>=2023.12.2
  Using cached fsspec-2023.12.2-py3-none-any.whl (168 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
INFO: pip is looking at multiple versions of responses to determine which version is compatible with other requirements. This could take a while.
Collecting responses>=0.15.0
  Using cached responses-0.24.1-py3-none-any.whl (55 kB)
INFO: pip is looking at multiple versions of s3fs to determine which version is compatible with other requirements. This could take a while.
  Using cached responses-0.24.0-py3-none-any.whl (54 kB)
INFO: pip is looking at multiple versions of fsspec to determine which version is compatible with other requirements. This could take a while.
  Using cached responses-0.23.3-py3-none-any.whl (52 kB)
Collecting types-PyYAML
  Using cached types_PyYAML-6.0.12.20240311-py3-none-any.whl (15 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
Collecting responses>=0.15.0
  Using cached responses-0.23.2-py3-none-any.whl (52 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached responses-0.23.1-py3-none-any.whl (52 kB)
  Using cached responses-0.23.0-py3-none-any.whl (52 kB)
  Using cached responses-0.22.0-py3-none-any.whl (51 kB)
Collecting toml
  Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting types-toml
  Using cached types_toml-0.10.8.20240310-py3-none-any.whl (4.8 kB)
INFO: pip is looking at multiple versions of responses to determine which version is compatible with other requirements. This could take a while.
Collecting responses>=0.15.0
  Using cached responses-0.21.0-py3-none-any.whl (45 kB)
  Using cached responses-0.20.0-py3-none-any.whl (27 kB)
  Using cached responses-0.19.0-py3-none-any.whl (41 kB)
  Using cached responses-0.18.0-py3-none-any.whl (38 kB)
  Using cached responses-0.17.0-py2.py3-none-any.whl (38 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached responses-0.16.0-py2.py3-none-any.whl (35 kB)
  Using cached responses-0.15.0-py2.py3-none-any.whl (32 kB)
INFO: pip is looking at multiple versions of python-dateutil to determine which version is compatible with other requirements. This could take a while.
Collecting python-dateutil<3.0.0,>=2.1
  Using cached python_dateutil-2.9.0-py2.py3-none-any.whl (230 kB)
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
  Using cached python_dateutil-2.8.0-py2.py3-none-any.whl (226 kB)
  Using cached python_dateutil-2.7.5-py2.py3-none-any.whl (225 kB)
  Using cached python_dateutil-2.7.4-py2.py3-none-any.whl (211 kB)
  Using cached python_dateutil-2.7.3-py2.py3-none-any.whl (211 kB)
INFO: pip is looking at multiple versions of python-dateutil to determine which version is compatible with other requirements. This could take a while.
  Using cached python_dateutil-2.7.2-py2.py3-none-any.whl (212 kB)
  Using cached python_dateutil-2.7.1-py2.py3-none-any.whl (212 kB)
  Using cached python_dateutil-2.7.0-py2.py3-none-any.whl (207 kB)
  Using cached python_dateutil-2.6.1-py2.py3-none-any.whl (194 kB)
  Using cached python_dateutil-2.6.0-py2.py3-none-any.whl (194 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached python_dateutil-2.5.3-py2.py3-none-any.whl (201 kB)
  Using cached python_dateutil-2.5.2-py2.py3-none-any.whl (201 kB)
  Using cached python_dateutil-2.5.1-py2.py3-none-any.whl (200 kB)
  Using cached python_dateutil-2.5.0-py2.py3-none-any.whl (199 kB)
  Using cached python_dateutil-2.4.2-py2.py3-none-any.whl (188 kB)
  Using cached python-dateutil-2.4.1.post1.zip (212 kB)
  Preparing metadata (setup.py) ... done
Discarding https://files.pythonhosted.org/packages/73/c4/d9e410b1641e210262123f49619070e46da2a7ede334cf6b6fb3db5ee985/python-dateutil-2.4.1.post1.zip (from https://pypi.org/simple/python-dateutil/): Requested python-dateutil<3.0.0,>=2.1 from https://files.pythonhosted.org/packages/73/c4/d9e410b1641e210262123f49619070e46da2a7ede334cf6b6fb3db5ee985/python-dateutil-2.4.1.post1.zip (from moto->datatrove==0.2.0) has inconsistent version: expected '2.4.1.post1', but metadata has '2.4.1'
  Using cached python-dateutil-2.4.1.post1.tar.gz (207 kB)
  Preparing metadata (setup.py) ... done
Discarding https://files.pythonhosted.org/packages/9c/b0/5948496efa852dfa78751c3f494f57fa01bfc453b4a7b7b47b0c2e0b6a80/python-dateutil-2.4.1.post1.tar.gz (from https://pypi.org/simple/python-dateutil/): Requested python-dateutil<3.0.0,>=2.1 from https://files.pythonhosted.org/packages/9c/b0/5948496efa852dfa78751c3f494f57fa01bfc453b4a7b7b47b0c2e0b6a80/python-dateutil-2.4.1.post1.tar.gz (from moto->datatrove==0.2.0) has inconsistent version: expected '2.4.1.post1', but metadata has '2.4.1'
  Using cached python_dateutil-2.4.1-py2.py3-none-any.whl (188 kB)
  Using cached python_dateutil-2.4.0-py2.py3-none-any.whl (175 kB)
  Using cached python_dateutil-2.3-py2.py3-none-any.whl (173 kB)
  Using cached python-dateutil-2.2.tar.gz (259 kB)
  Preparing metadata (setup.py) ... done
  Using cached python-dateutil-2.1.tar.gz (152 kB)
  Preparing metadata (setup.py) ... done
INFO: pip is looking at multiple versions of pyparsing to determine which version is compatible with other requirements. This could take a while.
Collecting pyparsing>=3.0.7
  Using cached pyparsing-3.1.1-py3-none-any.whl (103 kB)
  Using cached pyparsing-3.1.0-py3-none-any.whl (102 kB)
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
  Using cached pyparsing-3.0.8-py3-none-any.whl (98 kB)
  Downloading pyparsing-3.0.7-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.0/98.0 kB 430.4 kB/s eta 0:00:00
INFO: pip is looking at multiple versions of pyarrow to determine which version is compatible with other requirements. This could take a while.
Collecting pyarrow
  Downloading pyarrow-15.0.2-cp311-cp311-win_amd64.whl (24.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.8/24.8 MB 43.7 MB/s eta 0:00:00
INFO: pip is looking at multiple versions of pyparsing to determine which version is compatible with other requirements. This could take a while.
  Downloading pyarrow-15.0.1-cp311-cp311-win_amd64.whl (24.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.8/24.8 MB 38.6 MB/s eta 0:00:00
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.

Update: the issue persists on Ubuntu under WSL with Python 3.10.

its5Q commented 6 months ago

Well, it seems to be an issue with the pip version. Only pip>=24 works fine, but even with pip 24 it has to go through a dozen versions of boto3 before succeeding, so maybe something could be done about that. It would also be useful to note in the README that pip>=24 is required.
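For anyone who wants to check their pip before attempting the editable install, here is a minimal sketch. The threshold of 24 is only what was observed in this thread, not a documented datatrove requirement, and the helper names (`pip_major`, `pip_is_recent_enough`) are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: 24 is the major version reported to work in this thread,
# not an official minimum published by the project.
MIN_PIP_MAJOR = 24


def pip_major(version_string: str) -> int:
    """Return the leading integer of a version string such as '24.0' or '23.3.2'."""
    return int(version_string.split(".")[0])


def pip_is_recent_enough(version_string: str, minimum_major: int = MIN_PIP_MAJOR) -> bool:
    """True if the major version is at least `minimum_major`."""
    return pip_major(version_string) >= minimum_major


if __name__ == "__main__":
    try:
        installed = version("pip")  # version of pip in the current environment
    except PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        if pip_is_recent_enough(installed):
            print(f"pip {installed} should be able to resolve the dev extras")
        else:
            print(f"pip {installed} is older than {MIN_PIP_MAJOR}; "
                  "consider running: python -m pip install --upgrade pip")
```

If the check fails, upgrading pip inside the venv and then re-running pip install -e ".[dev]" is the path that worked here.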

its5Q commented 6 months ago

I have another question: is Windows even supported? The README says that the pipelines should be platform-agnostic, but when running the tests on my Windows install I see several of them failing because of hardcoded Unix-style paths.

guipenedo commented 6 months ago

Hi, "platform-agnostic" was mostly meant in terms of different clouds/HPC software requiring only changes to the Executor code and not to anything on the pipeline side. That said, even though we do not normally test datatrove on Windows, it should still work. Could you post the logs of the failed tests so that we can fix whatever hardcoded issues there might be?

its5Q commented 6 months ago

I've already uninstalled the library and cleaned up the environment on Windows, but I think it was this assert failing because of the Unix paths in the defined constants.
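To illustrate the kind of failure I mean, here is a small sketch. It is not the actual datatrove test or constant. The path parts and the `EXPECTED` value are hypothetical, and it uses pathlib's pure path classes so it behaves the same on any OS.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical constant written with Unix-style separators, as a test might define it.
EXPECTED = "tmp/output/data.jsonl"

parts = ("tmp", "output", "data.jsonl")

# What joining the parts yields on each platform's path flavour.
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

# A plain string comparison only holds on POSIX systems.
posix_matches = str(posix_path) == EXPECTED
windows_matches = str(windows_path) == EXPECTED  # str() uses backslashes here

# Two portable alternatives: normalise to forward slashes,
# or compare path objects instead of strings.
windows_matches_normalised = windows_path.as_posix() == EXPECTED
windows_matches_as_path = windows_path == PureWindowsPath(EXPECTED)

print(posix_matches, windows_matches)
print(windows_matches_normalised, windows_matches_as_path)
```

Comparing path objects, or normalising with as_posix() before comparing, would likely make such asserts pass on both platforms, assuming the failing tests follow this pattern.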