yongzhuo / Macropodus

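Macropodus is a natural language processing toolkit built on an ALBERT+BiLSTM+CRF deep learning architecture. It provides Chinese word segmentation, part-of-speech tagging, named entity recognition, new word discovery, keyword extraction, text summarization, text similarity, a scientific calculator, conversion between Chinese numerals and Arabic (or Roman) numerals, Traditional/Simplified Chinese conversion, and pinyin conversion. Toolkit of NLP: CWS (Chinese word segmentation), POS (part-of-speech tagging), NER (named entity recognition), Find (new word discovery), Keyword (keyword extraction), Summarize (text summarization), Sim (text similarity), Calculate (scientific calculator), Chi2num (Chinese number to Arabic number)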
https://blog.csdn.net/rensihui
MIT License
656 stars, 93 forks

When installing Macropodus, pip says it requires tqdm == 4.31.1, but other libraries need a newer tqdm. How can this be resolved? #23

Open diaojunxian opened 1 year ago

diaojunxian commented 1 year ago

There are warnings during installation:

Installing collected packages: tqdm
  Attempting uninstall: tqdm
    Found existing installation: tqdm 4.65.0
    Uninstalling tqdm-4.65.0:
      Successfully uninstalled tqdm-4.65.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 2.12.0 requires tqdm>=4.62.1, but you have tqdm 4.31.1 which is incompatible.
huggingface-hub 0.15.1 requires tqdm>=4.42.1, but you have tqdm 4.31.1 which is incompatible.
papermill 2.4.0 requires tqdm>=4.32.2, but you have tqdm 4.31.1 which is incompatible.
ydata-profiling 4.1.2 requires tqdm<4.65,>=4.48.2, but you have tqdm 4.31.1 which is incompatible.

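The installation completes, but when I run my code it complains that the other libraries depending on tqdm need a newer version, so the code cannot run properly.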

yongzhuo commented 1 year ago

Try this: first uninstall tqdm, install macropodus on its own, and then install the newer tqdm.

diaojunxian commented 1 year ago

I already tried that. It installs, but at runtime it still complains that macropodus needs the older tqdm.

yongzhuo commented 1 year ago

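As long as it runs, that's fine. The project actually has no particular requirement on the tqdm version. The error is a package-manager requirement and does not stop the code from running normally.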
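pip's conflict message comes from comparing each installed distribution's declared requirements (its package metadata) with the versions actually installed; a plain import generally does not check those declarations. A minimal sketch, assuming Python 3.8+ for the standard-library importlib.metadata, that lists which installed packages declare a requirement on tqdm (for example the tqdm == 4.31.1 pin from the issue title). The helper names project_name and requirements_on are hypothetical, and the name parsing is simplified rather than a full PEP 508 parser:

```python
import re
from importlib.metadata import distributions
from typing import List, Tuple


def project_name(requirement: str) -> str:
    """Extract and normalize the project name from a requirement string.

    Simplified: takes everything before the first space, version operator,
    environment marker, extras bracket or parenthesis, e.g. "tqdm (==4.31.1)".
    """
    name = re.split(r"[\s<>=!~;\[(]", requirement.strip(), maxsplit=1)[0]
    # Normalize "huggingface_hub" / "Huggingface.Hub" to "huggingface-hub".
    return re.sub(r"[-_.]+", "-", name).lower()


def requirements_on(target: str) -> List[Tuple[str, str]]:
    """Return sorted (distribution, requirement) pairs for every installed
    distribution whose metadata declares a requirement on `target`."""
    wanted = project_name(target)
    found = []
    for dist in distributions():
        dist_name = dist.metadata["Name"] or "<unknown>"
        for req in dist.requires or []:
            if project_name(req) == wanted:
                found.append((dist_name, req))
    return sorted(found)


if __name__ == "__main__":
    for dist_name, req in requirements_on("tqdm"):
        print(f"{dist_name}: {req}")
```

Whether a listed pin actually matters at runtime depends on which tqdm APIs the package really imports, which metadata alone cannot tell.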

diaojunxian commented 1 year ago

It forcibly raises an exception and the code cannot run 😔

yongzhuo commented 1 year ago

Could you post your Python version and the error output?
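To answer a request like this in one go, a short standard-library sketch (Python 3.8+) can collect the interpreter version and the installed versions of the packages that appear in this thread. The function name environment_report is hypothetical, and the package list is simply copied from the pip output and tracebacks here:

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def environment_report(packages: Iterable[str]) -> Dict[str, str]:
    """Collect the Python version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
    return report


if __name__ == "__main__":
    # Package names taken from the pip output and tracebacks in this issue.
    pkgs = ["macropodus", "tqdm", "gensim", "transformers",
            "huggingface-hub", "trl", "peft", "datasets"]
    for key, value in environment_report(pkgs).items():
        print(f"{key}: {value}")
```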

diaojunxian commented 1 year ago

@yongzhuo

This is the error with the newer tqdm (tqdm == 4.65.0):

CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
2023-07-27 05:47:54,890 - seg_basic.py[line:19] - INFO: path of dict cache is /opt/conda/lib/python3.10/site-packages/macropodus/data/cache/macropodus.cache!
Traceback (most recent call last):
  File "/home/cloudadmin/test-sample/test_chatglm/chatglm-maths/chatglm_maths/t10_toy_trl_train_ppo.py", line 31, in <module>
    import macropodus
  File "/opt/conda/lib/python3.10/site-packages/macropodus/__init__.py", line 12, in <module>
    from macropodus.summarize import keyword, textrank, summarization
  File "/opt/conda/lib/python3.10/site-packages/macropodus/summarize/__init__.py", line 11, in <module>
    from macropodus.summarize.graph_base.textrank import TextRankSum, TextRankKey
  File "/opt/conda/lib/python3.10/site-packages/macropodus/summarize/graph_base/textrank.py", line 8, in <module>
    from macropodus.summarize.graph_base.textrank_word2vec import TextrankWord2vec
  File "/opt/conda/lib/python3.10/site-packages/macropodus/summarize/graph_base/textrank_word2vec.py", line 8, in <module>
    from macropodus.similarity.similarity_word2vec_char import SimW2vChar
  File "/opt/conda/lib/python3.10/site-packages/macropodus/similarity/__init__.py", line 8, in <module>
    from macropodus.similarity.similarity_word2vec_char import SimW2vChar
  File "/opt/conda/lib/python3.10/site-packages/macropodus/similarity/similarity_word2vec_char.py", line 8, in <module>
    from macropodus.base.word2vec import W2v
  File "/opt/conda/lib/python3.10/site-packages/macropodus/base/word2vec.py", line 11, in <module>
    import gensim
  File "/opt/conda/lib/python3.10/site-packages/gensim/__init__.py", line 5, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils  # noqa:F401
  File "/opt/conda/lib/python3.10/site-packages/gensim/corpora/__init__.py", line 12, in <module>
    from .dictionary import Dictionary  # noqa:F401
  File "/opt/conda/lib/python3.10/site-packages/gensim/corpora/dictionary.py", line 11, in <module>
    from collections import Mapping, defaultdict
ImportError: cannot import name 'Mapping' from 'collections' (/opt/conda/lib/python3.10/collections/__init__.py)

After downgrading to tqdm == 4.31.0:

/home/cloudadmin/test-sample/test_chatglm/chatglm-maths
Traceback (most recent call last):
  File "/home/cloudadmin/test-sample/test_chatglm/chatglm-maths/chatglm_maths/t10_toy_trl_train_ppo.py", line 25, in <module>
    from trl import PPOConfig, AutoModelForCausalLMWithValueHead, create_reference_model
  File "/opt/conda/lib/python3.10/site-packages/trl/__init__.py", line 5, in <module>
    from .core import set_seed
  File "/opt/conda/lib/python3.10/site-packages/trl/core.py", line 23, in <module>
    from transformers import top_k_top_p_filtering
  File "/opt/conda/lib/python3.10/site-packages/transformers/__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "/opt/conda/lib/python3.10/site-packages/transformers/dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/__init__.py", line 60, in <module>
    from .hub import (
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 32, in <module>
    from huggingface_hub import (
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/__init__.py", line 311, in __getattr__
    submod = importlib.import_module(submod_path)
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 44, in <module>
    from ._commit_api import (
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/_commit_api.py", line 14, in <module>
    from tqdm.contrib.concurrent import thread_map
ModuleNotFoundError: No module named 'tqdm.contrib'
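This second traceback fails on an ordinary import: huggingface_hub imports tqdm.contrib.concurrent, and the downgraded tqdm 4.31.0 has no tqdm.contrib package. That is a real API incompatibility, not only pip's metadata check. A small standard-library sketch (the helper name module_available is hypothetical) for checking whether such a submodule exists in the current environment before settling on a version:

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be found in the current environment.

    Note: for a dotted name, find_spec() imports the parent packages first,
    so checking "tqdm.contrib.concurrent" imports tqdm and tqdm.contrib.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ModuleNotFoundError, ValueError):
        # ModuleNotFoundError: a parent package is missing.
        # ValueError: a module's __spec__ is None (unusual, but possible).
        return False


if __name__ == "__main__":
    print("tqdm.contrib.concurrent available:",
          module_available("tqdm.contrib.concurrent"))
```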

yongzhuo commented 1 year ago

Strange bug. Are you on Python 3.10? There are quite a few conflicts. gensim==3.7.1 can't be changed; could you try tqdm==4.49.0?
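Before trying a version, it can help to check it against every tqdm constraint pip printed in the first comment. A minimal sketch with a deliberately simplified parser (plain dotted integer versions and only the operators ==, >= and <; the third-party packaging library's SpecifierSet handles the general PEP 440 case). The names CONSTRAINTS, parse, satisfies and violations are hypothetical; the bounds are copied from the pip output above:

```python
from typing import Dict, List, Tuple

# tqdm constraints copied from the pip resolver output earlier in this issue.
CONSTRAINTS: Dict[str, List[Tuple[str, str]]] = {
    "macropodus": [("==", "4.31.1")],
    "datasets 2.12.0": [(">=", "4.62.1")],
    "huggingface-hub 0.15.1": [(">=", "4.42.1")],
    "papermill 2.4.0": [(">=", "4.32.2")],
    "ydata-profiling 4.1.2": [("<", "4.65"), (">=", "4.48.2")],
}


def parse(v: str) -> Tuple[int, ...]:
    """Turn '4.48.2' into (4, 48, 2); simplified, numeric releases only."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) < 3:  # so '4.65' compares like '4.65.0'
        parts.append(0)
    return tuple(parts)


def satisfies(candidate: str, op: str, bound: str) -> bool:
    c, b = parse(candidate), parse(bound)
    if op == "==":
        return c == b
    if op == ">=":
        return c >= b
    if op == "<":
        return c < b
    raise ValueError(f"unsupported operator: {op}")


def violations(candidate: str) -> List[str]:
    """Packages whose declared tqdm constraint `candidate` would break."""
    return [
        pkg
        for pkg, specs in CONSTRAINTS.items()
        if not all(satisfies(candidate, op, bound) for op, bound in specs)
    ]


if __name__ == "__main__":
    for cand in ("4.31.1", "4.49.0"):
        print(cand, "violates:", violations(cand))
```

By these declared ranges, 4.49.0 satisfies huggingface-hub, papermill and ydata-profiling but is still below the tqdm>=4.62.1 that datasets 2.12.0 declares. Any version from 4.62.1 up to (but excluding) 4.65 meets all four, leaving only macropodus's exact pin in conflict.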

diaojunxian commented 1 year ago

Yes, it's Python 3.10. With tqdm 4.49.0 I get:

CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
2023-07-27 06:23:12,857 - seg_basic.py[line:19] - INFO: path of dict cache is /opt/conda/lib/python3.10/site-packages/macropodus/data/cache/macropodus.cache!
Traceback (most recent call last):
  File "/home/cloudadmin/test-sample/test_chatglm/chatglm-maths/chatglm_maths/t10_toy_trl_train_ppo.py", line 31, in <module>
    import macropodus
  File "/opt/conda/lib/python3.10/site-packages/macropodus/__init__.py", line 12, in <module>
    from macropodus.summarize import keyword, textrank, summarization
  File "/opt/conda/lib/python3.10/site-packages/macropodus/summarize/__init__.py", line 11, in <module>
    from macropodus.summarize.graph_base.textrank import TextRankSum, TextRankKey
  File "/opt/conda/lib/python3.10/site-packages/macropodus/summarize/graph_base/textrank.py", line 8, in <module>
    from macropodus.summarize.graph_base.textrank_word2vec import TextrankWord2vec
  File "/opt/conda/lib/python3.10/site-packages/macropodus/summarize/graph_base/textrank_word2vec.py", line 8, in <module>
    from macropodus.similarity.similarity_word2vec_char import SimW2vChar
  File "/opt/conda/lib/python3.10/site-packages/macropodus/similarity/__init__.py", line 8, in <module>
    from macropodus.similarity.similarity_word2vec_char import SimW2vChar
  File "/opt/conda/lib/python3.10/site-packages/macropodus/similarity/similarity_word2vec_char.py", line 8, in <module>
    from macropodus.base.word2vec import W2v
  File "/opt/conda/lib/python3.10/site-packages/macropodus/base/word2vec.py", line 11, in <module>
    import gensim
  File "/opt/conda/lib/python3.10/site-packages/gensim/__init__.py", line 5, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils  # noqa:F401
  File "/opt/conda/lib/python3.10/site-packages/gensim/corpora/__init__.py", line 12, in <module>
    from .dictionary import Dictionary  # noqa:F401
  File "/opt/conda/lib/python3.10/site-packages/gensim/corpora/dictionary.py", line 11, in <module>
    from collections import Mapping, defaultdict
ImportError: cannot import name 'Mapping' from 'collections' (/opt/conda/lib/python3.10/collections/__init__.py)

yongzhuo commented 1 year ago

Python 3.10 is too new; this project has only been tested on Python 3.7 to 3.9. Try editing /opt/conda/lib/python3.10/site-packages/gensim/corpora/dictionary.py and changing

from collections import Mapping, defaultdict

to

from collections.abc import Mapping
from collections import defaultdict
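The abstract base classes such as Mapping have lived in collections.abc since Python 3.3, and the deprecated aliases in collections were removed in Python 3.10, which is why gensim 3.7.1's "from collections import Mapping" fails there. A sketch of the suggested edit, written as a version-tolerant import (the fallback branch only matters on very old Pythons), plus a quick check that both names still behave as expected:

```python
# Replacement for: from collections import Mapping, defaultdict
try:
    from collections.abc import Mapping  # Python 3.3+
except ImportError:  # only reached on Python < 3.3
    from collections import Mapping
from collections import defaultdict

# defaultdict is a dict subclass, and dict is registered as a Mapping.
counts = defaultdict(int)
counts["token"] += 1
print(isinstance(counts, Mapping))  # → True
```

Editing a file under site-packages is a local patch and is lost whenever gensim is reinstalled.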

diaojunxian commented 1 year ago

After downgrading Python to 3.9 the code runs, but there is still a strange error.

I have already downgraded to peft == 0.2.0 and transformers==4.28.0.

ake sure to set config.batch_size to the correct value before training.
  warnings.warn(
tqdm:   0%|                                                                                                     | 0/37 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "/home/cloudadmin/test-sample/test_chatglm/chatglm-maths/chatglm_maths/t10_toy_trl_train_ppo.py", line 210, in <module>
    response_text = tokenizer.decode(response_ids)
  File "/opt/conda/envs/3.9/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3486, in decode
    return self._decode(
  File "/home/cloudadmin/test-sample/test_chatglm/chatglm-maths/chatglm_maths/models/tokenization_chatglm.py", line 284, in _decode
    return super()._decode(token_ids, **kwargs)
  File "/opt/conda/envs/3.9/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 931, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/opt/conda/envs/3.9/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 906, in convert_ids_to_tokens
    index = int(index)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

yongzhuo commented 1 year ago

Then it's a bug in the script. Change response_ids = response_tensor.detach().cpu().numpy().tolist() to response_ids = response_tensor.detach().cpu().numpy().tolist()[0]

Change model.cuda() to model.half().cuda()

Change outputs = model(torch.cat([start_ids, end_ids], dim=-1)) to outputs = model(torch.cat([start_ids, end_ids], dim=-1).cuda())

A few of the samples run to completion. Log:

[tensor(0.1491, dtype=torch.float64)]
tqdm: 97%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 36/37 [05:17<00:08, 8.76s/it]
[115, 123, 119860, 5, 19052, 5, 123, 123, 119860, 194, 5, 123, 123, 5, 120, 115, 5, 115, 123, 115, 123, 5, 123, 115, 115, 115, 115, 123, 5, 123, 115, 355, 410, 410, 115, 5, 410, 5, 5, 115, 5, 115, 5327, 5327, 115, 5327, 5327, 5327, 5327, 5327, 5327, 5327, 115, 123, 115, 115, 5327, 5327, 5327, 115, 5327, 355, 64035, 5327, 65312, 5, 5327, 5327, 5327, 5327, 5327, 65312, 63917, 5327, 5, 5327, 5, 5327, 5, 65312, 5, 65312, 65312, 65312, 5, 63917, 5327, 5, 65312, 5, 5, 5, 5327, 5, 65312, 5, 65312, 5, 5, 115, 65312, 5, 5, 5, 174, 65312, 5, 5, 5, 5, 5327, 5, 5, 5, 5, 5327, 5, 5, 65312, 5, 5, 65312, 5, 5, 65312, 5, 5, 5]
[tensor(0.1409, dtype=torch.float64)]
tqdm: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [05:26<00:00, 8.82s/it]
2023-07-27 16:09:00,568 - t10_toy_trl_train_ppo.py[line:127] - INFO: model_save_path is ./fine_tuning_t10/ppo/pytorch_model.bin
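The .tolist()[0] fix addresses the earlier TypeError. The generated tensor presumably has a batch dimension of 1, so .tolist() yields a nested list such as [[115, 123, ...]], and the slow tokenizer's convert_ids_to_tokens then calls int() on the inner list, as the traceback shows. A pure-Python sketch reproducing the shape problem without torch or transformers; convert_ids_to_tokens here is a hypothetical stand-in and VOCAB a made-up toy vocabulary:

```python
from typing import Dict, List

# Hypothetical toy vocabulary; real ids come from the ChatGLM tokenizer.
VOCAB: Dict[int, str] = {5: "▁", 115: "a", 123: "b"}


def convert_ids_to_tokens(ids: list) -> List[str]:
    """Hypothetical stand-in mimicking the per-id int() call in the traceback."""
    tokens = []
    for index in ids:
        index = int(index)  # raises TypeError if `index` is itself a list
        tokens.append(VOCAB.get(index, "<unk>"))
    return tokens


# What .tolist() returns for a tensor of shape (1, seq_len): a nested list.
batched_ids = [[115, 123, 5]]

try:
    convert_ids_to_tokens(batched_ids)
except TypeError as err:
    print("fails as in the traceback:", err)

# Taking row 0 (the suggested .tolist()[0]) gives a flat list of ints.
response_ids = batched_ids[0]
print(convert_ids_to_tokens(response_ids))  # → ['a', 'b', '▁']
```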