modelscope / AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models
Apache License 2.0

[Bug] error in typing_metric.py #24

Open averieso opened 1 year ago

averieso commented 1 year ago

Checklist before your report.

What happened?

An error occurred during the evaluation phase of the training script for entity typing.

Python traceback


```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/averie/name-entity-recognition/experiments/adaseq/scripts/train.py", line 38, in
    train_model_from_args(args)
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/commands/train.py", line 84, in train_model_from_args
    train_model(
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/commands/train.py", line 164, in train_model
    trainer.train(checkpoint_path)
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/training/default_trainer.py", line 146, in train
    return super().train(checkpoint_path=checkpoint_path, *args, **kwargs)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 689, in train
    self.train_loop(self.train_dataloader)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 1220, in train_loop
    self.invoke_hook(TrainerStages.after_train_epoch)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 1372, in invoke_hook
    getattr(hook, fn_name)(self)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/hooks/evaluation_hook.py", line 54, in after_train_epoch
    self.do_evaluate(trainer)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/hooks/evaluation_hook.py", line 67, in do_evaluate
    eval_res = trainer.evaluate()
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 778, in evaluate
    metric_values = self.evaluation_loop(self.eval_dataloader,
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 1272, in evaluation_loop
    metric_values = single_gpu_test(
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/utils/inference.py", line 56, in single_gpu_test
    evaluate_batch(trainer, data, metric_classes, vis_closure)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/utils/inference.py", line 183, in evaluate_batch
    metric_cls.add(batch_result, data)
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/metrics/typing_metric.py", line 128, in add
    pred_results.append(one_hot_to_list(predicts[i][j]))
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/metrics/typing_metric.py", line 123, in one_hot_to_list
    id_list = set((np.where(in_tensor.detach().cpu() == 1)[0]))
AttributeError: 'set' object has no attribute 'detach'
```
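The last two frames show that `typing_metric.py` calls `one_hot_to_list(predicts[i][j])`. Here `predicts[i][j]` is already a `set` rather than a tensor, so `.detach()` fails. Below is a minimal defensive sketch. It assumes each prediction is either a one-hot vector (a torch tensor, NumPy array or plain list) or an already-decoded set of label ids. It is an illustration only, not the upstream AdaSeq fix.

```python
import numpy as np


def one_hot_to_list(in_tensor):
    """Return the set of active label ids (hypothetical defensive variant).

    Accepts either an already-decoded set of ids, which is the case that
    raised the AttributeError above, or a 1-D one-hot vector.
    """
    # Already decoded: the prediction is a set of label ids, so keep it as-is.
    if isinstance(in_tensor, (set, frozenset)):
        return {int(i) for i in in_tensor}
    # Torch tensors: drop the autograd graph and move to CPU before using NumPy.
    if hasattr(in_tensor, "detach"):
        in_tensor = in_tensor.detach().cpu().numpy()
    arr = np.asarray(in_tensor)
    # Positions equal to 1 are the predicted label ids.
    return {int(i) for i in np.where(arr == 1)[0]}
```

Lists are deliberately treated as one-hot vectors, not as id collections. This avoids ambiguity for inputs like `[0, 1, 0]`.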

Operating system

Ubuntu 22.04.2 LTS

Python version

3.10.6

Output of pip freeze


```
addict==2.4.0
aiohttp==3.8.4
aiosignal==1.3.1
aliyun-python-sdk-core==2.13.36
aliyun-python-sdk-kms==2.16.1
async-timeout==4.0.2
attrs==23.1.0
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
cmake==3.26.3
crcmod==1.7
cryptography==41.0.1
datasets==2.8.0
dill==0.3.6
einops==0.6.1
filelock==3.12.0
frozenlist==1.3.3
fsspec==2023.5.0
gast==0.5.4
huggingface-hub==0.15.1
idna==3.4
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.2.0
lit==16.0.5
MarkupSafe==2.1.2
modelscope==1.6.0
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
networkx==3.1
numpy==1.22.0
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oss2==2.18.0
packaging==23.1
pandas==1.5.3
Pillow==9.5.0
pyarrow==12.0.0
pycparser==2.21
pycryptodome==3.18.0
python-dateutil==2.8.2
pytz==2023.3
PyYAML==6.0
regex==2023.5.5
requests==2.31.0
responses==0.18.0
scikit-learn==1.2.2
scipy==1.10.1
seqeval==1.2.2
simplejson==3.19.1
six==1.16.0
sortedcontainers==2.4.0
sympy==1.12
threadpoolctl==3.1.0
tokenizers==0.13.3
tomli==2.0.1
torch==1.13.1
torchvision==0.14.1
tqdm==4.65.0
transformers==4.29.2
triton==2.0.0
typing_extensions==4.6.3
urllib3==2.0.2
xxhash==3.2.0
yapf==0.33.0
yarl==1.9.2
```

How to reproduce


```
python3 -m scripts.train -c examples/NPCRF/configs/ufet_concat_npcrf.yaml
```

Code of Conduct

jeffchy commented 1 year ago

Hi, could you please provide more details about this issue, e.g., the config file you ran, your environment, and screenshots? These will help us find the problem, thanks.

averieso commented 1 year ago

Please see the config file below (following the instructions, I only changed the source and target embedding paths). I'm not sure what else I should provide regarding the environment (beyond the pip freeze above) or screenshots. Thanks.

config file:

    experiment:
      exp_dir: experiments/
      exp_name: ufet
      seed: 17

    task: entity-typing

    dataset:
      data_file:
        train: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=train.json'
        valid: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=dev.json'
        test: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=test.json'
      tokenizer: blank
      lower: true
      labels: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=labels.txt'

    preprocessor:
      type: multilabel-concat-typing-preprocessor
      model_dir: roberta-large
      max_length: 150

    data_collator: MultiLabelConcatTypingDataCollatorWithPadding

    model:
      type: multilabel-concat-typing-model
      embedder:
        model_name_or_path: roberta-large
        drop_special_tokens: false
      dropout: 0
      decoder:
        type: pairwise-crf
        label_emb_type: glove
        label_emb_dim: 300
        source_emb_file_path: None
        target_emb_dir: /home/averie/name-entity-recognition/experiments/adaseq/glove_embeds  # TODO
        target_emb_name: glove.300.emb
        pairwise_factor: 70
        mfvi_iteration: 4
        two_potential: false
        sign_trick: true
      loss_function: WBCE
      pos_weight: 4

    train:
      max_epochs: 30
      dataloader:
        batch_size_per_gpu: 4
      optimizer:
        type: AdamW
        lr: 2.0e-5
      lr_scheduler:
        type: cosine
        warmup_rate: 0.1  # when choose concat typing model, default to use cosine_linear_with_warmup
        options:
          by_epoch: false
      hooks:

    evaluation:
      dataloader:
        batch_size_per_gpu: 32
      metrics: typing-metric

jeffchy commented 1 year ago

Could you successfully run the default NPCRF example?

averieso commented 1 year ago

How do I run the default example? It requires PATH_TO_DIR to be replaced, which is what I did.

jeffchy commented 1 year ago
decoder:
    type: pairwise-crf
    label_emb_type: glove
    label_emb_dim: 300
    source_emb_file_path: ${PATH_TO_DIR}/glove.6B.300d.txt
    target_emb_dir: ${PATH_TO_DIR}  # TODO
    target_emb_name: glove.300.emb
    pairwise_factor: 70
    mfvi_iteration: 4
    two_potential: false
    sign_trick: true

It seems your configuration is incorrect; the above shows the default configuration. The GloVe embeddings can be downloaded from the official Stanford website: https://nlp.stanford.edu/data/glove.6B.zip. source_emb_file_path should be the absolute path to the source embedding file, for example glove.6B.300d.txt. target_emb_dir is the directory where you want to store the label embedding matrix, which is saved under the name given by target_emb_name. In other words, the label embeddings are preprocessed from ${YOUR_SRC_EMB_DIR}/glove.6B.300d.txt and saved to ${YOUR_TGT_EMB_SAVE_DIR}/glove.300.emb.
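Following the explanation above, the two decoder path settings can be sanity-checked before a long training run. This is a minimal sketch with an assumed precedence: an existing target_emb_dir/target_emb_name file is used as-is, and otherwise the embeddings are generated from source_emb_file_path. The precedence AdaSeq actually applies is not stated in this thread, and `check_label_emb_config` is a hypothetical helper, not an AdaSeq function.

```python
import os


def check_label_emb_config(decoder):
    """Report which label-embedding path a pairwise-crf decoder config relies on.

    Hypothetical helper. Assumes an existing target file is preferred and the
    source file is only needed to generate it; AdaSeq's behavior may differ.
    """
    src = decoder.get("source_emb_file_path")
    # PyYAML loads a bare `None` as the string "None" (only null, Null, NULL,
    # ~ or an empty value become Python None), so treat all of these as unset.
    if src in (None, "None", ""):
        src = None
    target = os.path.join(decoder["target_emb_dir"], decoder["target_emb_name"])
    if os.path.isfile(target):
        return f"use precomputed label embeddings: {target}"
    if src is not None and os.path.isfile(src):
        return f"generate label embeddings from {src} into {target}"
    raise FileNotFoundError(
        f"{target} does not exist and source_emb_file_path is not an existing file"
    )
```

For example, `check_label_emb_config(cfg["model"]["decoder"])` on a loaded config dict returns a short description or raises before training starts.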

averieso commented 1 year ago

Thank you for your answer. According to the README in the NPCRF directory:

"NPCRF requires static label embeddings, the preprocessed label embeddings (from GloVe for EN, Tencent for ZH) can be downloaded here: UFET, CFET, and you can place them in your folder and run the following config: (you need to reset your target_emb_dir in the config). Or you can provide the path of the glove embedding file (e.g., /path/to/your/glove.6B.300d.txt) and the code will generate label embedding for you."

So I cannot use the glove.300.emb mentioned in this description?
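The README says that, given a GloVe file, the code generates the label embeddings itself. One common way to build such static label embeddings, not necessarily the one AdaSeq uses, is to average the word vectors of each label's tokens. The sketch below assumes the GloVe text format (a word followed by its space-separated vector components) and underscore-joined multi-word labels. `load_glove` and `build_label_embeddings` are hypothetical names.

```python
import numpy as np


def load_glove(lines, dim):
    """Parse GloVe text lines ("word v1 v2 ... v_dim") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines (and tokens that contain spaces)
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def build_label_embeddings(labels, vectors, dim):
    """Return a (len(labels), dim) matrix; each row averages the label's word vectors."""
    matrix = np.zeros((len(labels), dim), dtype=np.float32)
    for row, label in enumerate(labels):
        # Assumption: multi-word labels are joined with underscores, e.g. "sports_team".
        words = [w for w in label.lower().replace("_", " ").split() if w in vectors]
        if words:  # labels with no known word keep an all-zero row
            matrix[row] = np.mean([vectors[w] for w in words], axis=0)
    return matrix
```

With the real file, the lines would come from `open("/path/to/your/glove.6B.300d.txt", encoding="utf-8")` with `dim=300`. How the resulting matrix is serialized to glove.300.emb is not described in the thread, so that step is left out.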

jeffchy commented 1 year ago

Could you please share a screenshot of the error message? Also, can you run the model successfully when you create the embeddings from the GloVe source?

averieso commented 1 year ago
[Screenshot 2023-06-05 at 16 11 34]

I just tried generating the embeddings from the GloVe source; it resulted in the same error (see screenshot).

jeffchy commented 1 year ago

Oops, it seems the bug was introduced by the latest AdaSeq update to the typing metric. Training and loading the label embeddings work fine. A quick fix could be downgrading AdaSeq to 0.6.2 and ModelScope to 1.4.2. We will fix the bug later.

averieso commented 1 year ago

Thanks for the reply. I ran pip install adaseq==0.6.2 and pip install modelscope==1.4.2, but I am still getting the same error. Am I missing something?
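One possible explanation, not confirmed in this thread, involves where adaseq is loaded from. The traceback loads it from /home/averie/name-entity-recognition/experiments/adaseq/adaseq/, which is the source checkout, and adaseq does not appear in the pip freeze above. When a module is started with python3 -m, Python adds the current directory to the start of sys.path. A local adaseq/ package in the repository root would therefore shadow a pip-installed adaseq==0.6.2. The sketch below reports where a package is actually imported from and which version pip recorded. `describe_package` is a hypothetical helper, not part of AdaSeq or ModelScope.

```python
import importlib.metadata
import importlib.util


def describe_package(name):
    """Return (origin, installed_version) for a top-level package name.

    origin is the file Python would import `name` from given the current
    sys.path (None if not found); installed_version comes from the installed
    distribution metadata (None if no distribution with that name exists).
    """
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return origin, version


if __name__ == "__main__":
    for pkg in ("adaseq", "modelscope"):
        origin, version = describe_package(pkg)
        print(f"{pkg}: imported from {origin}, installed version {version}")
```

Saved as a module in the repository root and run with python3 -m (the same way as scripts.train), it shows whether adaseq resolves to the local checkout or to site-packages. If the origin points at the checkout while the reported version is 0.6.2, the pip downgrade is not the code being run.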