yangheng95 / PyABSA

Sentiment Analysis, Text Classification, Text Augmentation, Text Adversarial defense, etc.;
https://pyabsa.readthedocs.io
MIT License

Checkpoints not found #134

Closed brieucdandin closed 2 years ago

brieucdandin commented 2 years ago

I can no longer load a checkpoint, although it worked a few days ago. Reverting to the most recent earlier versions did not help. Perhaps it is due to an update of the findfile library? I did not check whether it was updated recently, but in the second test below, it seems the checkpoint ZIP file is found in APCCheckpointManager.get_sentiment_classifier but cannot be unzipped in the subsequently called unzip_checkpoint.

Sorry I could not dig deeper. Has anyone encountered a similar issue, or does anyone have an idea of what is going wrong? A PR would be greatly appreciated.

More info

That happened even after downloading the checkpoints from the cloud using the following:

# From available_checkpoints() at https://github.com/yangheng95/PyABSA/blob/release/pyabsa/functional/checkpoint/checkpoint_manager.py#L252
import os  # needed for os.path.isfile and os.remove below

from google_drive_downloader import GoogleDriveDownloader as gdd

checkpoint_url = '1CBVGPA3xdQqdkFFwzO5T2Q4reFtzFIJZ'  # V2
if os.path.isfile('./checkpoints.json'):
  os.remove('./checkpoints.json')

gdd.download_file_from_google_drive(file_id=checkpoint_url, dest_path='./checkpoints.json')

(See end of post for print of all the checkpoints.)

First test: using shortcut checkpoint='english'

Command:

>>> sent_classifier = pyabsa.APCCheckpointManager.get_sentiment_classifier(checkpoint='english',
...                                                                        auto_device=True,  # Use CUDA if available
...                                                                       )

Traceback:

Downloading 1CBVGPA3xdQqdkFFwzO5T2Q4reFtzFIJZ into ./checkpoints.json... Done.
C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\functional\checkpoint\checkpoint_manager.py:266: ResourceWarning: unclosed <ssl.SSLSocket fd=3096, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('10.0.0.30', 64436), raddr=('172.217.13.142', 443)>
  gdd.download_file_from_google_drive(file_id=checkpoint_url, dest_path='./checkpoints.json')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\functional\checkpoint\checkpoint_manager.py:266: ResourceWarning: unclosed <ssl.SSLSocket fd=3352, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('10.0.0.30', 64437), raddr=('172.217.13.161', 443)>
  gdd.download_file_from_google_drive(file_id=checkpoint_url, dest_path='./checkpoints.json')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
********** Available APC model checkpoints for Version:1.8.24 (this version) **********
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/18Ijj2fJvAdPRv4_2vk3hZwr3P1ViSr7T/view?usp=sharing
Training Model: FAST-LSA-T
Training Dataset: English
Description: Trained on RTX2080 Ti
Available Version: 1.6.3+
Checkpoint File: fast_lsa_t_acc_84.84_f1_82.36.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/1TKM08i-u1oiyGOmGXIQ9jqbCLDvEt1oA/view?usp=sharing
Training Model: FAST-LCF-MDeBERTa
Training Dataset: Chinese
Description: Trained on RTX3090
Available Version: 1.8.2+
Checkpoint File: fast_lcf_bert_Chinese_acc_97.11_f1_96.54.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/1E3b3OSP4Kw8JNZySm-bynH9k4-nhlmx3/view?usp=sharing
Training Model: FAST-LCF-Deberta
Training Dataset: Multilingual
Description: Trained on RTX3090
Available Version: 1.8.2+
Checkpoint File: fast_lcf_bert_Multilingual_acc_94.72_f1_90.07.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id:
Description: You can help us by sharing checkpoints (e.g. models trained on you own datasets) with community.
Checkpoint File: PLEASE NOTE THAT THIS IS NOT A REAL CHECKPOINT!
Available Version:
----------------------------------------------------------------------------------------------------
There may be some checkpoints available for early versions of PyABSA, see ./checkpoints.json
Downloading checkpoint:english from Google Drive...
Notice: The pretrained model are used for testing, neither trained using fine-tuned hyper-parameters nor trained with enough steps, it is recommended to train the model on your own custom datasets
Downloading 1AsjRzMa2D6DykMuOvokMra9mGBdRgIJC into ./checkpoints\APC_ENGLISH_CHECKPOINT\any_model.zip...
0.0 B Done.
Unzipping...C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\google_drive_downloader\google_drive_downloader.py:78: UserWarning: Ignoring `unzip` since "1AsjRzMa2D6DykMuOvokMra9mGBdRgIJC" does not look like a valid zip file
  warnings.warn('Ignoring `unzip` since "{}" does not look like a valid zip file'.format(file_id))
C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\functional\checkpoint\checkpoint_manager.py:313: ResourceWarning: unclosed <ssl.SSLSocket fd=3400, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('10.0.0.30', 64439), raddr=('172.217.13.142', 443)>
  gdd.download_file_from_google_drive(file_id=archive_path,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Load sentiment classifier from ./checkpoints\APC_ENGLISH_CHECKPOINT
config: None
state_dict: None
model: None
tokenizer: None
Traceback (most recent call last):
  File "C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\core\apc\prediction\sentiment_classifier.py", line 64, in __init__
    self.opt = pickle.load(open(config_path, mode='rb'))
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\utils\pyabsa_utils.py", line 167, in decorated
    return f(*args, **kwargs)
  File "C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\functional\checkpoint\checkpoint_manager.py", line 71, in get_sentiment_classifier
    sent_classifier = SentimentClassifier(checkpoint, sentiment_map=sentiment_map, eval_batch_size=eval_batch_size)
  File "C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\core\apc\prediction\sentiment_classifier.py", line 117, in __init__
    raise RuntimeError('Fail to load the model from {}! \nException: {} '.format(e, model_arg))
RuntimeError: Fail to load the model from expected str, bytes or os.PathLike object, not NoneType!
Exception: ./checkpoints\APC_ENGLISH_CHECKPOINT
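
The log above shows the download reporting "0.0 B" and the downloader warning that the file does not look like a valid zip file, after which the classifier is loaded with config, state_dict, model and tokenizer all set to None. A minimal sketch of a pre-flight check that would catch this earlier, using only the standard library. `is_usable_checkpoint_zip` is a hypothetical helper for illustration and is not part of PyABSA:

```python
import os
import zipfile


def is_usable_checkpoint_zip(path):
    """Return True only if `path` is a non-empty file that parses as a zip archive.

    Hypothetical helper for illustration; not part of PyABSA.
    """
    if not path or not os.path.isfile(path):
        return False
    if os.path.getsize(path) == 0:  # the "0.0 B" download seen in the log
        return False
    return zipfile.is_zipfile(path)
```

Running this on the downloaded `any_model.zip` before loading would distinguish a failed download from a genuine problem in the loading code.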

Second test: using the checkpoint's file name

... as well as the URL to download it from in case it is not found locally

Command:

>>> sent_classifier = pyabsa.APCCheckpointManager.get_sentiment_classifier(checkpoint=checkpoint_map['APC']['english']['Checkpoint File'],
...                                                                        from_drive_url=checkpoint_map['APC']['english']['id'],
...                                                                        auto_device=True,  # Use CUDA if available
...                                                                       )

Traceback:

Find zipped checkpoint: None, unzipping...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\utils\pyabsa_utils.py", line 167, in decorated
    return f(*args, **kwargs)
  File "C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\functional\checkpoint\checkpoint_manager.py", line 67, in get_sentiment_classifier
    checkpoint = unzip_checkpoint(checkpoint if os.path.exists(checkpoint) else find_file(os.getcwd(), checkpoint))
  File "C:\Users\bdandin\AppData\Roaming\Python\Python38\site-packages\pyabsa\functional\checkpoint\checkpoint_manager.py", line 29, in unzip_checkpoint
    with zipfile.ZipFile(zip_path, 'r') as z:
  File "C:\Program Files\Python38\lib\zipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "C:\Program Files\Python38\lib\zipfile.py", line 1332, in _RealGetContents
    endrec = _EndRecData(fp)
  File "C:\Program Files\Python38\lib\zipfile.py", line 264, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'NoneType' object has no attribute 'seek'
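
In this traceback, the message "Find zipped checkpoint: None" suggests that the file lookup returned None, and that None was then handed to zipfile.ZipFile, which fails with an unrelated-looking AttributeError. A minimal sketch of a lookup that fails loudly instead, assuming the archive is expected somewhere under the current working directory. It uses `os.walk` in place of the findfile library's `find_file`, and `locate_checkpoint_zip` is a hypothetical name, not a PyABSA function:

```python
import os


def locate_checkpoint_zip(root, file_name):
    """Search `root` recursively for `file_name` and return its full path.

    Hypothetical helper for illustration; raises instead of returning None
    so that a missing archive is reported where the problem actually is.
    """
    for dir_path, _dir_names, file_names in os.walk(root):
        if file_name in file_names:
            return os.path.join(dir_path, file_name)
    raise FileNotFoundError(
        "Checkpoint archive {!r} not found under {!r}".format(file_name, root)
    )
```

With a check like this, a missing `fast_lsa_t_acc_84.84_f1_82.36.zip` would surface as a FileNotFoundError naming the file rather than as a failure inside zipfile.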

All checkpoints available (before the tests)

Command:

checkpoint_map = pyabsa.available_checkpoints(from_local=True)

Output:

********** Available APC model checkpoints for Version:1.8.24 (this version) **********
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/18Ijj2fJvAdPRv4_2vk3hZwr3P1ViSr7T/view?usp=sharing
Training Model: FAST-LSA-T
Training Dataset: English
Description: Trained on RTX2080 Ti
Available Version: 1.6.3+
Checkpoint File: fast_lsa_t_acc_84.84_f1_82.36.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/1TKM08i-u1oiyGOmGXIQ9jqbCLDvEt1oA/view?usp=sharing
Training Model: FAST-LCF-MDeBERTa
Training Dataset: Chinese
Description: Trained on RTX3090
Available Version: 1.8.2+
Checkpoint File: fast_lcf_bert_Chinese_acc_97.11_f1_96.54.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/1E3b3OSP4Kw8JNZySm-bynH9k4-nhlmx3/view?usp=sharing
Training Model: FAST-LCF-Deberta
Training Dataset: Multilingual
Description: Trained on RTX3090
Available Version: 1.8.2+
Checkpoint File: fast_lcf_bert_Multilingual_acc_94.72_f1_90.07.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id:
Description: You can help us by sharing checkpoints (e.g. models trained on you own datasets) with community.
Checkpoint File: PLEASE NOTE THAT THIS IS NOT A REAL CHECKPOINT!
Available Version:
----------------------------------------------------------------------------------------------------
********** Available ATEPC model checkpoints for Version:1.8.24 (this version) **********
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/1q4lvoLK5nYKyrrHuO6ShbDnxDAX95Vfi/view?usp=sharing
Training Model: FAST-LCF-ATEPC
Training Dataset: English
Description: Trained on RTX3090
Available Version: 1.8.4+
Checkpoint File: fast_lcf_atepc_English_cdw_apcacc_80.16_apcf1_78.34_atef1_75.39.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/1OKIkkaGvKBGxlQo86qLexP2ulshakdR-/view?usp=sharing
Training Model: FAST-LCF-DeBERTa
Training Dataset: Chinese
Description: Trained on RTX3090
Available Version: 1.8.4+
Checkpoint File: fast_lcf_atepc_Chinese_cdw_apcacc_96.69_apcf1_96.25_atef1_92.26.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id: https://drive.google.com/file/d/10nrHmmNjzg7plSpentWnNX3kfM-m1RJB/view?usp=sharing
Training Model: FAST-LCF-ATEPC
Training Dataset: ABSADatasets.Multilingual
Description: Trained on RTX3090
Available Version: 1.8.4+
Checkpoint File: fast_lcf_atepc_Multilingual_cdw_apcacc_79.61_apcf1_76.24_atef1_63.29.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
id:
Description: You can help us by sharing checkpoints (e.g. models trained on you own datasets) with community.
Checkpoint File: PLEASE NOTE THAT THIS IS NOT A REAL CHECKPOINT!
Available Version:
----------------------------------------------------------------------------------------------------
********** Available TC model checkpoints for Version:1.8.24 (this version) **********
----------------------------------------------------------------------------------------------------
id:
Description: You can help us by sharing checkpoints (e.g. models trained on you own datasets) with community.
Checkpoint File: PLEASE NOTE THAT THIS IS NOT A REAL CHECKPOINT!
Available Version:
----------------------------------------------------------------------------------------------------
There may be some checkpoints available for early versions of PyABSA, see ./checkpoints.json

yangheng95 commented 2 years ago

The problem probably comes from Google Drive, as the same problem occurred before. Please try downloading the checkpoint with your browser or the Google Drive client and put it in the current working directory (CWD). Feel free to report further on this problem, thanks.
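
A minimal sketch of the manual workaround, assuming the checkpoint archive has already been downloaded by hand into the current working directory. `extract_checkpoint` and the default `./checkpoints` destination are hypothetical choices for illustration, not part of PyABSA:

```python
import os
import zipfile


def extract_checkpoint(zip_path, dest_root="./checkpoints"):
    """Extract a manually downloaded checkpoint archive and return the target folder.

    Hypothetical helper for illustration; not part of PyABSA.
    """
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("{!r} is not a valid zip archive".format(zip_path))
    # Name the folder after the archive, e.g. fast_lsa_t_acc_84.84_f1_82.36
    name = os.path.splitext(os.path.basename(zip_path))[0]
    target = os.path.join(dest_root, name)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target)
    return target
```

The returned folder could then be tried as the `checkpoint` argument of `get_sentiment_classifier`. Whether that call accepts an extracted folder in this PyABSA version is an assumption based on the "Load sentiment classifier from ..." line in the log above.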

yangheng95 commented 2 years ago

121

yangheng95 commented 2 years ago

Hi, have you solved this problem?

yangheng95 commented 2 years ago

@brieucdandin

yangheng95 commented 2 years ago

Fixed