tomaarsen / SpanMarkerNER

SpanMarker for Named Entity Recognition
https://tomaarsen.github.io/SpanMarkerNER/
Apache License 2.0
368 stars, 26 forks

Possible to load your own trained models with internet disabled? #52

Open justin-donnelly opened 5 months ago

justin-donnelly commented 5 months ago

I was wondering whether there is a way to load a model I trained myself in a Kaggle notebook. There's currently an NER competition going on, and I wanted to try using the SpanMarker library to compete. Training went fine, but to submit, the Kaggle notebook must have internet disabled. When I try to load my checkpoint, I get this error:

model_checkpoint = "/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000"
model = SpanMarkerModel.from_pretrained(model_checkpoint, local_files_only = True,
    labels = [
        '1-EMAIL', '1-ID_NUM', '1-NAME_STUDENT', '1-PHONE_NUM', '1-STREET_ADDRESS',
        '1-URL_PERSONAL', '1-USERNAME', '2-ID_NUM', '2-NAME_STUDENT', '2-PHONE_NUM',
        '2-STREET_ADDRESS', '2-URL_PERSONAL', 'O'
    ])

OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Kaggle notebook here: https://www.kaggle.com/jdonnelly0804/pii-infer

tomaarsen commented 5 months ago

Hello!

I'm struggling to reproduce the issue: I seem to be able to load a local model without internet. I can't reproduce it with your notebook either, as the model is private. Could you perhaps provide a fuller stack trace? It seems like it wants to load the config for bert-base-uncased, which is a bit odd, as that is likely the original non-SpanMarker model; I'm not sure why it wants to load it.
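A quick way to check which base encoder a saved checkpoint refers to is to read its `config.json`, where SpanMarker keeps the encoder's config under the `encoder` key (its `_name_or_path` entry is the one the later workaround in this thread deletes). The helper `base_encoder_name` below is a hypothetical name for illustration; it uses only the standard library and assumes that config layout.

```python
import json
from pathlib import Path
from typing import Optional


def base_encoder_name(checkpoint_dir: str) -> Optional[str]:
    """Hypothetical helper: return encoder._name_or_path from a saved SpanMarker
    config.json, or None if the entry is absent.

    Assumes the encoder config is nested under the "encoder" key.
    """
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open() as f:
        config = json.load(f)
    # A Hub name such as "bert-base-uncased" here means loading may try to resolve
    # that name on the Hub (or in the local cache) instead of in checkpoint_dir.
    return config.get("encoder", {}).get("_name_or_path")
```

For example, `base_encoder_name("/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000")` would show which name the checkpoint carries.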

justin-donnelly commented 5 months ago

Thank you for the quick response! I've made the dataset public, so you can now use it to reproduce the error. Here is the link to the training notebook as well: https://www.kaggle.com/code/jdonnelly0804/pii-train. In the training notebook it's fine for the internet to be on, and everything goes smoothly: I save a checkpoint from the trained model, download the files, upload them as a dataset, and use that dataset in the inference notebook you were trying to reproduce (that was the hidden dataset).

Finally, here is the full traceback:

model_checkpoint = "/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000"
model = SpanMarkerModel.from_pretrained(model_checkpoint, local_files_only = True,
    labels = [
        '1-EMAIL', '1-ID_NUM', '1-NAME_STUDENT', '1-PHONE_NUM', '1-STREET_ADDRESS',
        '1-URL_PERSONAL', '1-USERNAME', '2-ID_NUM', '2-NAME_STUDENT', '2-PHONE_NUM',
        '2-STREET_ADDRESS', '2-URL_PERSONAL', 'O'
    ])

gaierror Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
    173 try:
--> 174 conn = connection.create_connection(
    175 (self._dns_host, self.port), self.timeout, **extra_kw
    176 )
    178 except SocketTimeout:

File /opt/conda/lib/python3.10/site-packages/urllib3/util/connection.py:72, in create_connection(address, timeout, source_address, socket_options)
     68 return six.raise_from(
     69 LocationParseError(u"'%s', label empty or too long" % host), None
     70 )
---> 72 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     73 af, socktype, proto, canonname, sa = res

File /opt/conda/lib/python3.10/socket.py:955, in getaddrinfo(host, port, family, type, proto, flags)
    954 addrlist = []
--> 955 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    956 af, socktype, proto, canonname, sa = res

gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

NewConnectionError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
    704 conn,
    705 method,
    706 url,
    707 timeout=timeout_obj,
    708 body=body,
    709 headers=headers,
    710 chunked=chunked,
    711 )
    713 # If we're going to release the connection in finally:, then
    714 # the response doesn't need to know about the connection. Otherwise
    715 # it will also try to release it and we'll have a double-release
    716 # mess.

File /opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:386, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    385 try:
--> 386 self._validate_conn(conn)
    387 except (SocketTimeout, BaseSSLError) as e:
    388 # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.

File /opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1042, in HTTPSConnectionPool._validate_conn(self, conn)
   1041 if not getattr(conn, "sock", None): # AppEngine might not have .sock
-> 1042 conn.connect()
   1044 if not conn.is_verified:

File /opt/conda/lib/python3.10/site-packages/urllib3/connection.py:363, in HTTPSConnection.connect(self)
    361 def connect(self):
    362 # Add certificate verification
--> 363 self.sock = conn = self._new_conn()
    364 hostname = self.host

File /opt/conda/lib/python3.10/site-packages/urllib3/connection.py:186, in HTTPConnection._new_conn(self)
    185 except SocketError as e:
--> 186 raise NewConnectionError(
    187 self, "Failed to establish a new connection: %s" % e
    188 )
    190 return conn

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x783edf828070>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/requests/adapters.py:486, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    485 try:
--> 486 resp = conn.urlopen(
    487 method=request.method,
    488 url=url,
    489 body=request.body,
    490 headers=request.headers,
    491 redirect=False,
    492 assert_same_host=False,
    493 preload_content=False,
    494 decode_content=False,
    495 retries=self.max_retries,
    496 timeout=timeout,
    497 chunked=chunked,
    498 )
    500 except (ProtocolError, OSError) as err:

File /opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:787, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    785 e = ProtocolError("Connection aborted.", e)
--> 787 retries = retries.increment(
    788 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    789 )
    790 retries.sleep()

File /opt/conda/lib/python3.10/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
    594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x783edf828070>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

During handling of the above exception, another exception occurred:

ConnectionError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1238, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)
   1237 try:
-> 1238 metadata = get_hf_file_metadata(
   1239 url=url,
   1240 token=token,
   1241 proxies=proxies,
   1242 timeout=etag_timeout,
   1243 library_name=library_name,
   1244 library_version=library_version,
   1245 user_agent=user_agent,
   1246 )
   1247 except EntryNotFoundError as http_error:
   1248 # Cache the non-existence of the file and raise

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    116 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1631, in get_hf_file_metadata(url, token, proxies, timeout, library_name, library_version, user_agent)
   1630 # Retrieve metadata
-> 1631 r = _request_wrapper(
   1632 method="HEAD",
   1633 url=url,
   1634 headers=headers,
   1635 allow_redirects=False,
   1636 follow_relative_redirects=True,
   1637 proxies=proxies,
   1638 timeout=timeout,
   1639 )
   1640 hf_raise_for_status(r)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:385, in _request_wrapper(method, url, follow_relative_redirects, **params)
    384 if follow_relative_redirects:
--> 385 response = _request_wrapper(
    386 method=method,
    387 url=url,
    388 follow_relative_redirects=False,
    389 **params,
    390 )
    392 # If redirection, we redirect only relative paths.
    393 # This is useful in case of a renamed repository.

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:408, in _request_wrapper(method, url, follow_relative_redirects, **params)
    407 # Perform request and return if status_code is not in the retry list.
--> 408 response = get_session().request(method=method, url=url, **params)
    409 hf_raise_for_status(response)

File /opt/conda/lib/python3.10/site-packages/requests/sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    588 send_kwargs.update(settings)
--> 589 resp = self.send(prep, **send_kwargs)
    591 return resp

File /opt/conda/lib/python3.10/site-packages/requests/sessions.py:703, in Session.send(self, request, **kwargs)
    702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
    705 # Total elapsed time of the request (approximately)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_http.py:67, in UniqueRequestIdAdapter.send(self, request, *args, **kwargs)
     66 try:
---> 67 return super().send(request, *args, **kwargs)
     68 except requests.RequestException as e:

File /opt/conda/lib/python3.10/site-packages/requests/adapters.py:519, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    517 raise SSLError(e, request=request)
--> 519 raise ConnectionError(e, request=request)
    521 except ClosedPoolError as e:

ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x783edf828070>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))"), '(Request ID: 36d0930c-5de6-410e-962b-fee0e9868975)')

The above exception was the direct cause of the following exception:

LocalEntryNotFoundError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:389, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    387 try:
    388 # Load from URL or cache if already cached
--> 389 resolved_file = hf_hub_download(
    390 path_or_repo_id,
    391 filename,
    392 subfolder=None if len(subfolder) == 0 else subfolder,
    393 repo_type=repo_type,
    394 revision=revision,
    395 cache_dir=cache_dir,
    396 user_agent=user_agent,
    397 force_download=force_download,
    398 proxies=proxies,
    399 resume_download=resume_download,
    400 token=token,
    401 local_files_only=local_files_only,
    402 )
    403 except GatedRepoError as e:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    116 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1371, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)
   1369 else:
   1370 # Otherwise: most likely a connection issue or Hub downtime => let's warn the user
-> 1371 raise LocalEntryNotFoundError(
   1372 "An error happened while trying to locate the file on the Hub and we cannot find the requested files"
   1373 " in the local cache. Please check your connection and try again or make sure your Internet connection"
   1374 " is on."
   1375 ) from head_call_error
   1377 # From now on, etag and commit_hash are not None.

LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

OSError Traceback (most recent call last)
Cell In[11], line 2
      1 model_checkpoint = "/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000"
----> 2 model = SpanMarkerModel.from_pretrained(model_checkpoint,local_files_only = True,
      3 labels = [
      4 '1-EMAIL', '1-ID_NUM', '1-NAME_STUDENT', '1-PHONE_NUM', '1-STREET_ADDRESS',
      5 '1-URL_PERSONAL', '1-USERNAME', '2-ID_NUM', '2-NAME_STUDENT', '2-PHONE_NUM',
      6 '2-STREET_ADDRESS', '2-URL_PERSONAL', 'O'
      7 ])
      9 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
     11 model.to(device);

File /opt/conda/lib/python3.10/site-packages/span_marker/modeling.py:269, in SpanMarkerModel.from_pretrained(cls, pretrained_model_name_or_path, labels, config, model_card_data, *model_args, **kwargs)
    262 if parse(model_span_marker_version) < Version("1.0.0.dev"):
    263 logger.warning(
    264 f"Loading a model trained using SpanMarker v{model_span_marker_version},"
    265 f" while SpanMarker v{span_marker_version} is installed. Due to large changes"
    266 " introduced in v1.0.0, this is not recommended. Either retrain your model for"
    267 f" v{span_marker_version}, or install span_marker < 1.0.0."
    268 )
--> 269 model = super().from_pretrained(
    270 pretrained_model_name_or_path, *model_args, config=config, **kwargs, model_card_data=model_card_data
    271 )
    273 # If 'pretrained_model_name_or_path' refers to an encoder (roberta, bert, distilbert, electra, etc.),
    274 # then initialize it and create the SpanMarker config and model using the encoder and its config.
    275 else:
    276 encoder = SpanMarkerModel._load_encoder_with_kwargs(
    277 pretrained_model_name_or_path, config, model_args, kwargs.copy()
    278 )

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3462, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3456 config = cls._autoset_attn_implementation(
   3457 config, use_flash_attention_2=use_flash_attention_2, torch_dtype=torch_dtype, device_map=device_map
   3458 )
   3460 with ContextManagers(init_contexts):
   3461 # Let's make sure we don't run the init function of buffer modules
-> 3462 model = cls(config, *model_args, **model_kwargs)
   3464 # make sure we use the model's config since the init call might have copied it
   3465 config = model.config

File /opt/conda/lib/python3.10/site-packages/span_marker/modeling.py:80, in SpanMarkerModel.__init__(self, config, encoder, model_card_data, **kwargs)
     71 # encoder will be specified if this Model is initializer via .from_pretrained with an encoder
     72 # If .from_pretrained is called with a SpanMarkerModel instance, then we use the "traditional"
     73 # PreTrainedModel.from_pretrained, which won't include an encoder keyword argument. In that case,
     74 # we must create an "empty" encoder for PreTrainedModel.from_pretrained to fill with the correct
     75 # weights.
     76 if encoder is None:
     77 # Load the encoder via the Config to prevent having to use AutoModel.from_pretrained, which
     78 # could load e.g. all of roberta-large from the Hub unnecessarily.
     79 # However, use the SpanMarkerModel updated vocab_size
---> 80 encoder_config = AutoConfig.from_pretrained(self.config.encoder["_name_or_path"], **self.config.encoder)
     81 encoder = AutoModel.from_config(encoder_config)
     82 self.encoder = encoder

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1082, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1079 trust_remote_code = kwargs.pop("trust_remote_code", None)
   1080 code_revision = kwargs.pop("code_revision", None)
-> 1082 config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
   1083 has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]
   1084 has_local_code = "model_type" in config_dict and config_dict["model_type"] in CONFIG_MAPPING

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:644, in PretrainedConfig.get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    642 original_kwargs = copy.deepcopy(kwargs)
    643 # Get config dict associated with the base config file
--> 644 config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
    645 if "_commit_hash" in config_dict:
    646 original_kwargs["_commit_hash"] = config_dict["_commit_hash"]

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:699, in PretrainedConfig._get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    695 configuration_file = kwargs.pop("_configuration_file", CONFIG_NAME)
    697 try:
    698 # Load from local folder or from cache or download from model Hub and cache
--> 699 resolved_config_file = cached_file(
    700 pretrained_model_name_or_path,
    701 configuration_file,
    702 cache_dir=cache_dir,
    703 force_download=force_download,
    704 proxies=proxies,
    705 resume_download=resume_download,
    706 local_files_only=local_files_only,
    707 token=token,
    708 user_agent=user_agent,
    709 revision=revision,
    710 subfolder=subfolder,
    711 _commit_hash=commit_hash,
    712 )
    713 commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
    714 except EnvironmentError:
    715 # Raise any environment error raise by cached_file. It will have a helpful error message adapted to
    716 # the original exception.

File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:429, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    427 if not _raise_exceptions_for_missing_entries or not _raise_exceptions_for_connection_errors:
    428 return None
--> 429 raise EnvironmentError(
    430 f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this file, couldn't find it in the"
    431 f" cached files and it looks like {path_or_repo_id} is not the path to a directory containing a file named"
    432 f" {full_filename}.\nCheckout your internet connection or see how to run the library in offline mode at"
    433 " 'https://huggingface.co/docs/transformers/installation#offline-mode'."
    434 ) from e
    435 except EntryNotFoundError as e:
    436 if not _raise_exceptions_for_missing_entries:

OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

tomaarsen commented 5 months ago

That's very useful information right there! I see the issue now. This should fix it for you:


import json
from typing import Optional
from transformers import PreTrainedModel, BertModel, BertConfig
from span_marker import SpanMarkerModel
from span_marker.configuration import SpanMarkerConfig
from span_marker.model_card import SpanMarkerModelCardData

# Build the "empty" BERT encoder directly from the encoder config stored in the checkpoint,
# instead of letting SpanMarkerModel look up the base encoder's config by name.
class BertSpanMarkerModel(SpanMarkerModel):
    def __init__(
        self,
        config: SpanMarkerConfig,
        encoder: Optional[PreTrainedModel] = None,
        model_card_data: Optional[SpanMarkerModelCardData] = None,
        **kwargs,
    ) -> None:
        if encoder is None:
            # from_pretrained fills this encoder with the saved weights afterwards.
            encoder = BertModel(BertConfig(**config.encoder))
        super().__init__(config, encoder, model_card_data, **kwargs)

# Remove the base encoder's name from the saved config, so the tokenizer is loaded
# from the checkpoint directory itself rather than from "bert-base-uncased".
with open(r"/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000/config.json") as f:
    data = json.load(f)
try:
    del data["encoder"]["_name_or_path"]
    with open(r"/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000/config.json", "w") as f:
        json.dump(data, f)
except Exception:
    # Note: this ignores every error, including a missing key or a failed write.
    pass

model = BertSpanMarkerModel.from_pretrained("/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000", local_files_only=True, labels = [
'1-EMAIL', '1-ID_NUM', '1-NAME_STUDENT', '1-PHONE_NUM', '1-STREET_ADDRESS',
'1-URL_PERSONAL', '1-USERNAME', '2-ID_NUM', '2-NAME_STUDENT', '2-PHONE_NUM',
'2-STREET_ADDRESS', '2-URL_PERSONAL', 'O'
])

If you're interested in what went wrong: SpanMarker makes two mistakes when loading a trained model without internet access:

  1. It tries to load the config of the "base encoder" (e.g. bert-base-uncased) when initializing a trained SpanMarker model. This is needed to create an "empty" SpanMarker model that is then filled with the saved weights, because the base encoder config is used to infer the encoder's model class. We can help it out with the BertSpanMarkerModel subclass above: we know our encoder is a BertModel, so we construct that directly. (You'd use RobertaModel for a RoBERTa-based model, for example.)
  2. It tries to load the base encoder's tokenizer. This is unnecessary, because the tokenizer is also saved with the SpanMarker model, and that saved copy is the fallback ("second option") in the SpanMarker code. We want it to skip the first option (which crashes without internet) and go straight to the second. We can do this by opening our config.json and removing _name_or_path from the encoder section.
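The fallback in point 2 can be modelled in a few lines of plain Python. It mirrors the expression `config.encoder.get("_name_or_path", pretrained_model_name_or_path)` that appears in `span_marker/modeling.py` in the later traceback. `tokenizer_source` is a hypothetical helper and the dicts are trimmed stand-ins for the real `config.json`, so this is a sketch of the lookup logic, not SpanMarker's own code.

```python
from typing import Any, Dict


def tokenizer_source(config: Dict[str, Any], checkpoint_dir: str) -> str:
    """Hypothetical helper mirroring
    config.encoder.get("_name_or_path", pretrained_model_name_or_path):
    the base encoder's Hub name wins if present; otherwise the checkpoint
    directory (which also holds the saved tokenizer) is used.
    """
    return config["encoder"].get("_name_or_path", checkpoint_dir)


ckpt = "/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000"

# Trimmed stand-in for the saved config.json before the workaround.
original = {"encoder": {"_name_or_path": "bert-base-uncased", "model_type": "bert"}}
print(tokenizer_source(original, ckpt))  # bert-base-uncased: needs the Hub or a local cache

# After deleting "_name_or_path", the lookup falls back to the checkpoint directory.
stripped = {"encoder": {k: v for k, v in original["encoder"].items() if k != "_name_or_path"}}
print(tokenizer_source(stripped, ckpt))  # the checkpoint path
```

This is also why the deletion only helps if the modified `config.json` is actually the one that gets loaded.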
justin-donnelly commented 5 months ago

Hi Tom, unfortunately, after copy-pasting the code, the issue still persists: https://www.kaggle.com/jdonnelly0804/pii-infer

Were you able to make it work on your end?

import json
from typing import Optional
from transformers import PreTrainedModel, BertModel, BertConfig
from span_marker import SpanMarkerModel
from span_marker.configuration import SpanMarkerConfig
from span_marker.model_card import SpanMarkerModelCardData

class BertSpanMarkerModel(SpanMarkerModel):
    def __init__(
        self,
        config: SpanMarkerConfig,
        encoder: Optional[PreTrainedModel] = None,
        model_card_data: Optional[SpanMarkerModelCardData] = None,
        **kwargs,
    ) -> None:
        if encoder is None:
            encoder = BertModel(BertConfig(**config.encoder))
        super().__init__(config, encoder, model_card_data, **kwargs)

with open(r"/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000/config.json") as f:
    data = json.load(f)
try:
    del data["encoder"]["_name_or_path"]
    with open(r"/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000/config.json", "w") as f:
        json.dump(data, f)
except Exception:
    pass

model = BertSpanMarkerModel.from_pretrained("/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000", local_files_only=True, labels = [
'1-EMAIL', '1-ID_NUM', '1-NAME_STUDENT', '1-PHONE_NUM', '1-STREET_ADDRESS',
'1-URL_PERSONAL', '1-USERNAME', '2-ID_NUM', '2-NAME_STUDENT', '2-PHONE_NUM',
'2-STREET_ADDRESS', '2-URL_PERSONAL', 'O'
])


LocalEntryNotFoundError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:389, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    387 try:
    388 # Load from URL or cache if already cached
--> 389 resolved_file = hf_hub_download(
    390 path_or_repo_id,
    391 filename,
    392 subfolder=None if len(subfolder) == 0 else subfolder,
    393 repo_type=repo_type,
    394 revision=revision,
    395 cache_dir=cache_dir,
    396 user_agent=user_agent,
    397 force_download=force_download,
    398 proxies=proxies,
    399 resume_download=resume_download,
    400 token=token,
    401 local_files_only=local_files_only,
    402 )
    403 except GatedRepoError as e:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    116 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1362, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)
   1361 if local_files_only:
-> 1362 raise LocalEntryNotFoundError(
   1363 "Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable"
   1364 " hf.co look-ups and downloads online, set 'local_files_only' to False."
   1365 )
   1366 elif isinstance(head_call_error, RepositoryNotFoundError) or isinstance(head_call_error, GatedRepoError):
   1367 # Repo not found => let's raise the actual error

LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

The above exception was the direct cause of the following exception:

OSError Traceback (most recent call last)
Cell In[7], line 29
     26 except Exception:
     27 pass
---> 29 model = BertSpanMarkerModel.from_pretrained("/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000", local_files_only=True, labels = [
     30 '1-EMAIL', '1-ID_NUM', '1-NAME_STUDENT', '1-PHONE_NUM', '1-STREET_ADDRESS',
     31 '1-URL_PERSONAL', '1-USERNAME', '2-ID_NUM', '2-NAME_STUDENT', '2-PHONE_NUM',
     32 '2-STREET_ADDRESS', '2-URL_PERSONAL', 'O'
     33 ])

File /opt/conda/lib/python3.10/site-packages/span_marker/modeling.py:302, in SpanMarkerModel.from_pretrained(cls, pretrained_model_name_or_path, labels, config, model_card_data, *model_args, **kwargs)
    298 model = cls(config, encoder, *model_args, **kwargs, model_card_data=model_card_data)
    300 # Pass the tokenizer directly to the model for convenience, this way the user doesn't have to
    301 # make it themselves.
--> 302 tokenizer = SpanMarkerTokenizer.from_pretrained(
    303 config.encoder.get("_name_or_path", pretrained_model_name_or_path), config=config, **kwargs
    304 )
    305 model.set_tokenizer(tokenizer)
    306 # Since transformers 4.32.0 we should use pad_to_multiple_of=8.
    307 # That'll fail for earlier versions, so we try-except it.

File /opt/conda/lib/python3.10/site-packages/span_marker/tokenizer.py:277, in SpanMarkerTokenizer.from_pretrained(cls, pretrained_model_name_or_path, config, *inputs, **kwargs)
    275 @classmethod
    276 def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], *inputs, config=None, **kwargs):
--> 277 tokenizer = AutoTokenizer.from_pretrained(
    278 pretrained_model_name_or_path, *inputs, **kwargs, add_prefix_space=True
    279 )
    280 # XLM-R is known to have some tokenization issues, so be sure to also split on punctuation.
    281 # Strictly required for inference, shouldn't affect training.
    282 if isinstance(tokenizer, XLMRobertaTokenizerFast):

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:752, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    750 if config_tokenizer_class is None:
    751 if not isinstance(config, PretrainedConfig):
--> 752 config = AutoConfig.from_pretrained(
    753 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
    754 )
    755 config_tokenizer_class = config.tokenizer_class
    756 if hasattr(config, "auto_map") and "AutoTokenizer" in config.auto_map:

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1082, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1079 trust_remote_code = kwargs.pop("trust_remote_code", None)
   1080 code_revision = kwargs.pop("code_revision", None)
-> 1082 config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
   1083 has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]
   1084 has_local_code = "model_type" in config_dict and config_dict["model_type"] in CONFIG_MAPPING

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:644, in PretrainedConfig.get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    642 original_kwargs = copy.deepcopy(kwargs)
    643 # Get config dict associated with the base config file
--> 644 config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
    645 if "_commit_hash" in config_dict:
    646 original_kwargs["_commit_hash"] = config_dict["_commit_hash"]

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:699, in PretrainedConfig._get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    695 configuration_file = kwargs.pop("_configuration_file", CONFIG_NAME)
    697 try:
    698 # Load from local folder or from cache or download from model Hub and cache
--> 699 resolved_config_file = cached_file(
    700 pretrained_model_name_or_path,
    701 configuration_file,
    702 cache_dir=cache_dir,
    703 force_download=force_download,
    704 proxies=proxies,
    705 resume_download=resume_download,
    706 local_files_only=local_files_only,
    707 token=token,
    708 user_agent=user_agent,
    709 revision=revision,
    710 subfolder=subfolder,
    711 _commit_hash=commit_hash,
    712 )
    713 commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
    714 except EnvironmentError:
    715 # Raise any environment error raise by cached_file. It will have a helpful error message adapted to
    716 # the original exception.

File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:429, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    427 if not _raise_exceptions_for_missing_entries or not _raise_exceptions_for_connection_errors:
    428 return None
--> 429 raise EnvironmentError(
    430 f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this file, couldn't find it in the"
    431 f" cached files and it looks like {path_or_repo_id} is not the path to a directory containing a file named"
    432 f" {full_filename}.\nCheckout your internet connection or see how to run the library in offline mode at"
    433 " 'https://huggingface.co/docs/transformers/installation#offline-mode'."
    434 ) from e
    435 except EntryNotFoundError as e:
    436 if not _raise_exceptions_for_missing_entries:

OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
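One plausible reading of this second traceback: the model itself now gets built (the failure has moved to `SpanMarkerTokenizer.from_pretrained`), but `config.encoder` still contains `_name_or_path`, so the tokenizer is still requested as `bert-base-uncased`. That would happen if the edited `config.json` was never written. `/kaggle/input` is normally mounted read-only on Kaggle, and the bare `except Exception: pass` would hide the resulting write error. Assuming that is the cause, a sketch of a workaround is to copy the checkpoint into a writable directory such as `/kaggle/working` and edit the copy. `prepare_offline_checkpoint` and the destination path are hypothetical choices, not part of SpanMarker, and the sketch also assumes the tokenizer files were saved in the checkpoint.

```python
import json
import shutil
from pathlib import Path


def prepare_offline_checkpoint(src_dir: str, dst_dir: str) -> str:
    """Hypothetical helper: copy a SpanMarker checkpoint to a writable directory
    and remove encoder._name_or_path from the copy's config.json.

    Assumes the source may be read-only (as /kaggle/input usually is) and that
    the tokenizer files were saved alongside the model.
    """
    dst = Path(dst_dir)
    # Copy weights, tokenizer files and config; reuse dst if it already exists.
    shutil.copytree(src_dir, dst, dirs_exist_ok=True)

    config_path = dst / "config.json"
    with config_path.open() as f:
        config = json.load(f)

    # pop(..., None) tolerates an already-stripped config; unlike a bare
    # "except Exception: pass", any I/O error here is left to propagate.
    config.get("encoder", {}).pop("_name_or_path", None)

    with config_path.open("w") as f:
        json.dump(config, f, indent=2)
    return str(dst)
```

For example, `prepare_offline_checkpoint("/kaggle/input/pii-train-1-cp3000/Kaggle Checkpoints/checkpoint 3000", "/kaggle/working/checkpoint-3000")` returns the path of the edited copy, which would then be passed to `BertSpanMarkerModel.from_pretrained(..., local_files_only=True)` with the same `labels` list as before, in place of the `/kaggle/input` path.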