worldbank / REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License

rtf_checkpoints bug when fitting the GeoValidator example model #61

Open martinjurkovic opened 9 months ago

martinjurkovic commented 9 months ago

Running the REaLTabFormer_GeoValidator_Example.ipynb notebook on Google Colab raises the following error during the rtf_model.fit(data, num_bootstrap=10) call:

/usr/local/lib/python3.10/dist-packages/realtabformer/realtabformer.py:834: UserWarning: No best model was saved. Loading the closest model to the sensitivity_threshold.
  warnings.warn(
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-9-d11ec56195b9> in <cell line: 2>()
      1 hf_logging.set_verbosity_error()
----> 2 rtf_model.fit(data, num_bootstrap=10)

6 frames
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    399         if not os.path.isfile(resolved_file):
    400             if _raise_exceptions_for_missing_entries:
--> 401                 raise EnvironmentError(
    402                     f"{path_or_repo_id} does not appear to have a file named {full_filename}. Checkout "
    403                     f"'https://huggingface.co/{path_or_repo_id}/{revision}' for available files."

OSError: rtf_checkpoints/not-best-disc-model does not appear to have a file named config.json. Checkout 'https://huggingface.co/rtf_checkpoints/not-best-disc-model/main' for available files.
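
To narrow this down, it may help to look at what was actually written to the checkpoint directory before the load fails. The following is a minimal sketch using only the standard library. The helper name `describe_checkpoint` is hypothetical (it is not part of REaLTabFormer or transformers), and the path is copied from the error message above.

```python
from pathlib import Path


def describe_checkpoint(path):
    """Report whether a checkpoint directory exists and which files it holds.

    Hypothetical helper for debugging; not part of REaLTabFormer or transformers.
    """
    ckpt = Path(path)
    # List file names only if the directory is actually there.
    files = sorted(p.name for p in ckpt.iterdir()) if ckpt.is_dir() else []
    return {
        "exists": ckpt.is_dir(),
        "has_config": "config.json" in files,
        "files": files,
    }


if __name__ == "__main__":
    # Path taken from the OSError message in this issue.
    print(describe_checkpoint("rtf_checkpoints/not-best-disc-model"))
```

If the directory is missing entirely, the checkpoint was probably never saved. If it exists but lacks config.json, the listed files may hint at what changed in how the model is written to disk. Neither case is confirmed by this report.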

I get the same error when fitting a different dataset on my local machine (macOS, Python 3.10.13).

The problem seems to be caused by the latest version of the transformers library: reverting to transformers==4.24.0 (the version specified in pyproject.toml) makes the error go away.
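
As a workaround, the older release can be installed with `pip install transformers==4.24.0`. To check which version is in use before calling `fit`, something like the sketch below could be run first. It uses only the standard library; the names `PINNED`, `version_tuple`, `is_newer_than_pinned` and `installed_transformers_version` are hypothetical, and the comparison only looks at the numeric release components.

```python
from importlib import metadata

PINNED = "4.24.0"  # version reported to work in this issue


def version_tuple(version):
    """Parse leading numeric release components, e.g. '4.24.0' -> (4, 24, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at suffixes such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than_pinned(installed, pinned=PINNED):
    """True if the installed release is numerically newer than the pinned one."""
    return version_tuple(installed) > version_tuple(pinned)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif is_newer_than_pinned(found):
        print(f"transformers {found} is newer than {PINNED}; the error may occur")
    else:
        print(f"transformers {found} is not newer than {PINNED}")
```

This only flags a version mismatch. It does not identify which transformers release introduced the change.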