jxmorris12 / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Question about storage path #45

Closed sun1187 closed 4 months ago

sun1187 commented 4 months ago

Hi, @jxmorris12. Thanks for the great work and for sharing the code.

  1. Where are the results of your code (data, tokenized results, embedding values, ...) stored?
  2. If I want to change the storage path, which line of the code should I modify?

While reproducing results from the paper, I got the following error:

    No space left on device

It was raised at experiments.py line 403 (in def _load_train_dataset_uncached):

for key in raw_datasets:
    raw_datasets[key] = dataset_map_multi_worker(
        dataset=raw_datasets[key],
        map_fn=tokenize_fn(
            tokenizer,
            embedder_tokenizer,
            ...
    )

So I tried to change 'DATASET_CACHE_PATH' in utils.py and experiments.py as below.

DATASET_CACHE_PATH = os.environ.get(
    # original: "VEC2TEXT_CACHE", os.path.expanduser("~/.cache/inversion") 
    "VEC2TEXT_CACHE", os.path.expanduser("target_path")
)

However, for some reason, the tokenized results are not stored in 'target_path'; they still accumulate in '.cache/inversion'.

Is it correct that all the data and embedding values... are stored in the .cache/inversion folder? If so, are there any additional modifications that need to be made in order to store them in the path I specified?

Thanks again!

jxmorris12 commented 4 months ago

DATASET_CACHE_PATH is read from the VEC2TEXT_CACHE environment variable. Set VEC2TEXT_CACHE=target_path in your shell before running the program and it'll have the result you're expecting. You can also look at the os.environ documentation to learn more: https://docs.python.org/3/library/os.html#os.environ
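As a rough illustration of the lookup quoted in the question, here is a minimal sketch of how os.environ.get resolves the path. The helper name resolve_cache_path and the path /data/vec2text_cache are made up for this example and are not part of vec2text; the sketch also assumes the lookup behaves the same inside the library as it does in isolation.

```python
import os


def resolve_cache_path(env):
    """Hypothetical helper mirroring the lookup quoted in the question."""
    default = os.path.expanduser("~/.cache/inversion")
    # The environment value wins whenever the variable is set;
    # the second argument is only a fallback.
    return env.get("VEC2TEXT_CACHE", default)


# Variable unset: the fallback under the home directory is used.
unset_path = resolve_cache_path({})

# Variable set, as `export VEC2TEXT_CACHE=/data/vec2text_cache` would do in a
# shell. The path is a made-up example.
set_path = resolve_cache_path({"VEC2TEXT_CACHE": "/data/vec2text_cache"})

# expanduser only rewrites a leading "~", so a bare name such as
# "target_path" is left as a path relative to the working directory.
relative = os.path.expanduser("target_path")

print(unset_path)
print(set_path)
print(relative)
```

In real use you would pass os.environ instead of a plain dict. If the assignment to DATASET_CACHE_PATH sits at module level, as the quoted snippet suggests, the variable has to be set before the module is imported for the new value to be picked up.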