n-waves / multifit

Code to reproduce results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" (https://arxiv.org/abs/1909.04761).
MIT License

File exists but it isn't found #78

Open puraminy opened 4 years ago

puraminy commented 4 years ago

When I execute a Python script via a Jupyter notebook, I receive the following error:


    ~/miniconda3/lib/python3.7/site-packages/fastai/text/data.py in train_sentencepiece(texts, path, pre_rules, post_rules, vocab_sz, max_vocab_sz, model_type, max_sentence_len, lang, char_coverage, tmp_dir, enc)
        434         f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        435         f"--user_defined_symbols={','.join(spec_tokens)}",
    --> 436         f"--model_prefix={quotemark}{cache_dir/'spm'}{quotemark} --vocab_size={vocab_sz} --model_type={model_type}"]))
        437     raw_text_path.unlink()
        438     return cache_dir

    OSError: Not found: ""/home/pouramini/mf1/data/wiki/fa-2/models/fsp15k/all_text.out"": No such file or directory Error #2

However, the file exists! Why is the path shown wrapped in two pairs of double quotes?
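
The doubled quotes appear to come from fastai itself: as the traceback shows, train_sentencepiece wraps paths in a quotemark before joining the flags into one string, and SentencePieceTrainer.Train parses that string directly rather than through a shell, so the quotes are never stripped and become part of the file name being looked up. A minimal sketch of the effect (the path is copied from the error message; the quoting is inferred from the traceback):

    # How the doubled quotes can arise: sentencepiece's error message adds one
    # pair of quotes around the (already quoted) name it failed to open.
    from pathlib import Path

    quotemark = '"'  # inferred from the traceback above
    raw_text_path = Path('/home/pouramini/mf1/data/wiki/fa-2/models/fsp15k/all_text.out')
    print(f'--input={quotemark}{raw_text_path}{quotemark}')
    # --input="/home/pouramini/mf1/data/wiki/fa-2/models/fsp15k/all_text.out"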

This is the code where the error is raised; it looks for raw_text_path:

    raw_text_path = cache_dir / 'all_text.out'
    with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
    spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
    SentencePieceTrainer.Train(" ".join([
        f'--input={raw_text_path} --max_sentence_length={max_sentence_len}',
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
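
As a sanity check that the file itself is readable, one can call SentencePieceTrainer.Train directly with an unquoted path. This is only a sketch under assumptions: the vocab size (guessed from the fsp15k directory name) and the model type are placeholders, not values taken from MultiFiT's configuration.

    # Sanity-check sketch: train directly on the existing all_text.out with an
    # unquoted path; vocab_size and model_type below are assumed placeholders.
    from pathlib import Path
    from sentencepiece import SentencePieceTrainer

    cache_dir = Path('/home/pouramini/mf1/data/wiki/fa-2/models/fsp15k')
    SentencePieceTrainer.Train(
        f"--input={cache_dir/'all_text.out'} "
        f"--model_prefix={cache_dir/'spm'} "
        f"--vocab_size=15000 --model_type=unigram")
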
puraminy commented 4 years ago

The problem is related to the fastai or sentencepiece version. What happens instead is that a 'tmp' folder is created in my current directory, along with files named "cache_dir".vocab and "cache_dir".model; the quote characters evidently end up as literal parts of the output file names.
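
A quick way to check for this kind of mismatch is to print the installed versions of both libraries (a sketch; the compatible pairing is discussed in the answer linked below):

    # The quoting behaviour of train_sentencepiece changed across releases, so
    # the installed versions of both libraries matter here.
    import fastai
    import sentencepiece

    print(fastai.__version__)
    print(sentencepiece.__version__)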

For a solution, see:

https://stackoverflow.com/questions/59788395/fastai-failed-initiation-of-language-model-in-sentence-piece-processor-cache?noredirect=1#comment110726963_59788395