rgcottrell / pytorch-human-performance-gec

A PyTorch implementation of "Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study"
Apache License 2.0
50 stars 19 forks

AssertionError: Index file doesn't match expected format. Make sure that --dataset-impl is configured properly. #5

Closed xiaoshengjun closed 4 years ago

xiaoshengjun commented 5 years ago

Hi, I ran 'train-lang8-cnn.bat' on Linux after converting it from a 'bat' script to an 'sh' script. All the previous steps completed without problems, and I trained the model for 35 epochs, but running generate.py then fails with the error below. Could you help me? Thank you very much.

Namespace(beam=5, cpu=False, criterion='cross_entropy', data='../data-bin/lang-8-fairseq', dataset_impl='cached', diverse_beam_groups=-1, diverse_beam_strength=0.5, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', iter=500, lang_model_data=None, lang_model_path=None, lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, n=4, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=0, optimizer='nag', path='../checkpoints/lang-8-fairseq-cnn/checkpoint_best.pt', prefix_size=0, print_alignment=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, sent=False, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation', tbmf_wrapper=False, temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir=None, warmup_updates=0, weight_decay=0.0)
| [en] dictionary: 137960 types
| [gec] dictionary: 121816 types
Traceback (most recent call last):
  File "./generate.py", line 236, in <module>
    main(args)
  File "./generate.py", line 37, in main
    task.load_dataset(args.gen_subset)
  File "/root/anaconda3/lib/python3.7/site-packages/fairseq/tasks/translation.py", line 188, in load_dataset
    max_target_positions=self.args.max_target_positions,
  File "/root/anaconda3/lib/python3.7/site-packages/fairseq/tasks/translation.py", line 51, in load_langpair_dataset
    fix_lua_indexing=True, dictionary=src_dict))
  File "/root/anaconda3/lib/python3.7/site-packages/fairseq/data/indexed_dataset.py", line 39, in make_dataset
    return IndexedCachedDataset(path, fix_lua_indexing=fix_lua_indexing)
  File "/root/anaconda3/lib/python3.7/site-packages/fairseq/data/indexed_dataset.py", line 165, in __init__
    super().__init__(path, fix_lua_indexing=fix_lua_indexing)
  File "/root/anaconda3/lib/python3.7/site-packages/fairseq/data/indexed_dataset.py", line 100, in __init__
    self.read_index(path)
  File "/root/anaconda3/lib/python3.7/site-packages/fairseq/data/indexed_dataset.py", line 106, in read_index
    'Index file doesn\'t match expected format. '
AssertionError: Index file doesn't match expected format. Make sure that --dataset-impl is configured properly.
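
The assertion is raised while reading the header of an index (.idx) file, so one way to narrow it down is to check which format the files under data-bin were actually written in. The sketch below is only a diagnostic under stated assumptions: the two magic prefixes are what fairseq's indexed_dataset.py is believed to use for the legacy (lazy/cached) and mmap formats and should be verified against the installed fairseq version, and the function name `guess_index_format` is hypothetical.

```python
import sys

# ASSUMPTION: header prefixes as believed to be defined in fairseq's
# indexed_dataset.py; verify against the installed version before relying on them.
LEGACY_MAGIC = b"TNTIDX\x00\x00"      # legacy format, read by 'lazy' / 'cached'
MMAP_MAGIC = b"MMIDIDX\x00\x00"[:8]   # mmap format, compared on its first 8 bytes


def guess_index_format(idx_path):
    """Return a label for the format suggested by the first 8 bytes of an .idx file."""
    with open(idx_path, "rb") as f:
        magic = f.read(8)
    if magic == LEGACY_MAGIC:
        return "legacy (lazy/cached)"
    if magic == MMAP_MAGIC:
        return "mmap"
    return "unknown"


if __name__ == "__main__":
    # Print one line per .idx path given on the command line.
    for path in sys.argv[1:]:
        print(path, "->", guess_index_format(path))
```

Saved under a hypothetical name such as check_idx.py, it could be run as `python check_idx.py ../data-bin/lang-8-fairseq/*.idx`. If the files report "mmap" while generation runs with dataset_impl='cached' (as in the Namespace above), the mismatch would be consistent with this assertion, and either passing a matching --dataset-impl value or re-running preprocessing with the same setting may be worth trying. An "unknown" result would instead suggest a truncated or otherwise damaged index file.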

jtynkkynen commented 4 years ago

Hi. Did you manage to solve this issue?

xiaoshengjun commented 4 years ago

> Hi. Did you manage to solve this issue?

No...

tianfeichen commented 4 years ago

We haven't tried this on Linux, so we have no idea about this...