pkouris / abtextsum

Abstractive text summarization based on deep learning and semantic content generalization
17 stars 4 forks

I can't find any method to #2

Open 108527307 opened 4 years ago

108527307 commented 4 years ago

Traceback (most recent call last):
  File "build_dataset.py", line 518, in <module>
    BuildDataset()
  File "build_dataset.py", line 39, in __init__
    param.lines_per_chunk, param.read_lines)
  File "build_dataset.py", line 136, in build_whole_train_dataset_and_dictionary
    num_of_lines = self.num_of_lines_of_file(file=train_article_path)
  File "build_dataset.py", line 77, in num_of_lines_of_file
    with open(file, 'r', encoding='utf8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/datasets/directorytrain/train_lg100d5gigaword_article.txt'

This happens when I run 'python build_dataset.py -mode train -model lg100d5g'.

pkouris commented 4 years ago

The file train_lg100d5gigaword_article.txt contains the articles of the Gigaword dataset after generalization with the LG strategy (see the paper). When the above command is run with the argument -model lg100d5g, the process requires the files train_lg100d5gigaword_article.txt and train_lg100d5gigaword_title.txt as input, because these were used in the experimental procedure of the paper (see the readme.md file, which describes how to use the generalization strategies).
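As a quick diagnostic: the error path 'path/to/datasets/directorytrain/...' suggests the placeholder dataset directory in paths.py was never replaced (note the missing separator before 'train'). Below is a minimal pre-flight check, not part of the repository, that verifies the required input files are in place before running build_dataset.py; the filenames come from the traceback above, and the "train" subdirectory is inferred from the error path.

```python
import os

# Replace dataset_dir with the real directory configured in paths.py;
# "path/to/datasets/directory" is the unedited placeholder from the error.
dataset_dir = "path/to/datasets/directory"

# Filenames taken from the traceback; the "train" subdirectory is an
# assumption based on the error path.
required_files = [
    os.path.join(dataset_dir, "train", "train_lg100d5gigaword_article.txt"),
    os.path.join(dataset_dir, "train", "train_lg100d5gigaword_title.txt"),
]

for path in required_files:
    print(("found:   " if os.path.isfile(path) else "MISSING: ") + path)
```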

Regardless of these specific files, you could build the dataset and train a model using any pair of files containing articles and summaries, respectively. You would only need to slightly modify build_dataset.py and paths.py so that they take any such pair of files as input, as sketched below.
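A minimal sketch of such a change, assuming build_dataset.py parses its arguments with argparse; the --articles/--summaries options and the fallback paths are illustrative, not part of the actual repository.

```python
import argparse

# Hypothetical extension of build_dataset.py's argument parsing so that any
# article/summary file pair can be used instead of the hard-coded
# train_lg100d5gigaword_* filenames.
parser = argparse.ArgumentParser()
parser.add_argument("-mode", default="train")
parser.add_argument("-model", default="lg100d5g")
parser.add_argument("--articles", help="custom path to the articles file")
parser.add_argument("--summaries", help="custom path to the summaries file")
args = parser.parse_args()

# Fall back to the paper's filenames (normally taken from paths.py)
# when no custom paths are given.
train_article_path = args.articles or (
    "path/to/datasets/directory/train/train_lg100d5gigaword_article.txt")
train_title_path = args.summaries or (
    "path/to/datasets/directory/train/train_lg100d5gigaword_title.txt")

print("articles:", train_article_path)
print("summaries:", train_title_path)
```

With a change along these lines, the dataset could be built with, e.g., python build_dataset.py -mode train -model lg100d5g --articles my_articles.txt --summaries my_summaries.txt.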

For more information see the readme.md file of this repository.

I understand that these specific filenames (which were used in the experimental procedure of the paper) restrict the framework from being used in more general applications. We aim to simplify the code so that any dataset filename can be used without restrictions. We will do that as soon as possible.

108527307 commented 4 years ago

Thank you very much. I have run it.
