pkouris / abtextsum

Abstractive text summarization based on deep learning and semantic content generalization

About Bangla dataset #3

PrithwirajRizu opened this issue 4 years ago

PrithwirajRizu commented 4 years ago

Is this compatible with any language (in my case, Bangla)? I have a Bengali summarization dataset: one article.txt file and one summary.txt file containing the summaries of those articles. If this is possible, how can I run the model on my own dataset? Thanks in advance.

pkouris commented 4 years ago

Since this source code was used for our study (see paper), it expects particular filenames and a particular structure of files and directories in the data directory. To train the deep learning model on your files, you need to make some changes to the code. In particular, you can follow these steps:

  1. In dataset_path.py, add the path to your dataset directory (the directory containing the article and summary files).

  2. In paths.py, (a) add the path of your word embeddings file on line 8. If you do not have a word embeddings file, see step 3(b) below. (b) After line 18, add the following lines:

         train_mydata_article_file_path = train_dir + 'train_mydata_article.txt'
         train_mydata_title_file_path = train_dir + 'train_mydata_title.txt'

     After line 68, add:

         validation_mydata_article_file_path = validation_dir + 'validation_mydata_articles.txt'
         validation_mydata_title_file_path = validation_dir + 'validation_mydata_titles.txt'

     After line 116, add:

         test_mydata_article_file_path = test_dir + 'test_mydata_articles.txt'
         test_mydata_title_file_path = test_dir + 'test_mydata_titles.txt'

     (c) Rename your article and summary files (in the dataset directory) to train_mydata_article.txt and train_mydata_title.txt, respectively.

  3. In parameters.py, (a) add 'mydata': 'mydata' to the dictionary model_name_dict (line 12), i.e., your model name is 'mydata'. (b) Set the hyper-parameter values as needed. If you don't have word embeddings, set the parameter using_word2vec_embeddings to False (line 68).

  4. To build the training, validation, and test sets, run:

         python build_dataset.py -mode train -model mydata
         python build_dataset.py -mode validation -model mydata
         python build_dataset.py -mode test -model mydata

  5. For training, run:

         python train.py -model mydata
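For a corpus like the one described in the question (one article.txt and one summary.txt), a small preparation script can catch misaligned files before build_dataset.py runs. The sketch below is hypothetical: it assumes one article per line with the matching summary on the same line number, UTF-8 encoding, and that the target directory is the dataset directory added to dataset_path.py in step 1. The helper names prepare_training_files, pipeline_commands and run_pipeline are illustrative and not part of the repository; only the commands themselves come from steps 4 and 5.

```python
# Hypothetical helpers, not part of the abtextsum repository.
# Assumption: one article per line, matching summary on the same line, UTF-8.
import shutil
import subprocess
import sys
from pathlib import Path


def prepare_training_files(article_path, summary_path, dataset_dir):
    """Check that articles and summaries are aligned line by line, then copy
    them to the filenames from step 2(c). Returns the number of pairs."""
    article_path, summary_path = Path(article_path), Path(summary_path)
    with open(article_path, encoding="utf-8") as f:
        n_articles = sum(1 for _ in f)
    with open(summary_path, encoding="utf-8") as f:
        n_summaries = sum(1 for _ in f)
    if n_articles != n_summaries:
        raise ValueError(
            f"{n_articles} articles but {n_summaries} summaries; "
            "the files must be aligned line by line")
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    # Summaries take the role of "titles" in the repository's naming.
    shutil.copyfile(article_path, dataset_dir / "train_mydata_article.txt")
    shutil.copyfile(summary_path, dataset_dir / "train_mydata_title.txt")
    return n_articles


def pipeline_commands(model="mydata"):
    """The commands from steps 4 and 5, as argument lists for subprocess."""
    commands = [
        [sys.executable, "build_dataset.py", "-mode", mode, "-model", model]
        for mode in ("train", "validation", "test")
    ]
    commands.append([sys.executable, "train.py", "-model", model])
    return commands


def run_pipeline(model="mydata", repo_dir="."):
    """Run the commands in order from the repository root; check=True raises
    CalledProcessError at the first command that exits with a non-zero status."""
    for cmd in pipeline_commands(model):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, call prepare_training_files('article.txt', 'summary.txt', '/path/to/dataset') (a placeholder path) and then run_pipeline('mydata') from the repository root. Stopping at the first failing command avoids starting training on an incompletely built dataset.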

For more details, read README.md. I hope the above helps.

pkouris commented 4 years ago

Regarding compatibility, I think it is compatible, but you should test it.

PrithwirajRizu commented 4 years ago

OK, I am trying to run it following the description above. I will let you know the result as soon as possible.

PrithwirajRizu commented 4 years ago

@pkouris after running python train.py -model mydata, only the first two lines of class Train in train.py are executed: parser = argparse.ArgumentParser and parser.add_argument('-model', default="", help="\nmodel values: lg100d5, lg200d5, neg100, neg200, mydata...\n" "e.g. python train.py -model neg100"). The rest of the code does not run. Is a function call required to continue the process after these two lines?

PrithwirajRizu commented 4 years ago

For debugging purposes, I added print("debug") to each function in train.py to check whether they are called, but none of the prints executes.

pkouris commented 4 years ago


Please post the error message that is returned.

PrithwirajRizu commented 4 years ago

[Screenshot from 2020-02-06 19-50-43]

Actually, no error message is shown in the terminal. After running this command, it just stops within 2 seconds without executing anything. I didn't even touch the code in train.py. As mentioned before, I added print statements to every function for debugging, but none of them prints anything.

pkouris commented 4 years ago

Add the following two lines at the end of train.py, outside the class Train (i.e., with no indentation before the first line):

if __name__ == "__main__":
    Train()
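Why did only the argparse lines run? In Python, statements placed directly in a class body execute once, when the class statement itself is executed (here, when train.py is loaded), whereas methods run only when they are called. Without a line that instantiates Train, nothing else happens and the script exits silently, which matches the behavior reported above. The following simplified, hypothetical version of train.py assumes, as the thread suggests, that the parser is defined in the class body and that the constructor starts the work; the real class does much more.

```python
import argparse


class Train:
    # Class-body statements: these run once, when Python executes the
    # class statement (i.e., when the module is loaded).
    parser = argparse.ArgumentParser()
    parser.add_argument('-model', default="", help="model name, e.g. mydata")

    def __init__(self, argv=None):
        # Method body: runs only when Train(...) is called.
        # argv=None makes parse_args read sys.argv[1:].
        args = self.parser.parse_args(argv)
        self.model = args.model
        print("training model:", self.model)  # stand-in for the real training


# The fix from the thread: instantiate the class only when the file is run
# as a script, not when it is imported.
if __name__ == "__main__":
    Train()
```

Running python train.py -model mydata now reaches the constructor. Importing the module from another script still defines Train without starting training, because __name__ is then the module name rather than "__main__".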
pkouris commented 4 years ago

@PrithwirajRizu, I have also updated the source code with these two lines.