
Reinforced Self-Attention Network (ReSAN)

News: A time- and memory-efficient self-attention mechanism named Fast-DiSA has been proposed. It is as fast as multi-head self-attention while additionally using the multi-dim and positional-mask techniques. The code has been released here.

Requirements

Foundation

Other Python Packages


Contents of this Repository

  1. Standalone code for Reinforced Sequence Sampling (RSS), Reinforced Self-Attention (ReSA), and Reinforced Self-Attention Network (ReSAN), which are detailed in the paper. --- dir: resan
  2. Project code for the experiments on the Stanford Natural Language Inference (SNLI) dataset. --- dir: SNLI_rl_pub
  3. Project code for the experiments on the Sentences Involving Compositional Knowledge (SICK) dataset. --- dir: SICK_rl_pub

The APIs for the first item are introduced in the rest of this file. For the others, please enter the corresponding directory for further instructions.


APIs

Reinforced Sequence Sampling (RSS)

from resan.rl_nn import generate_mask_with_rl
Parameters:
Returns:
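
For orientation, the following is a minimal usage sketch rather than a verified call: the argument names (rep_tensor, rep_mask) and the single-call form are assumptions, and the sketch presumes the TensorFlow 1.x style used in this repo. Please check resan/rl_nn.py for the actual parameter list and return value.

# Hypothetical sketch -- argument names are assumptions; see resan/rl_nn.py
# for the real signature of generate_mask_with_rl.
import tensorflow as tf
from resan.rl_nn import generate_mask_with_rl

seq_len, emb_dim = 40, 300
rep_tensor = tf.placeholder(tf.float32, [None, seq_len, emb_dim])  # token embeddings
rep_mask = tf.placeholder(tf.bool, [None, seq_len])                # True for non-padding tokens

# RSS learns to sample a subset of tokens, so the expected output is a
# (batch_size, seq_len) 0/1 selection mask drawn from the learned policy.
sampled_mask = generate_mask_with_rl(rep_tensor, rep_mask)  # hypothetical call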

Reinforced Self-Attention (ReSA)

from resan.resa import reinforced_self_attention
Parameters:
Returns:

a TensorFlow tensor with shape (batch_size, seq_len, 2*hn) denoting the context-aware representations of all tokens.

Reinforced Self-Attention Network (ReSAN)

from resan.resan import reinforced_self_attention_network
Parameters:

the same as those of reinforced_self_attention introduced above

Returns:

a TensorFlow tensor with shape (batch_size, 2*hn) denoting the sequence/sentence encodings.
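
Analogously, here is a hedged sketch for ReSA and ReSAN. The placeholders and argument names (rep_tensor, rep_mask) are assumptions; the actual parameter lists are defined in resan/resa.py and resan/resan.py. The commented shapes follow the return values described above.

# Hypothetical sketch -- argument names are assumptions; see resan/resa.py
# and resan/resan.py for the real parameter lists.
import tensorflow as tf
from resan.resa import reinforced_self_attention
from resan.resan import reinforced_self_attention_network

seq_len, emb_dim = 40, 300
rep_tensor = tf.placeholder(tf.float32, [None, seq_len, emb_dim])  # token embeddings
rep_mask = tf.placeholder(tf.bool, [None, seq_len])                # True for non-padding tokens

# ReSA: context-aware representations for all tokens, shape (batch_size, seq_len, 2*hn).
token_repr = reinforced_self_attention(rep_tensor, rep_mask)  # hypothetical call

# ReSAN: one encoding per sequence/sentence, shape (batch_size, 2*hn).
sent_encoding = reinforced_self_attention_network(rep_tensor, rep_mask)  # hypothetical call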



Programming Framework for SNLI or SICK

We first show the file/directory tree shared by these projects:

ROOT
--dataset[d]
----glove[d]
----$task_dataset_name$[d]
--pretrained_model [d]
--src[d]
----model[d]
------template.py[f]
------$model_name$.py[f]
----nn_utils[d]
----utils[d]
------file.py[f]
------nlp.py[f]
------record_log.py[f]
------time_counter.py[f]
----dataset.py[f]
----evaluator.py[f]
----graph_handler.py[f]
----perform_recorder.py[f]
--result[d]
----processed_data[d]
----model[d]
------$model_specific_dir$[d]
--------ckpt[d]
--------log_files[d]
--------summary[d]
--------answer[d]
--configs.py[f]
--$task$_main.py[f]
--$task$_log_analysis.py[f]

Note: the result directory appears after the first run.

We elaborate on every file [f] and directory [d] as follows:

./configs.py: performs parameter parsing and defines/declares the global variables, e.g., parameter definitions and default values, name definitions (of train/dev/test data, model, processed data, ckpt, etc.), and directory definitions (of data, result, $model_specific_dir$, etc.) together with generation of the corresponding paths.

./$task$_main.py: the main entry Python script for running the project.

./$task$_log_analysis.py: provides a function to analyze the log file produced during training.

./dataset/: the directory containing the datasets for the current project.

./pretrained_model/: the directory containing the default pretrained supervised-learning network checkpoints for the corresponding base model.

./src/: the directory containing the Python source code.

./result/: the directory where results are placed.

Parameters in configs.py



Note that, given the amount of code this repo includes, some mistakes may have been introduced while organizing the projects into it. If you encounter bugs or errors when running the code, please feel free to report them by opening an issue. I will reply as soon as possible.

Acknowledgements