info_ordering.py
Added functions to create entity grids and feature vectors from the gold summary training docs
Currently the number of permutations for each document is 11 (including the original order).
I originally tried generating every possible ordering and ran out of memory, so the count is now capped.
I will probably turn this number into a parameter so we can test easily.
Added stub order_info_entity(topics_with_summaries), to be tested on the Topic summaries once the model is built
Chronological ordering is still implemented under order_info_chron()
Changed chronological ordering function name from order_info() to order_info_chron()
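
A rough sketch of how the permutation cap could become a parameter, as mentioned above. This is not the code in info_ordering.py: the function name `sample_orderings` and the parameters `max_orderings` and `seed` are made up for illustration. It samples distinct random orderings instead of enumerating all of them, which is what avoids the memory blow-up.

```python
import math
import random


def sample_orderings(sentences, max_orderings=11, seed=0):
    """Return up to max_orderings distinct orderings, original order first.

    Hypothetical helper: names and defaults are assumptions, not the PR's code.
    """
    original = tuple(range(len(sentences)))
    # Never ask for more orderings than exist (n! for n sentences).
    limit = min(max_orderings, math.factorial(len(sentences)))
    rng = random.Random(seed)
    seen = {original}
    orderings = [original]
    while len(orderings) < limit:
        candidate = list(original)
        rng.shuffle(candidate)
        candidate = tuple(candidate)
        # Keep only orderings we have not produced yet.
        if candidate not in seen:
            seen.add(candidate)
            orderings.append(candidate)
    # Shuffle indices rather than sentences so repeated sentences are handled.
    return [[sentences[i] for i in order] for order in orderings]
```

With the default of 11, a five-sentence document yields the original order plus 10 shuffled copies; shorter documents simply return every ordering that exists.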
text_summarizer.py
Imported three new functions from info_ordering: order_info_chron, order_info_entity, get_training_vectors
main()
Reads in the gold summary document data and returns a list of document objects
Generates entity grids and feature representations for the training data
Call to stub function order_info_entity(topics_with_summaries) is commented out for now; will enable it once the model is built
Updated the chronological ordering call from order_info() to order_info_chron()
There may be a bug in this branch where sentences in the output files are being split and put onto new lines. I have not looked into it yet.
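
For anyone reviewing the entity grid and feature vector step, here is a minimal sketch of the general idea. It assumes the common entity-grid formulation (roles S, O, X, and "-" for absent, with features as relative frequencies of length-2 role transitions); the actual role set and features in get_training_vectors may differ. The names `build_grid` and `transition_features` and the input format are hypothetical.

```python
from itertools import product

ROLES = ("S", "O", "X", "-")  # subject, object, other, absent (assumed role set)


def build_grid(sentence_roles):
    """Build an entity grid from one dict per sentence mapping entity -> role."""
    entities = sorted({e for roles in sentence_roles for e in roles})
    # One column per entity; "-" marks sentences where the entity is absent.
    return {e: [roles.get(e, "-") for roles in sentence_roles] for e in entities}


def transition_features(grid, length=2):
    """Relative frequency of each role transition of the given length."""
    transitions = list(product(ROLES, repeat=length))
    counts = dict.fromkeys(transitions, 0)
    total = 0
    for column in grid.values():
        # Slide a window of `length` sentences down each entity column.
        for i in range(len(column) - length + 1):
            counts[tuple(column[i:i + length])] += 1
            total += 1
    # Fixed transition order gives every document a vector of the same size.
    return [counts[t] / total if total else 0.0 for t in transitions]


sentence_roles = [{"court": "S", "judge": "O"}, {"court": "S"}, {"judge": "X"}]
grid = build_grid(sentence_roles)
vector = transition_features(grid)
```

Each permuted ordering of a document would get its own grid and vector, so the model can compare the original order against the shuffled ones.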
tldr
Training vectors are good to go for the model. Waiting on model to be built to test the ordering of our Topic summaries.