santhoshkolloju / Abstractive-Summarization-With-Transfer-Learning

Abstractive summarisation using Bert as encoder and Transformer Decoder
406 stars, 98 forks

AttributeError: 'dict' object has no attribute 'src_txt' #9

Closed darienacosta closed 5 years ago

darienacosta commented 5 years ago

Getting a 500 error when using Postman on /results.

  File "/preprocess.py", line 170, in convert_single_example
    tokens_a = tokenizer.tokenize(example.src_txt)
AttributeError: 'dict' object has no attribute 'src_txt'

Is this because I'm using python3?

Vibha111094 commented 5 years ago

I think this happens only during inference. Try changing it to tokens_a = tokenizer.tokenize(example['src_txt']). This solved the problem for me.
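For context, the fix is just dict key access instead of attribute access: the inference path hands convert_single_example a plain dict. A minimal sketch (the get_src_txt helper is hypothetical, not part of the repo):

```python
# During training, `example` is an object with attributes; the inference
# path builds a plain dict, so attribute access fails.
example = {"src_txt": "some input article text"}

# example.src_txt would raise:
# AttributeError: 'dict' object has no attribute 'src_txt'
tokens_source = example["src_txt"]  # key access works for a dict

# A defensive helper that accepts either form (hypothetical, not in the repo):
def get_src_txt(example):
    if isinstance(example, dict):
        return example["src_txt"]
    return example.src_txt

print(get_src_txt(example))  # some input article text
```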

darienacosta commented 5 years ago

Thanks, I think that worked. How long does the inference process take? I'm not seeing an error, but Python (after sending the Postman request via Flask) has been processing for 3 hours with no end in sight.

Vibha111094 commented 5 years ago

The inference should be very quick. You need to change the reshape parameters to (1, -1):

feed_dict = {
    src_input_ids: np.array(features.src_input_ids).reshape(1, -1),
    src_segment_ids: np.array(features.src_segment_ids).reshape(1, -1)
}

also

labels = np.array(features.tgt_labels).reshape(1,-1)
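The reshape only adds the batch dimension the model's placeholders expect (a batch of one sequence). A small NumPy sketch with dummy token ids:

```python
import numpy as np

# Dummy token ids standing in for features.src_input_ids.
src_input_ids = [101, 2023, 2003, 1037, 3231, 102]

arr = np.array(src_input_ids)   # shape (6,)  - no batch dimension
batched = arr.reshape(1, -1)    # shape (1, 6) - a batch of one sequence
print(batched.shape)            # (1, 6)
```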


darienacosta commented 5 years ago

Thanks, I've made the suggested changes. The process is no longer hanging, but I see the following error. (I haven't made any other code changes)

ERROR in app: Exception on /results [POST]
Traceback (most recent call last):
  File "C:\Users\0\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\0\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\0\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\0\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\_compat.py", line 35, in reraise
    raise value
  File "C:\Users\0\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\0\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "inference.py", line 85, in results
    hwords = infer_single_example(story,summary,tokenizer)
  File "inference.py", line 71, in infer_single_example
    references = utils.list_strip_eos(references[0], eos_token_id)
  File "C:\abstractive\texar_repo\examples\transformer\utils\utils.py", line 70, in list_strip_eos
    if eos_token in elem:
TypeError: argument of type 'int' is not iterable

Vibha111094 commented 5 years ago

I think you should try replacing references = utils.list_strip_eos(references[0], eos_token_id) with

references = utils.list_strip_eos(references, eos_token_id)
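For anyone curious why the [0] index breaks: list_strip_eos expects a list of token-id sequences, so each elem should itself be a list. A rough sketch of that interface (an approximation of texar's utility, not the exact code):

```python
def list_strip_eos(list_of_ids, eos_token):
    # Expects a list of token-id sequences; truncates each one at its first EOS.
    stripped = []
    for elem in list_of_ids:
        if eos_token in elem:  # elem must be a sequence, not an int
            elem = elem[:elem.index(eos_token)]
        stripped.append(elem)
    return stripped

# Passing references[0] (a flat list of ints) makes each elem an int,
# which is why "argument of type 'int' is not iterable" is raised.
print(list_strip_eos([[5, 9, 2, 0, 0]], eos_token=2))  # [[5, 9]]
```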

jokebroker commented 5 years ago

I think this happens only during inference. Try changing it to tokens_a = tokenizer.tokenize(example['src_txt']). This solved the problem for me.

Hi - I was able to call the model using this fix, however no matter how much memory I throw at it (running either on CPU or GPU w/12GB), I get OOM errors. This despite the batch size being 1.

What sort of hardware have you successfully got the model running on?

Thanks!

Vibha111094 commented 5 years ago

You need to change the reshape parameters to (1, -1):

feed_dict = {
    src_input_ids: np.array(features.src_input_ids).reshape(1, -1),
    src_segment_ids: np.array(features.src_segment_ids).reshape(1, -1)
}

also

labels = np.array(features.tgt_labels).reshape(1,-1)

jokebroker commented 5 years ago

Ah thanks - thought I had a different issue, but that allows me to call the model properly. I also updated references = utils.list_strip_eos(references, eos_token_id).

However, now I get a single token/word as the 'summary' no matter the length of my input.

I'm using Postman with Content-Type: multipart/form-data. The 'summary' field seems to be interpreted as the story (the field to summarize), however.

Will keep trying things, but if you have any more tips on how to structure the POST request or modify the size of the output, I would appreciate it - thanks again.

Vibha111094 commented 5 years ago

I would suggest that instead of directly using forms, you modify the code so that you just feed story and summary in as variables and check. Something like this:

with tf.Session() as sess:
    model_dir = "gs://my_bert_summ/models10/"
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    sess.run(tf.tables_initializer())
    saver.restore(sess, tf.train.latest_checkpoint(model_dir))
    story = ".......your story"
    summary = "........your summary"
    hwords = infer_single_example2(story, summary, tokenizer, sess)
    print("")
    print(hwords)
    print("/")

darienacosta commented 5 years ago

I'm not having the issues that jokebroker is seeing, but I'm curious how large the datasets everyone is training on are. Are people actually putting the entire CNN/DailyMail summaries into the train_story.txt, train_summ.txt, etc. files?

Vibha111094 commented 5 years ago

I am using this; it seems to be working fine :)

train - [0:90000]
eval - [90000:91579]
test - [91579:92579]
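Those ranges are plain index slices over the story/summary pairs. A sketch of such a split (the ranges come from the comment above; variable names are illustrative, not from the repo):

```python
# Suppose stories and summaries are parallel lists of CNN/DailyMail examples.
def split_dataset(stories, summaries):
    pairs = list(zip(stories, summaries))
    train = pairs[0:90000]        # 90,000 training pairs
    eval_ = pairs[90000:91579]    #  1,579 eval pairs
    test = pairs[91579:92579]     #  1,000 test pairs
    return train, eval_, test

# Tiny demonstration with placeholder data:
stories = [f"story {i}" for i in range(92579)]
summaries = [f"summary {i}" for i in range(92579)]
train, eval_, test = split_dataset(stories, summaries)
print(len(train), len(eval_), len(test))  # 90000 1579 1000
```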

angeluau commented 5 years ago

Thanks, I've made the suggested changes. The process is no longer hanging, but I see the error above. (I haven't made any other code changes)

I want to know how to configure the request ("Use postman to send the POST request @http://your_ip_address:1118/results with two form parameters story, summary"). I have changed the last row of Inference.py to my address, but when I run the code it still shows the error "Cannot assign requested address".

santhoshkolloju commented 5 years ago

Send a POST request with form parameters as (key, value) pairs.

angeluau commented 5 years ago

Send a POST request with form parameters as (key, value) pairs.

I can't get your point. Where is the (key, value) pair?

santhoshkolloju commented 5 years ago

Once you open Postman, the form parameters go under the Body → form-data tab.
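The same form fields can also be sent without Postman. A sketch using Python's standard library (host, port, and field names are taken from this thread; the request is built but not sent):

```python
from urllib import parse, request

# The two form fields the /results endpoint expects.
fields = {"story": "Text of the article to summarize", "summary": "placeholder"}
body = parse.urlencode(fields).encode()
req = request.Request("http://127.0.0.1:1118/results", data=body, method="POST")

# request.urlopen(req) would actually send it; here we just show the form body.
print(body.decode())  # story=Text+of+the+article+to+summarize&summary=placeholder
```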

angeluau commented 5 years ago

I run this line: python Inference.py story=data/eval_story.txt summary=data/eval_summ.txt, and I have app.run(host="0.0.0.0", port=1118, debug=False), but I still get the error. Why?

santhoshkolloju commented 5 years ago

Inference is for one example; it doesn't support a text file as input.
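If you do want to summarize every line of a text file, one workaround is to drive the endpoint in a loop, one request per story. A sketch (assumes the server from this thread is running locally on port 1118; the post argument is injectable purely for testing):

```python
from urllib import parse, request

def summarize_file(path, post=None, url="http://127.0.0.1:1118/results"):
    """POST each non-empty line of `path` as one story (illustrative sketch)."""
    if post is None:
        # Default: send the same form fields the Flask endpoint expects.
        def post(story):
            body = parse.urlencode({"story": story, "summary": ""}).encode()
            with request.urlopen(request.Request(url, data=body)) as resp:
                return resp.read().decode()
    with open(path, encoding="utf-8") as f:
        return [post(line.strip()) for line in f if line.strip()]
```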

darienacosta commented 5 years ago

Awesome

dearchill commented 5 years ago

Ah thanks - thought I had a different issue, but that allows me to call the model properly. I also updated references = utils.list_strip_eos(references, eos_token_id).

However, now I get a single token/word as the 'summary' no matter the length of my input.

I'm using Postman with Content-Type: multipart/form-data. The 'summary' field seems to be interpreted as the story (the field to summarize), however.

Will keep trying things, but if you have any more tips on how to structure the POST request or modify the size of the output, I would appreciate it - thanks again.

I have a similar problem. No matter how long an article I feed to the "story" parameter, I get a single-sentence output that is clearly related to the training stories rather than to my input story. I set the "summary" parameter to None. Maybe I misunderstand the POST request function? If I have to feed the "summary" parameter too, what is the meaning of the returned output? Could anyone help me? Thank you!