summanlp / textrank

TextRank implementation for Python 3.
https://pypi.org/project/summa/
MIT License

Ratio and Words dont work properly when provided as a variable #81

Open MarlNox opened 4 years ago

MarlNox commented 4 years ago

I'm building a Flask app where the user sets the ratio or the number of words for the summarizer's output. The input text is stripped of text artifacts, then recomposed and fed to the summarizer. Even though the values are passed to the app from the front page, ratio and word count do not seem to have any effect. The output text always comes out very short and does not change in length when I change the ratio. I've tried using gensim as well, with the same results. Any suggestions?

fbarrios commented 4 years ago

Hi! Can you share the example text and how you are calling the library? If you cannot share the text here you can also email it personally.

MarlNox commented 4 years ago


Hi fbarrios, to be more specific: the issue happens with any type of text I've input; it just returns one or two sentences as a summary. I've also experimented with different methods to clean the text, with the same results. For more info, see the code below:

@app.route("/summarize", methods=["GET", "POST"])
def summarize():
    text1 = request.form['text']
    percent = request.form['percentage']
    numri = request.form['numberOfWords']
    if numri == 0:
        nr1 = int(numri)
        texty = str(text1)
        textu = re.sub(r'\n\s*\n', '\n', texty, flags=re.MULTILINE)
        b_list = textu.split()
        text = " ".join(b_list)
        sent = nltk.sent_tokenize(text)
        if len(sent) < 2:
            summary1 = "please pass more than 3 sentences to summarize the text"
        else:
            summary = summy(text, words=nr1)
            summ = nltk.sent_tokenize(summary)
            summary1 = (" ".join(summ[:2]))
            result = {
                "result": summary1
            }
            result = {str(key): value for key, value in result.items()}
            return jsonify(result=result)
    else:
        nr = float(percent)
        texty = str(text1)
        textu = re.sub(r'\n\s*\n', '\n', texty, flags=re.MULTILINE)
        b_list = textu.split()
        text = " ".join(b_list)
        sent = nltk.sent_tokenize(text)
        if len(sent) < 2:
            summary1 = "please pass more than 3 sentences to summarize the text"
        else:
            print(nr)
            summary = summy(text, ratio=nr)
            summ = nltk.sent_tokenize(summary)
            summary1 = (" ".join(summ[:2]))
            result = {
                "result": summary1
            }
            print(result)
            result = {str(key): value for key, value in result.items()}
            return jsonify(result=result)

Do you think it's related to text formatting, an internal bug that may be caused by invoking the app within Flask, or something else? I can confirm that the values are sent accurately to the backend, so I'm not sure. If you think it's a formatting issue, do you have any suggestions on effective ways to clean up the text?

Thanks,
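
Two details in the handler above may be worth checking before suspecting the library. First, values read from request.form are strings, so the comparison numri == 0 is never true and the words branch is never taken. Second, summ[:2] keeps at most two sentences of whatever the summarizer returns, which would make the output length look independent of ratio and words. Below is a minimal sketch of parsing the two form fields before calling the summarizer. The helper name parse_length_options is hypothetical, the 0.2 fallback is only an assumed default, and treating values above 1 as percentages is a guess about what the "percentage" field contains.

```python
def parse_length_options(form):
    """Return (words, ratio) parsed from string form fields.

    Hypothetical helper: the field names mirror the handler above.
    Exactly one of the two returned values is not None.
    """
    raw_words = (form.get("numberOfWords") or "").strip()
    raw_ratio = (form.get("percentage") or "").strip()

    # Form values are strings, so "0" == 0 is False; convert before comparing.
    words = int(raw_words) if raw_words else 0
    if words > 0:
        return words, None

    # Assumed fallback when no ratio is supplied.
    ratio = float(raw_ratio) if raw_ratio else 0.2
    # Assumption: the field may hold a percentage such as 30 rather than 0.3.
    if ratio > 1:
        ratio /= 100.0
    return None, ratio


# A plain dict stands in for request.form here.
print(parse_length_options({"numberOfWords": "120", "percentage": ""}))
print(parse_length_options({"numberOfWords": "0", "percentage": "30"}))
```

With the values parsed this way, the result could be passed on as summy(text, words=words) or summy(text, ratio=ratio), and the summary returned whole rather than sliced with summ[:2], so that the requested length is actually visible in the response.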