miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0

Console being spammed when using library. #198

Closed: Meatfucker closed this issue 11 months ago.

Meatfucker commented 11 months ago

When I use this library in my own project, it spams the console with output I do not want. After a bit of digging, it seems to be coming from this loop in https://github.com/miso-belica/sumy/blob/5fdfae543b01359a3cd82b1edb7d5f6c1c89c782/sumy/evaluation/__main__.py#L167-L172

I'm pretty sure it's that loop and its print call doing it. If this could be removed or made configurable, that would be great. Thanks! I tested locally, and commenting out that loop solves it for me.

miso-belica commented 11 months ago

Hello, this file is a CLI script, and its purpose is to print results to the console. If you use sumy as a library, I recommend not using anything from the __main__.py files. You can always use just the functions and control the output yourself.
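A minimal sketch of "controlling the output yourself" (not something posted in the thread): if a call you do not control still writes to stdout, the standard library's contextlib.redirect_stdout can capture that text so you decide what happens to it. The helper name run_quietly is hypothetical, and noisy_add is just a stand-in for a chatty call.

```python
import contextlib
import io
from typing import Any, Callable, Tuple


def run_quietly(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, str]:
    """Call func, capturing anything it prints to sys.stdout.

    Hypothetical helper: returns (result, captured_text) so the caller can
    log, discard, or display the captured output as it sees fit.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


if __name__ == "__main__":
    def noisy_add(a: int, b: int) -> int:
        # Stand-in for a library call that prints as a side effect.
        print("debug: adding", a, b)
        return a + b

    total, captured = run_quietly(noisy_add, 2, 3)
    print(total)           # 5
    print(repr(captured))  # 'debug: adding 2 3\n'
```

This only catches Python-level writes to sys.stdout; output sent to stderr needs contextlib.redirect_stderr, and anything a C extension writes directly to the file descriptor bypasses both.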

Meatfucker commented 11 months ago

I am getting these results printed to the console while using it as a library, with the example code. This is the code I'm using:

```python
from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

url = "https://example.com/"  # placeholder: the snippet does not show where url comes from

parser = HtmlParser.from_url(url, Tokenizer("english"))
stemmer = Stemmer("english")
summarizer = Summarizer(stemmer)
summarizer.stop_words = get_stop_words("english")  # sumy summarizer setup stuff

# Join the four highest-ranked sentences into a single description string.
compileddescription = " ".join(str(sentence) for sentence in summarizer(parser.document, 4))
sitedescription = f"The URL is a website about the following: {compileddescription}"
```

miso-belica commented 11 months ago

That's really weird. What is the name of the file your code is written in, and how do you run it? Maybe that's the problem. The code above should never produce any console output.
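The thread does not say why the file name might matter. One common cause, and only an assumption here, is module shadowing: a local file or folder named like a package (for example a script called sumy.py) gets imported instead of the installed package. A quick check is to ask Python which file it would load. module_origin is a hypothetical helper built on importlib.util.find_spec.

```python
import importlib.util
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file Python would load for top-level module `name`, or None.

    Hypothetical helper. Pass top-level names only: for dotted names,
    find_spec imports the parent packages, which can have side effects.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # sys.path[0] is the script's directory, which is searched first,
    # so a same-named local file there would win over the installed package.
    print("sys.path[0] =", sys.path[0])
    print("sumy ->", module_origin("sumy"))
```

If the printed path points into the project directory rather than site-packages, a local file is shadowing the package.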

Meatfucker commented 11 months ago

The project I'm using it in is at https://github.com/Meatfucker/metatron, in metatron.py, in the function extract_text_from_url.

miso-belica commented 11 months ago

@Meatfucker I am really sorry, but I have no idea why this happens. If you debug it and find the cause, I am all ears. The code in https://github.com/Meatfucker/metatron/blob/215026e88671c84b7094da2c10497d7d5e96b186/metatron.py#L230-L238 should not print anything to the console or to stderr.
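One way to do that debugging, sketched here as an assumption rather than anything proposed in the thread, is to temporarily replace sys.stdout with a wrapper that records where each write comes from. TracingStdout is a hypothetical name. It only sees Python-level writes that go through sys.stdout, so output written to stderr, or directly to the file descriptor by C code, will not show up.

```python
import sys
import traceback
from typing import Any, List, TextIO


class TracingStdout:
    """Hypothetical debugging wrapper around a text stream.

    Every write is forwarded unchanged to the wrapped stream; for each
    non-blank write, the caller's file, line and function are recorded.
    """

    def __init__(self, target: TextIO) -> None:
        self.target = target
        self.origins: List[str] = []

    def write(self, text: str) -> int:
        if text.strip():
            # extract_stack(limit=2) returns [caller, this write() frame].
            # print() is implemented in C, so the caller is the code that called print().
            caller = traceback.extract_stack(limit=2)[0]
            self.origins.append(f"{caller.filename}:{caller.lineno} in {caller.name}")
        return self.target.write(text)

    def flush(self) -> None:
        self.target.flush()

    def __getattr__(self, name: str) -> Any:
        # Delegate anything else (encoding, isatty, ...) to the real stream.
        return getattr(self.target, name)


if __name__ == "__main__":
    tracer = TracingStdout(sys.stdout)
    sys.stdout = tracer
    try:
        print("where does this come from?")  # replace with the suspect summarizer call
    finally:
        sys.stdout = tracer.target  # always restore the real stdout
    for origin in tracer.origins:
        print("stdout write from", origin)
```

The same wrapper can be installed on sys.stderr to check the other stream.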