hyPnOtICDo0g / rss-chan

A telegram RSS feed reader bot, written using python and feedparser.
GNU General Public License v3.0

[Bugfix] Alter RSS monitor algorithm to support quickly updating feed items #11

Closed anasty17 closed 2 years ago

anasty17 commented 2 years ago

https://github.com/hyPnOtICDo0g/rss-chan/blob/7179de62377f2d3ce8258f47ef555b03c2200fc1/bot/modules/rsshandler.py#L204

What if the RSS feed only provides 25 items, and all of them are new? If someone sets a 20-minute delay, for example, and 25 new items are added within those 20 minutes, the while loop will not stop until it matches the last previous link and title, and that will never happen because all the items are new. An IndexError will be raised because you end up asking for the 26th item while the maximum number of items this feed provides is 25.
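The scenario described here can be sketched as a small, self-contained reproduction. All names below (rss_entries, last_link, last_title, feed_count) are hypothetical stand-ins for the variables in rsshandler.py, not the repo's actual code:

```python
# Hypothetical sketch of the failure described above (not the repo's exact code).
# The loop scans entries until it finds the last link/title seen on the previous
# poll. If every entry is new, the sentinel is never found and the index walks
# past the end of the list, raising IndexError.

rss_entries = [{'link': f'magnet:?xt={i}', 'title': f'Item {i}'} for i in range(25)]
last_link, last_title = 'magnet:?xt=old', 'Old item'  # not among the 25 new entries

feed_count = 0
try:
    while (rss_entries[feed_count]['link'] != last_link
           and rss_entries[feed_count]['title'] != last_title):
        feed_count += 1
    overran = False
except IndexError:
    overran = True  # asked for a 26th entry of a 25-item feed

print(overran)  # True
```

With 25 entries that are all new, the sentinel comparison never succeeds, so the loop asks for a 26th entry that does not exist.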

hyPnOtICDo0g commented 2 years ago

@anasty17

Yes, theoretically this could happen.

I've tested 30 different feed URLs with 10, 20 & 30 minutes of delay and none of them seem to reproduce this issue. This is an edge case which occurs only when delay is set to a very high number and the feed items update very quickly. If you can provide a feed URL to reproduce this issue then I'll be happy to look into it.

anasty17 commented 2 years ago

I was testing too. I have implemented RSS in my mirror/leech repo based on yours. Someone reported this to me while using this feed: https://rarbg.to/rssdd_magnet.php . An IndexError was raised with a 15-minute delay.

anasty17 commented 2 years ago

Maybe this feed gets all rarbg updates: when uploaders add new torrents, all 25 items change, and I guess even more than that. Maybe you will ask why people would use this feed; I'd say another feed with 50 items from a very active site's uploaders, combined with a 20-minute delay, would hit the same problem, since many users set a high delay to avoid floods and high load while using a mirror bot.

hyPnOtICDo0g commented 2 years ago

@anasty17

Looks like this site in particular is a pain to deal with. Here's my analysis:

Consider this issue to be of low priority. It will be fixed eventually.

But if there's a flooding issue with the current implementation of your bot, then you should try to fix the high load by internally delaying the interval between multiple auto-mirror calls, rather than opening an issue here.

My implementation of RSS feed fetching has no relation to your bot going haywire.

anasty17 commented 2 years ago

Yeah, rarbg blocks the IP if the delay is low, and that might also be the only reason this got reported, since other feeds can block the IP as well and then return a low item count.

First: sure, there's no relation, but this repo can be used for different purposes!

Second: I will not open waiting threads; that would lead to more load!

Third: I have already fixed it!

Fourth: a bug, theoretical or not, is a bug. This already happened to many different users, and not only with this feed, but it took me time to understand what the problem was!

Sorry for opening the issue here, I was only trying to help.

Take a look at my implementation and learn some stuff 😉

hyPnOtICDo0g commented 2 years ago

@anasty17

I took a look at your implementation, and well, it took me a long time to read your code. Please include code comments so that others can understand it.

I know that everyone has a different programming style, but I'm not sure why you need multiple objects, loops and conditional statements to solve this issue. It can be done with a simple try/except block.

try:
    while feed_items[1] != rss_d.entries[feed_count]['link'] and feed_items[2] != rss_d.entries[feed_count]['title']:
        feed_list.insert(0, f'{CUSTOM_MESSAGES}\n' + utilities.format_items(rss_d, feed_count, feed_items[3])[1])
        feed_count += 1
except IndexError:
    LOGGER.error(f"There were a couple of feed items skipped for this feed: {feed_items[0]}")
for feed in feed_list:
    context.bot.send_message(CHAT_ID, feed, parse_mode='html', disable_web_page_preview=True)

The above code is pretty self-explanatory, I hope. RSS is a way of fetching updates with a certain delay between each request. If the updates come in faster than the request interval, or there's an IP ban caused by overly frequent requests, there's not much we can do about it: you're bound to lose a couple of updates. The only thing you can ensure is that you don't skip the updates which come later on.

Edit: The above code has been tested and it works fine.
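The snippet above can also be exercised in isolation; the sketch below substitutes hypothetical stand-ins for rss_d, feed_items, CUSTOM_MESSAGES, utilities and the Telegram send call (assumptions for illustration, not the repo's actual objects):

```python
# Isolated sketch of the try/except approach quoted above, using stand-ins.
import logging

LOGGER = logging.getLogger("rss")
entries = [{'link': f'l{i}', 'title': f't{i}'} for i in range(3)]  # a short feed, all items new
feed_items = ('example-feed', 'old-link', 'old-title')  # (name, last seen link, last seen title)

feed_list = []
feed_count = 0
skipped = False
try:
    while (feed_items[1] != entries[feed_count]['link']
           and feed_items[2] != entries[feed_count]['title']):
        # insert(0, ...) reverses the feed order so the oldest unseen item is sent first
        feed_list.insert(0, entries[feed_count]['title'])
        feed_count += 1
except IndexError:
    skipped = True  # every entry was new: some older updates were missed
    LOGGER.error(f"There were a couple of feed items skipped for this feed: {feed_items[0]}")

print(feed_list)  # ['t2', 't1', 't0']
print(skipped)    # True
```

The IndexError path replaces the sentinel match as the loop's exit condition, so the items collected before the overrun are still delivered in order.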

anasty17 commented 2 years ago

I guess you didn't get what I did; a filter was added, by the way. Also, I want a separate logger for each case: with your code, an IP block and feed_count going out of range produce the same log error. And why save the items in a list when I can send them directly?

anasty17 commented 2 years ago


No need for testing, by the way. It needs a try/except with a specific log for each case, to make it clear to the user whether the IP was blocked or feed_count went out of range.

hyPnOtICDo0g commented 2 years ago

@anasty17

if ip blocked or feedcount out of range will get same log error

No, they're different blocks.

why saving them in list if i can send them directly

Code looks cleaner. It's not that I can't send them directly; I want people who look at my code to understand exactly what's going on.

Edit: This will be fixed in v1.2.

anasty17 commented 2 years ago

No, they're different blocks.

  • When items are skipped, you're going to receive this error: LOGGER.error(f"There were a couple of feed items skipped for this feed: {feed_items[0]}")
  • When there's an IP block, you're going to receive this error: LOGGER.error(f"There was an error while parsing this feed: {feed_items[0]}")

But in case any other error occurs in the data inside the while loop, for example in the title or link (this could happen), the error will be raised to the same exception handler, and I don't want that. That's also why I don't want the items in a list.
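The concern about conflating the two failures can be sketched with distinct except clauses (hypothetical names; a missing 'title' key stands in for a malformed entry):

```python
# Hypothetical sketch: separate handlers so an out-of-range index and a
# malformed entry produce distinct log messages instead of one shared error.
import logging

LOGGER = logging.getLogger("rss")
entries = [{'link': 'l0'}]  # malformed: the 'title' key is missing
feed_items = ('example-feed', 'old-link', 'old-title')

feed_count = 0
error_kind = None
try:
    while (feed_items[1] != entries[feed_count]['link']
           and feed_items[2] != entries[feed_count]['title']):
        feed_count += 1
except IndexError:
    error_kind = 'skipped'
    LOGGER.error(f"Feed items skipped for this feed: {feed_items[0]}")
except KeyError as missing:
    error_kind = 'malformed'
    LOGGER.error(f"Malformed entry (missing {missing}) in this feed: {feed_items[0]}")

print(error_kind)  # 'malformed'
```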

anasty17 commented 2 years ago

And anyway, I didn't open the issue to have this conversation. I was only trying to help, but you didn't accept it. It's OK; I have made many bugs in my repo, many have been reported, and I'm thankful to those who reported them, they helped me a lot.

hyPnOtICDo0g commented 2 years ago

@anasty17

I was trying to help only but u didn't accept it

No, this was an issue and thanks for reporting it!

But we do have different implementations of the same function and I just like my code to be a bit compact. This definitely will help me handle errors better the next time I implement something, so thank you.

anasty17 commented 2 years ago


You're welcome :)