stephanlensky / hyacinth

A Discord bot to send notifications for marketplace (Craigslist, Facebook) postings based on complex matching rules.
https://slensky.com/hyacinth
GNU Affero General Public License v3.0

Inconsistent and unpredictable grabbing and reporting #70

Open Dshadows33 opened 7 months ago

Dshadows33 commented 7 months ago

Log.txt

I started using this last night and I'm noticing odd behavior. It reports things in batches every 30-40 minutes, and it only finds Facebook listings once in a while, after several polls have returned nothing. For example, a batch sent to my Discord at 10:41 included listings posted on Facebook at 9:34, 9:40, and 9:45. The next batch was a single listing, sent at 11:14, that was posted on Facebook at 10:32. I've watched it running in Docker for a couple of hours and haven't seen anything out of the ordinary apart from the behavior above. I also set the poll interval to marketplace_poll_interval_seconds: int = 900, and it made no difference. Any suggestions? I've attached what I have been seeing in Docker.

stephanlensky commented 6 months ago

Hey, taking a look now. Thanks for providing the log.

I'm looking at the example where the batch was sent at 10:41 (17:41 UTC in the logs). It looks like a search ran around that time and found the 3 listings you mentioned. That search was running every 10 minutes, though, so it should definitely have found those listings much earlier.

There are two possibilities for what happened here:

  1. Those listings were actually just totally missing from the search results until around 17:41. I'm not sure why Facebook would delay showing the listings like this, but if that's what's happening, there's not much I can do.
  2. The listings appeared in the search results, but out of order (so an older listing was shown first). Hyacinth relies on the listings being in order so that it knows when to stop the search, so if the listings were being shown out of order that could explain this.
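To see why the second possibility would make listings go missing, here is a small simulation. It is a minimal sketch, not Hyacinth's actual code. It assumes a poller that walks the results newest-first and stops at the first listing that is at or before the cutoff. `Listing` and `poll_since` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Listing:
    title: str
    posted: datetime


def poll_since(results: list[Listing], cutoff: datetime) -> list[Listing]:
    """Collect listings newer than `cutoff`, assuming `results` is sorted newest-first.

    Hypothetical early-stop logic: the first listing at or before the cutoff
    ends the scan, so anything after it on the page is never examined.
    """
    found = []
    for listing in results:
        if listing.posted <= cutoff:
            break  # assume everything further down is older too
        found.append(listing)
    return found


cutoff = datetime(2024, 2, 17, 16, 30)
in_order = [
    Listing("A", cutoff + timedelta(minutes=15)),
    Listing("B", cutoff + timedelta(minutes=4)),
    Listing("old", cutoff - timedelta(minutes=20)),
]
# The same listings, but with the older one shown first (out of order).
out_of_order = [in_order[2], in_order[0], in_order[1]]

print([x.title for x in poll_since(in_order, cutoff)])      # ['A', 'B']
print([x.title for x in poll_since(out_of_order, cutoff)])  # []
```

If out-of-order results turn out to be the cause, a common mitigation is to keep scanning past a few older listings (or the whole first page) before stopping. The cost is slightly more work per poll.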

To investigate further, I'd need to see the search results page from a previous run of the bot (e.g. the 17:30 run of the search, which was supposed to collect results since 16:30 but missed the 16:34, 16:40, and 16:45 listings).

2024-02-17 10:30:56 service-1      | 2024-02-17 17:30:56 [9] [DEBUG] hyacinth.monitor Polling search SearchSpec(id=1, plugin_path=plugins.marketplace.plugin:MarketplacePlugin, search_params=location='106066949424984' category='video-games-consoles') since 2024-02-17 16:30:56.753762+00:00
...
2024-02-17 10:31:05 service-1      | 2024-02-17 17:31:05 [9] [DEBUG] hyacinth.monitor Found 0 since 2024-02-17 16:30:56.753762+00:00 for search_spec=SearchSpec(id=1, plugin_path=plugins.marketplace.plugin:MarketplacePlugin, search_params=location='106066949424984' category='video-games-consoles')
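You can double-check that the 17:30 run's window really covered the missed listings by extracting the `since` cutoff from the log line above and comparing it with the listing times. The following is a minimal sketch with a few assumptions:

- The regex assumes the exact log format shown above.
- The UTC-7 offset is inferred from the thread, where 10:41 local corresponds to 17:41 UTC.
- The posting times are the approximate ones reported from Facebook.

```python
import re
from datetime import datetime, timedelta, timezone

# Log line copied verbatim from the thread (Docker Compose prefix included).
log_line = (
    "2024-02-17 10:30:56 service-1      | 2024-02-17 17:30:56 [9] [DEBUG] hyacinth.monitor "
    "Polling search SearchSpec(id=1, plugin_path=plugins.marketplace.plugin:MarketplacePlugin, "
    "search_params=location='106066949424984' category='video-games-consoles') "
    "since 2024-02-17 16:30:56.753762+00:00"
)

# Group 1: the app's own (UTC) timestamp; group 2: the "since" cutoff.
match = re.search(r"\| (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .* since (\S+ \S+)$", log_line)
if match is None:
    raise ValueError("log line did not match the expected format")

polled_at = datetime.fromisoformat(match.group(1)).replace(tzinfo=timezone.utc)
since = datetime.fromisoformat(match.group(2))

# Assumed local offset (UTC-7) and approximate local posting times of the missed listings.
local_tz = timezone(timedelta(hours=-7))
missed = [datetime(2024, 2, 17, h, m, tzinfo=local_tz) for h, m in [(9, 34), (9, 40), (9, 45)]]

for posted in missed:
    inside = since < posted <= polled_at
    print(posted.astimezone(timezone.utc).strftime("%H:%M"), "inside window:", inside)
# prints "16:34 inside window: True", then the same for 16:40 and 16:45
```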

I can't easily reproduce the issue, but if you'd like to help with the investigation I've added a new option which you can set in your .env file after pulling the latest version of the code:

HYACINTH_SAVE_SCRAPED_PAGES=true

With this option enabled, every page that is scraped will be saved to the logs/ folder. If you see something similar again, you can go back to the saved page from a previous search which should have had the listing, and then see if the listing is actually there or not.
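Looking through the saved pages by hand gets tedious once many polls have run. This is a minimal sketch that lists every saved file under `logs/` mentioning a given listing title, oldest first, so you can see which poll first contained it. It assumes the saved pages are text or HTML files somewhere under `logs/`. Their exact names and format aren't described here, so it simply searches every file. `find_pages_mentioning` is a hypothetical helper, not part of Hyacinth.

```python
from pathlib import Path


def find_pages_mentioning(text: str, log_dir: str = "logs") -> list[Path]:
    """Return files under `log_dir` whose contents mention `text`, oldest first.

    Assumes saved pages are readable as text; undecodable bytes are ignored.
    Ordering uses file modification time as a stand-in for the poll time.
    """
    needle = text.lower()
    hits = []
    for path in Path(log_dir).rglob("*"):
        if not path.is_file():
            continue
        content = path.read_text(encoding="utf-8", errors="ignore")
        if needle in content.lower():
            hits.append(path)
    return sorted(hits, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    import sys

    # Usage: python find_pages.py "listing title"
    if len(sys.argv) > 1:
        for page in find_pages_mentioning(sys.argv[1]):
            print(page)
```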

Make sure to turn it off when you're done, as leaving this option enabled may quickly fill up your disk space.

Dshadows33 commented 6 months ago

I'd be happy to help! Sorry for the delay.

logs.zip

The attached zip contains all the data from 2024-03-04 23:02:13 to 2024-03-04 23:39:08.

It looks like part of the problem is Facebook delaying when listings become visible, but there is also a separate error.

At 16 minutes 46 seconds, both the Steam Deck and the Game Boy come into view. The next two images show that the Steam Deck was posted 5 minutes ago but the Game Boy was posted 17 minutes ago. The Game Boy is not present in the image from 6 minutes 44 seconds. Both the Steam Deck and the Game Boy were successfully recorded and sent to my Discord when they were discovered, so Facebook is just being weird in this case.

However, there does seem to be a separate issue. Watchdogs shows up in the image at 16 minutes 12 seconds and in the following image, but not in the previous search at 6 minutes 12 seconds. Watchdogs continues to show up in later searches but never gets reported to my Discord.

It's now 12:03 (45 minutes since Watchdogs first appeared), and it still hasn't been reported to my Discord, even though it keeps being recorded in my logs.

Dshadows33 commented 6 months ago

It's worse now. I'll send you new data later today.

Dshadows33 commented 3 months ago

New issue: I'm no longer getting notified of anything. (The attachment poll_failure_2024-05-25T083141.914938.txt did not finish uploading.)