stephanlensky / hyacinth

A Discord bot to send notifications for marketplace (Craigslist, Facebook) postings based on complex matching rules.
https://slensky.com/hyacinth
GNU Affero General Public License v3.0

Craigslist error #51

Closed KenwoodFox closed 9 months ago

KenwoodFox commented 9 months ago

Looks like there might be something wrong with the Craigslist portion? I think I configured everything correctly, but it throws an error and has not posted anything yet.

Let me know what other information might be helpful!

...
2023-11-28 14:18:56 [9] [INFO] hyacinth.monitor Scheduling job for new search! SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')
2023-11-28 14:18:56 [9] [DEBUG] hyacinth.monitor Polling search SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca') since 2023-11-28 13:18:56.526773+00:00
2023-11-28 14:18:56 [9] [DEBUG] hyacinth.util.scraping Getting page content for https://nh.craigslist.org/search/mca#search=1~gallery~0~0
2023-11-28 14:19:03 [9] [ERROR] hyacinth.monitor Error polling search SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')
Traceback (most recent call last):
  File "/app/plugins/craigslist/client.py", line 100, in _parse_search_results
    has_next_page = num_results[1] != num_results[2]
                    ~~~~~~~~~~~^^^
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/hyacinth/monitor.py", line 95, in __safe_poll_search
    listings = await search_spec.plugin.get_listings(search_spec.search_params, after_time)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/plugin.py", line 38, in get_listings
    return await get_listings(search_params, after_time, limit)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/client.py", line 32, in get_listings
    async for listing in search:
  File "/app/plugins/craigslist/client.py", line 59, in _search
    has_next_page, parsed_search_results = _parse_search_results(search_results_content)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/client.py", line 104, in _parse_search_results
    raise ParseError("Error parsing search results", content) from e
hyacinth.exceptions.ParseError: Error parsing search results
2023-11-28 14:19:03 [9] [INFO] hyacinth.util.crash_report Saving error report to logs/poll_failure_2023-11-28T14:19:03.011641.txt
2023-11-28 14:19:03 [9] [DEBUG] hyacinth.monitor Found 0 since 2023-11-28 13:18:56.526773+00:00 for search_spec=SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')
...

KenwoodFox commented 9 months ago

Seems to be happening here https://github.com/stephanlensky/hyacinth/blob/6f616f4806ad3b5349a575d27c2f70b220b05329/plugins/craigslist/client.py#L98-L100C25

But if I go to the page, I see that field in the debugger...

image

Hm..

KenwoodFox commented 9 months ago

image

Otherwise everything looks good! I'm excited to see when it will post the first thing. So far, nothing yet.

stephanlensky commented 9 months ago

👋 Hi @KenwoodFox, thanks for checking this project out & for the report!

Looks like the Craigslist integration was indeed broken: they pushed an update to the site so that it now requires client-side JS rendering. That's why it appeared normal in your browser (where the JS had already run) but did not work in Hyacinth.

I just pushed a fix, could you please try pulling the latest and let me know if it works any better?
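
For illustration, the failing line in the traceback compares two entries of a list of result counts, and that list comes back short when the page has not been rendered yet. Below is a minimal sketch of a more defensive version, assuming the counter text looks like `1 - 120 of 300`. The function names and the local `ParseError` class are hypothetical stand-ins, not Hyacinth's actual code.

```python
import re


class ParseError(Exception):
    """Hypothetical stand-in for a parsing error type."""


def parse_result_range(text: str) -> tuple[int, int, int]:
    """Extract (first, last, total) from text such as '1 - 120 of 300'."""
    # Assumption: the counter contains exactly three integers,
    # optionally with thousands separators.
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text)]
    if len(numbers) != 3:
        raise ParseError(f"Expected 3 numbers in result counter, got {numbers!r}")
    first, last, total = numbers
    return first, last, total


def has_next_page(text: str) -> bool:
    """There is another page when the last shown result is not the final one."""
    _, last, total = parse_result_range(text)
    return last != total
```

Raising a descriptive error when the counter is missing makes an unrendered page easier to tell apart from a genuine layout change.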

KenwoodFox commented 9 months ago

Hey! That was so fast thank you! Let me pull that right now and I'll let you know how it goes!

I was going to ask if I should make a new branch, but it looks like you went right to main haha

KenwoodFox commented 9 months ago

Hm.. different error?

2023-11-29 23:02:32 [9] [INFO] hyacinth.discord.discord_bot Adding commands to guild White Mountain Motorcycle Club
2023-11-29 23:02:57 [9] [DEBUG] hyacinth.util.scraping Scraping https://nh.craigslist.org/mcd/d/manchester-2019-indian-roadmaster/7692758934.html
2023-11-29 23:03:12 [9] [ERROR] hyacinth.monitor Error polling search SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')
Traceback (most recent call last):
  File "/app/hyacinth/monitor.py", line 95, in __safe_poll_search
    listings = await search_spec.plugin.get_listings(search_spec.search_params, after_time)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/plugin.py", line 38, in get_listings
    return await get_listings(search_params, after_time, limit)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/client.py", line 32, in get_listings
    async for listing in search:
  File "/app/plugins/craigslist/client.py", line 60, in _search
    detail_content = await _get_detail_content(result_url)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/client.py", line 86, in _get_detail_content
    scrape_result = await scrape(url, selectors=["html"], waitUntil="networkidle0")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/util/scraping.py", line 31, in scrape
    r.raise_for_status()
  File "/home/joyvan/.cache/pypoetry/virtualenvs/hyacinth-9TtSrW0h-py3.11/lib/python3.11/site-packages/httpx/_models.py", line 758, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://browserless:3000/scrape?stealth&blockAds=true'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
2023-11-29 23:03:12 [9] [INFO] hyacinth.util.crash_report Saving error report to logs/poll_failure_2023-11-29T23:03:12.463654.txt
2023-11-29 23:03:12 [9] [DEBUG] hyacinth.monitor Found 0 since 2023-11-29 22:31:02.629522-05:00 for search_spec=SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')

Different is better though! I'm not sure what happened. Is it an issue with the web browser component?

stephanlensky commented 9 months ago

Hm, yes I guess something is going wrong with the browser component. Hard to say what since I didn't add logs 😄

I just added some in latest, could you try again? This time there should be a more useful error message in the logs.
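
As a rough sketch of the kind of logging that helps here: record the response body before raising, since the body of an error response often says what the backend rejected. The helper below is hypothetical and takes plain values rather than a real `httpx` response so that it stays self-contained. It is not the code that was actually pushed.

```python
import logging

logger = logging.getLogger("scraping_sketch")


class ScrapeError(Exception):
    """Raised when the scraping backend returns an error status."""


def check_scrape_response(url: str, status_code: int, body: str) -> None:
    """Log and raise if the backend answered with a 4xx/5xx status."""
    if status_code < 400:
        return
    # Keep the log line bounded in case the body is a full HTML page.
    snippet = body[:500]
    logger.error("Failed to scrape %s: %s", url, snippet)
    raise ScrapeError(f"{status_code} response while scraping {url}: {snippet}")
```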

KenwoodFox commented 9 months ago

Yes absolutely! Thank you so much~ I'm OK with switching to another branch btw if you want to keep main tidy!

KenwoodFox commented 9 months ago

Weird..

2023-11-29 23:18:33 [9] [DEBUG] plugins.marketplace.client Loading marketplace search results
2023-11-29 23:18:33 [9] [DEBUG] hyacinth.util.scraping Loading page https://www.facebook.com/marketplace/103703779667744/motorcycles/?sortBy=creation_time_descend&exact=false
2023-11-29 23:18:39 [9] [DEBUG] plugins.marketplace.client Waiting for marketplace search results to render
2023-11-29 23:18:39 [9] [DEBUG] plugins.marketplace.client Marketplace search results rendered
2023-11-29 23:18:39 [9] [DEBUG] plugins.marketplace.client Getting search results page content
2023-11-29 23:18:43 [9] [DEBUG] hyacinth.util.scraping Scraping https://nh.craigslist.org/mcd/d/manchester-2019-indian-roadmaster/7692758934.html
2023-11-29 23:18:43 [9] [DEBUG] hyacinth.util.scraping Loading page https://www.facebook.com/marketplace/item/674776421428151/
2023-11-29 23:18:51 [9] [INFO] hyacinth.util.geo Loading geospatial datasets...
2023-11-29 23:18:58 [9] [INFO] hyacinth.util.geo Creating indexes...
2023-11-29 23:18:59 [9] [INFO] hyacinth.util.geo Done!
2023-11-29 23:19:00 [9] [DEBUG] plugins.marketplace.client Found listing 2012 Harley-Davidson FLHTCU Tri-Glide at 2023-11-29 18:59:20-05:00
2023-11-29 23:19:00 [9] [DEBUG] hyacinth.util.scraping Loading page https://www.facebook.com/marketplace/item/1099923694353815/
2023-11-29 23:19:00 [9] [DEBUG] hyacinth.monitor Found 0 since 2023-11-29 22:31:02.629522-05:00 for search_spec=SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')
Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error (Runtime.releaseObject): Cannot find context with specified id')>
pyppeteer.errors.NetworkError: Protocol error (Runtime.releaseObject): Cannot find context with specified id
2023-11-29 23:19:06 [9] [DEBUG] hyacinth.monitor Found 1 since 2023-11-29 22:18:25.869194+00:00 for search_spec=SearchSpec(id=3, plugin_path=plugins.marketplace.plugin:MarketplacePlugin, search_params=location='103703779667744' category='motorcycles')
2023-11-29 23:19:25 [9] [DEBUG] hyacinth.notifier Running notifier for new listings!
2023-11-29 23:19:25 [9] [DEBUG] hyacinth.notifier Most recent listing was found at 2023-11-29 23:18:25.895498-05:00
2023-11-29 23:19:25 [9] [DEBUG] hyacinth.notifier Updating last_notified times for 2 active searches
2023-11-29 23:19:25 [9] [DEBUG] hyacinth.notifier Found 1 to notify for across 2 active searches
Job "ChannelNotifier._notify_new_listings (trigger: interval[0:01:00], next run at: 2023-11-29 23:20:25 EST)" raised an exception
Traceback (most recent call last):
  File "/app/hyacinth/util/boolean_algebra.py", line 114, in tokenize
    tokens.append((TOKENS[tok.lower()], tok, position))
                   ~~~~~~^^^^^^^^^^^^^
KeyError: '"'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/joyvan/.cache/pypoetry/virtualenvs/hyacinth-9TtSrW0h-py3.11/lib/python3.11/site-packages/apscheduler/executors/base_py3.py", line 30, in run_coroutine_job
    retval = await job.func(*job.args, **job.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/notifier.py", line 229, in _notify_new_listings
    listings = list(filter(self.should_notify_listing, listings))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/notifier.py", line 170, in should_notify_listing
    return filters.test(listing, self.config.filters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/filters.py", line 28, in test
    result = _apply_rule_expr(filter_.rule_expr, listing[filter_.field])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/filters.py", line 43, in _apply_rule_expr
    return _apply_string_rule_expr(expr, field)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/filters.py", line 69, in _apply_string_rule_expr
    expression = parse_string_rule_expr(expr)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/filters.py", line 77, in parse_string_rule_expr
    return parse_expression(expr)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/util/boolean_algebra.py", line 161, in parse_expression
    return algebra.parse(rule_str)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joyvan/.cache/pypoetry/virtualenvs/hyacinth-9TtSrW0h-py3.11/lib/python3.11/site-packages/boolean/boolean.py", line 200, in parse
    tokenized = self.tokenize(expr)
                ^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/util/boolean_algebra.py", line 119, in tokenize
    raise ParseError(
boolean.boolean.ParseError: Unknown token for token: """ at position: 4

I can't tell if this is one error or two. We got a future await error, but also a token error?

Do you have an instance running locally? Is this an isolated thing just to me, or is this perhaps larger?

stephanlensky commented 9 months ago

> I'm OK with switching to another branch btw if you want to keep main tidy!

Sure, if we need to add something that shouldn't go into main we can do that - for now all of these changes belong there though, so it's not an issue. I'm not bothering with branches and PRs for these types of changes since I'm currently the only contributor to this project.

> I can't tell if this is one error or two. We got a future await error, but also a token error?
>
> Do you have an instance running locally? Is this an isolated thing just to me, or is this perhaps larger?

That's definitely odd. The future error is also related to browserless, just like your other issue.

I don't have an instance running daily anymore, but I brought one back up to test and I am not able to replicate the errors.

Let's try to address these one at a time:

  1. For the browserless issues, it's possible that I have an older version of the image cached and a recent update (which you have) broke things. I'll try to make sure I am running the latest image and see if that lets me reproduce the issue.
  2. The token error is related to one of your filters not being parsed correctly. I saw in your above screenshot that you didn't have any filters - did you add one since then? Is the output of /show still the same as in your original screenshot (no filters)?

stephanlensky commented 9 months ago

Regarding 2. (token error), I found a bug where entering symbols that are neither alphanumeric nor boolean operators (&, |, etc.) in a rule could cause the crash you saw.

I've pushed a fix to prevent these bad rules from being created in the future, but since it seems you already have one saved you will need to remove it manually with /filter delete.
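
To illustrate the idea of rejecting bad rules at creation time, here is a minimal sketch. The set of allowed operator characters and all names are assumptions for the example, not Hyacinth's actual validation logic.

```python
# Assumption: these are the only non-alphanumeric characters a rule may contain.
ALLOWED_OPERATORS = {"&", "|", "~", "!", "(", ")"}


class InvalidRuleError(ValueError):
    """Raised when a rule contains a character the tokenizer cannot handle."""


def validate_string_rule(rule: str) -> None:
    """Reject a rule before it is saved instead of crashing when it is applied."""
    for position, char in enumerate(rule):
        if char.isalnum() or char.isspace() or char in ALLOWED_OPERATORS:
            continue
        raise InvalidRuleError(
            f"Unsupported character {char!r} at position {position}"
        )
```

For example, `validate_string_rule("honda & (cb500 | rebel)")` passes, while a rule containing a double quote is rejected with the offending position.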

Will look more into the browserless issue later in the week.

KenwoodFox commented 9 months ago

> 2. The token error is related to one of your filters not being parsed correctly. I saw in your above screenshot that you didn't have any filters - did you add one since then? Is the output of /show still the same as in your original screenshot (no filters)?

Good catch! I do have filters now! It's very possible I messed one or more up. I'll delete them right now and pull your latest.

> I've pushed a fix to prevent these bad rules from being created in the future...

Much appreciated :pray:

KenwoodFox commented 9 months ago

Here's what I see now after using your latest:

2023-11-30 00:16:41 [9] [DEBUG] plugins.marketplace.client Loading marketplace search results
2023-11-30 00:16:41 [9] [DEBUG] hyacinth.util.scraping Loading page https://www.facebook.com/marketplace/103703779667744/motorcycles/?sortBy=creation_time_descend&exact=false
2023-11-30 00:16:47 [9] [DEBUG] plugins.marketplace.client Waiting for marketplace search results to render
2023-11-30 00:16:48 [9] [DEBUG] plugins.marketplace.client Marketplace search results rendered
2023-11-30 00:16:48 [9] [DEBUG] plugins.marketplace.client Getting search results page content
2023-11-30 00:16:52 [9] [DEBUG] hyacinth.util.scraping Scraping https://nh.craigslist.org/mcd/d/manchester-2019-indian-roadmaster/7692758934.html
2023-11-30 00:16:52 [9] [DEBUG] hyacinth.util.scraping Loading page https://www.facebook.com/marketplace/item/674776421428151/
2023-11-30 00:17:01 [9] [INFO] hyacinth.util.geo Loading geospatial datasets...
2023-11-30 00:17:06 [9] [INFO] hyacinth.util.geo Creating indexes...
2023-11-30 00:17:07 [9] [INFO] hyacinth.util.geo Done!
2023-11-30 00:17:07 [9] [ERROR] hyacinth.util.scraping Failed to scrape https://nh.craigslist.org/mcd/d/manchester-2019-indian-roadmaster/7692758934.html: Navigation timeout of 10000 ms exceeded
2023-11-30 00:17:07 [9] [ERROR] hyacinth.monitor Error polling search SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')
Traceback (most recent call last):
  File "/app/hyacinth/monitor.py", line 95, in __safe_poll_search
    listings = await search_spec.plugin.get_listings(search_spec.search_params, after_time)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/plugin.py", line 38, in get_listings
    return await get_listings(search_params, after_time, limit)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/client.py", line 32, in get_listings
    async for listing in search:
  File "/app/plugins/craigslist/client.py", line 60, in _search
    detail_content = await _get_detail_content(result_url)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/plugins/craigslist/client.py", line 86, in _get_detail_content
    scrape_result = await scrape(url, selectors=["html"], waitUntil="networkidle2")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/hyacinth/util/scraping.py", line 32, in scrape
    r.raise_for_status()
  File "/home/joyvan/.cache/pypoetry/virtualenvs/hyacinth-9TtSrW0h-py3.11/lib/python3.11/site-packages/httpx/_models.py", line 758, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://browserless:3000/scrape?stealth&blockAds=true'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
2023-11-30 00:17:07 [9] [INFO] hyacinth.util.crash_report Saving error report to logs/poll_failure_2023-11-30T00:17:07.897728.txt
2023-11-30 00:17:07 [9] [DEBUG] hyacinth.monitor Found 0 since 2023-11-29 22:31:02.629522-05:00 for search_spec=SearchSpec(id=2, plugin_path=plugins.craigslist.plugin:CraigslistPlugin, search_params=site='nh' nearby_areas=None category='mca')
Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error Runtime.releaseObject: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Runtime.releaseObject: Target closed.
Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error (Target.sendMessageToTarget): No session with given id')>
pyppeteer.errors.NetworkError: Protocol error (Target.sendMessageToTarget): No session with given id
2023-11-30 00:17:07 [9] [DEBUG] hyacinth.monitor Found 0 since 2023-11-29 23:18:25.895498-05:00 for search_spec=SearchSpec(id=3, plugin_path=plugins.marketplace.plugin:MarketplacePlugin, search_params=location='103703779667744' category='motorcycles')
2023-11-30 00:17:33 [9] [DEBUG] hyacinth.notifier Running notifier for new listings!
2023-11-30 00:17:33 [9] [DEBUG] hyacinth.notifier Found 0 to notify for across 2 active searches

And removing all my filters.

KenwoodFox commented 9 months ago

I'm glad you added that check btw, but now I'm confused about how to filter by price haha!

image

The manual says that filter <2500 is valid? Not exactly related to this issue, but perhaps worth a peek!

stephanlensky commented 9 months ago

👋 Hi again. Yep, apologies, that was my bad on the filter rules: there was a bug with the validation. It should be fixed now.

Regarding the other issue we saw in your most recent logs, it looks like for some reason the search results were never rendering in the browser backend, leading to a timeout. I wasn't able to replicate this at all on my own machine, even after making sure I was running the latest browserless image.

I just switched over the Craigslist plugin to use the same puppeteer-based API as in the Marketplace plugin, which gave me more control over how the browser behaves. Feel free to try again on the latest version. If we're lucky, that may resolve the issue.

If not: what OS/architecture are you running on? Is there anything special about your network configuration that could be causing this? Are you able to load the page that was timing out (https://nh.craigslist.org/mcd/d/manchester-2019-indian-roadmaster/7692758934.html) in a regular browser?
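
As a side note on the timeout itself, one generic way to make a slow page load less fatal is to bound each attempt and retry a few times. This is a sketch of that general technique using only `asyncio`, not what the Craigslist plugin does. `load` stands for any hypothetical coroutine function that fetches a page.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def load_with_retries(
    load: Callable[[], Awaitable[T]],
    timeout_s: float = 10.0,
    attempts: int = 3,
) -> T:
    """Run load() with a per-attempt timeout, retrying when it times out."""
    last_error = None
    for _ in range(attempts):
        try:
            # wait_for cancels the attempt once timeout_s has elapsed.
            return await asyncio.wait_for(load(), timeout=timeout_s)
        except asyncio.TimeoutError as e:
            last_error = e
    raise TimeoutError(f"Page did not load after {attempts} attempts") from last_error
```

Retrying only helps with intermittent slowness. If every attempt times out, as in the logs above, the underlying rendering problem still needs to be found.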

stephanlensky commented 9 months ago

FYI, I also just pushed a change to the env variables - they are now prefixed with HYACINTH_. You'll need to update your .env file accordingly before running the latest version.

KenwoodFox commented 9 months ago

Awesome! Thanks for being so quick going back and forth!

I'm able to load the pages totally fine, and I don't think I have any network issues. Everything seems shipshape: I can curl that web page too and get HTML instantly from the same host that's running the containers.

This might be anecdotal, but it seems to pick up the Craigslist searches easily, yet I don't think I've actually seen it send a single Marketplace listing. I'm very sure there has been activity in my area in the last week, though I'm not sure how to quantify that.

stephanlensky commented 9 months ago

Weird. I'm sorry to say I don't have a lead on what's going on as I'm still not able to reproduce on my end.

If you'd like to keep trying to debug with me (no worries if not), let's try to test the page scraping more directly (without running the full bot).

I have some helper scripts prepared which can collect samples of pages from both Craigslist and Marketplace using the same logic that powers the bot. Maybe this will allow us to isolate the issue further.

To download page samples:

  1. Pull my latest changes and build the local development container:
    git pull && docker-compose build devbox
  2. Run it to get a shell:
    docker-compose run --rm devbox
  3. Download Craigslist page samples:
    just get-craigslist-page-sample
  4. Download Marketplace page samples:
    just get-marketplace-page-sample

I'm curious whether these commands will succeed or not. You can check either by looking at the log outputs (you should see Successfully written ...) or by running git status and seeing if the page samples have been modified.

BTW, unrelated, but if you changed the environment variables POSTGRES_USER and POSTGRES_PASSWORD to HYACINTH_POSTGRES_USER and HYACINTH_POSTGRES_PASSWORD as a result of my previous message, you'll want to change them back (other variables should keep the prefix though). I realized that prefixing these variables wouldn't work, since they are shared by the database container. Sorry for the confusion.
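
To make the naming rule concrete, here is a small sketch of reading settings where most variables carry the HYACINTH_ prefix but the two Postgres variables stay unprefixed. The function and the `HYACINTH_DISCORD_TOKEN` name in the usage note are hypothetical, and this is not how Hyacinth itself loads its configuration.

```python
from typing import Mapping

PREFIX = "HYACINTH_"
# These names are shared with the database container, so they keep no prefix.
SHARED_WITH_DATABASE = ("POSTGRES_USER", "POSTGRES_PASSWORD")


def collect_settings(environ: Mapping[str, str]) -> dict[str, str]:
    """Collect prefixed settings plus the unprefixed database credentials."""
    settings: dict[str, str] = {}
    for name, value in environ.items():
        if name.startswith(PREFIX):
            settings[name[len(PREFIX):].lower()] = value
    # The unprefixed database variables take precedence over any prefixed copy.
    for name in SHARED_WITH_DATABASE:
        if name in environ:
            settings[name.lower()] = environ[name]
    return settings
```

Passing `os.environ` (or a plain dict such as `{"HYACINTH_DISCORD_TOKEN": "abc", "POSTGRES_USER": "hyacinth"}`) returns a dict with lowercase, unprefixed keys.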

KenwoodFox commented 9 months ago

image

I don't recall setting a distance for the Marketplace plugin, so maybe it's just too far away?

I feel like it totally should have hit this, https://www.facebook.com/marketplace/item/887044165950941/?mibextid=dXMIcH

KenwoodFox commented 9 months ago

Thanks for being so helpful! I'll give these a go.

And yeah! I actually saw that bug in the example.env, but I just included both HYACINTH_POSTGRES_PASSWORD and POSTGRES_PASSWORD because it gave me a warning. I figured having both wouldn't hurt, since it was the same value either way.

stephanlensky commented 9 months ago

> I don't recall setting a distance for the Marketplace plugin, so maybe it's just too far away?
>
> I feel like it totally should have hit this, https://www.facebook.com/marketplace/item/887044165950941/?mibextid=dXMIcH

Yes, it probably should have. Unless you set a filter on distance, the Marketplace plugin should return all listings posted in the provided location.

You can also check if any listings have been successfully scraped by checking the database directly. Load up pgweb with docker-compose up -d pgweb, then go to http://localhost:8081/. You should see some listings in the listings table - if not, that means Hyacinth has not successfully scraped anything.

KenwoodFox commented 9 months ago

Awesome! Figuring out where the debug tools are makes it a lot easier to see whats going on.

image

It looks like there are only Craigslist listings in here, even though I can see in the logs that it's loading and scraping Facebook pages.

image

On the bright side, the bot has been performing pretty well in our discord server and we're mostly happy :3

stephanlensky commented 9 months ago

Okay, I'm glad to hear it is working for Craigslist at least. Debug tools definitely do make everything easier!

I'm going to go ahead and close this issue, but feel free to open another in the future if you'd like to try and get Marketplace working.

If you do, please try running just get-marketplace-page-sample in the development container beforehand (as described above) and include the output, as well as complete logs from Hyacinth, in the issue.

Thank you 🙂

KenwoodFox commented 9 months ago

Perfect! Will do! This was super nice, I appreciate it a lot!