philippta / flyscrape

Flyscrape is a command-line web scraping tool designed for those without advanced programming skills.
https://flyscrape.com
Mozilla Public License 2.0
1.01k stars, 28 forks

Flyscrape not following any links #70

Open u9g opened 1 month ago

u9g commented 1 month ago

Hi, first of all, thanks for making this; it's a super intuitive project. However, I seem to have hit a roadblock when trying to follow links.

This is my code now: https://gist.github.com/u9g/c8fc564784a7c727650fce28c384186a

When I run it, the line is only hit for the first webpage, where that first selector matches, and no further webpages are visited. The selector itself is recognized, since the scraper can print out the URL I want it to follow, so I put that same selector into the follow option. However, the scraper fails to follow that link. Even if I replace that complex selector with just a[href], like the default, flyscrape still chooses not to follow any links.

Thanks for this tool, would love a solution to this problem!
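
For reference, here is a minimal sketch of a script that uses the follow option. It is not the contents of the linked gist. It assumes the config keys `depth` and `follow` as described in the flyscrape README, where `depth` is documented as defaulting to 0 (no links followed) and `follow` as defaulting to `["a[href]"]`. The returned `title` field is a hypothetical placeholder chosen for illustration.

```javascript
// Sketch of a flyscrape script that follows links one level deep.
// Assumption: config keys and defaults are as documented in the flyscrape README.
export const config = {
  // Start page, taken from the output shown in this thread.
  url: "https://www.pokebattler.com/raids",

  // How deep links are followed. The documented default is 0 (no follow),
  // so this has to be raised for any link to be visited.
  depth: 1,

  // CSS selectors for the links to follow. The documented default is ["a[href]"].
  follow: ["a[href]"],
};

// Called once for every scraped page, including pages reached via followed links.
export default function ({ doc }) {
  return {
    // Hypothetical field, only here to show that each followed page yields a result.
    title: doc.find("title").text(),
  };
}
```

If the README defaults apply, a script that sets `follow` but leaves `depth` at its default would only ever scrape the start page, which may be worth ruling out here.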

philippta commented 1 month ago

I wish I could tell you more, but I've run your script unmodified and it seems to work fine for me.

$ flyscrape version
flyscrape v0.8.1 darwin/arm64

$ flyscrape run pokebattler.com.js
[
  {
    "url": "https://www.pokebattler.com/raids",
    "data": {
      "nextPage": "https://www.pokebattler.com/raids/RAYQUAZA_MEGA"
    },
    "timestamp": "2024-06-21T11:22:49.600093+02:00"
  }2024/06/21 11:22:51 not on first page
2024/06/21 11:22:53 not on first page
2024/06/21 11:22:56 not on first page
2024/06/21 11:22:58 not on first page
2024/06/21 11:23:01 not on first page
2024/06/21 11:23:03 not on first page
^C

Here are a few questions to get a better idea: