vfedotovs / sslv_web_scraper

The ss.lv web scraping app automates scraping and filtering information from classifieds, emails the results, and stores the scraped data in a database
GNU General Public License v3.0
5 stars 3 forks

BUG(WS): web_scraper scrapes only the first page and is not aware of the other 2 pages #278

Closed vfedotovs closed 2 months ago

vfedotovs commented 2 months ago

Web Scraper version checks

Reproducible Example

Affected version 1.5.3

Issue Description

This listing actually has a main page and 2 subpages, page2.html and page3.html: https://www.ss.lv/lv/real-estate/flats/ogre-and-reg/ogre/sell/

Current behavior: it scrapes only the main page.

Expected Behavior

Correct behavior: it should identify the number of subpages available and scrape all of them.

The main page has 30 URLs with ads, page2.html has 30 ads, and page3.html has 4 more, 64 in total.
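One way to discover the subpages could look like the sketch below. It is not the fix that was applied in the repository; the function names `count_pages` and `build_page_urls` are hypothetical, and it assumes the first page's HTML contains pagination links whose `href` ends in `pageN.html`, which has not been verified against the live ss.lv markup.

```python
import re
from urllib.parse import urljoin

# Assumption: pagination links look like href=".../page2.html".
PAGE_LINK_RE = re.compile(r'href="(?:[^"]*/)?page(\d+)\.html"')


def count_pages(first_page_html: str) -> int:
    """Return the highest page number linked from the first page (1 if none)."""
    numbers = [int(n) for n in PAGE_LINK_RE.findall(first_page_html)]
    return max(numbers, default=1)


def build_page_urls(base_url: str, page_count: int) -> list[str]:
    """Return the main page URL followed by page2.html .. page<N>.html."""
    urls = [base_url]
    for n in range(2, page_count + 1):
        # base_url must end with "/" so the file name is appended, not swapped in.
        urls.append(urljoin(base_url, f"page{n}.html"))
    return urls


if __name__ == "__main__":
    # Hand-written sample markup, not copied from ss.lv.
    sample_html = (
        '<a href="/lv/real-estate/flats/ogre-and-reg/ogre/sell/page2.html">2</a>'
        '<a href="/lv/real-estate/flats/ogre-and-reg/ogre/sell/page3.html">3</a>'
    )
    base = "https://www.ss.lv/lv/real-estate/flats/ogre-and-reg/ogre/sell/"
    for url in build_page_urls(base, count_pages(sample_html)):
        print(url)
```

The scraper would then fetch every URL in the returned list instead of only the first one, so the ad count is the sum across all pages.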

From the ws log for 2024-07-12, LA (listed ads) is only 30 and should be 64:

2024-07-11 : TSA [A]: 30 LA TBL [B]: 30 AinB [C]: 27 KLAT A notin B [D]: 3 NewAds, B notin A [E]: 3 RM from LAT LA TBL rows: 30 RA TBL rows: 1544
2024-07-12 : TSA [A]: 30 LA TBL [B]: 30 AinB [C]: 28 KLAT A notin B [D]: 2 NewAds, B notin A [E]: 2 RM from LAT LA TBL rows: 30 RA TBL rows: 1546
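The counters in the log read like a set comparison between the ads scraped today (A) and the rows already in the listed-ads table (B): C is the overlap, D the ads in A but not in B (new), and E the ads in B but not in A (to be removed). That reading is an interpretation of the log labels, and the function name `compare_ads` is hypothetical. A minimal sketch of the arithmetic:

```python
def compare_ads(scraped: set[str], listed: set[str]) -> dict[str, set[str]]:
    """Split ad URLs into overlap, new, and removed groups."""
    return {
        "in_both": scraped & listed,   # [C] A in B
        "new": scraped - listed,       # [D] A not in B
        "removed": listed - scraped,   # [E] B not in A
    }


if __name__ == "__main__":
    # Made-up identifiers, only to show the arithmetic.
    scraped_today = {"ad1", "ad2", "ad3", "ad4"}
    listed_table = {"ad2", "ad3", "ad4", "ad5"}
    result = compare_ads(scraped_today, listed_table)
    for name, ads in result.items():
        print(name, len(ads))
```

Under this reading, C + D always equals the size of A (27 + 3 = 30 and 28 + 2 = 30 in the log), so when only the first page is scraped, A is capped at 30 and ads that live on page2.html and page3.html never enter the comparison.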

vfedotovs commented 2 months ago

Resolved in 6b42ace