laughingclouds / Scrapia-World

A web scraper for wuxiaworld, written in Python. It uses Selenium and Python's cmd module for an interactive shell experience, with a command-line utility for working with text and a database for storing information.
MIT License

JSON files are not updating #11

Closed laughingclouds closed 2 years ago

laughingclouds commented 2 years ago

There must be something wrong with the piece of code where the dictionaries are being updated. My bet is on the popFirstElementUpdateOtherDict function.

laughingclouds commented 2 years ago

Well...here's the thing.

The updated JSON files would sometimes have small discrepancies.

If chapter 286 was scraped, the JSON would only have changes up to chapter 283 (chapters 284-286 would be considered "un-read").

Initially the gap wasn't that big, but this time the difference was 3 chapters (yes, the previous example). That's like wasting 300 seconds (the sleep time after each successful scrape is 100 seconds).

This only matters when either some error occurs (which is maybe less likely) or the code is interrupted (keyboard interrupt).

I guess we can add a function for the user to invoke, before they quit the program.
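A minimal sketch of what such a function could look like, assuming the progress lives in a plain dict that is written to one JSON file. The names `save_progress` and `scrape_chapters`, and the dict layout, are made up for illustration and are not taken from the repository. Putting the save in a `finally` block means it runs on normal completion, on an error, and on a keyboard interrupt, so the user doesn't have to remember to call it.

```python
import json
import os
import tempfile


def save_progress(progress: dict, path: str) -> None:
    """Write `progress` to `path` via a temp file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(progress, tmp, indent=2)
        # Swap the finished file into place so a crash mid-write
        # cannot leave a half-written JSON file behind.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def scrape_chapters(chapters, progress, path, scrape_one):
    """Scrape each chapter and always persist progress on the way out.

    `scrape_one` is a stand-in for whatever actually fetches a chapter.
    """
    try:
        for chapter in chapters:
            scrape_one(chapter)
            progress[str(chapter)] = True  # mark as read only after success
    finally:
        # Runs on success, on an exception, and on KeyboardInterrupt.
        save_progress(progress, path)
```

With this shape, an interrupt during chapter 286 would still leave 284 and 285 recorded in the file, instead of losing everything since the last save.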

laughingclouds commented 2 years ago

I was an idiot. It should be solved now. I'll test it later. Refer to af60d0f.

laughingclouds commented 2 years ago

Update: The problem still persisted after that.

Later on I realized that the code exits in more than one way. One of those ways is a simple exit command in the interactive shell.

I made it save the JSON files if they're not empty. Refer to db12b06b1edbd40fd8284b9a5bb88e01c95ed4cd
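For anyone reading later, here is a rough sketch of the idea using Python's `cmd` module. It is not a copy of the commit above: `ScraperShell`, `json_targets`, and `save_json_files` are hypothetical names, and the real shell class may be organized differently. The point is only that every way out of the shell goes through the same save step, and that empty dicts are skipped.

```python
import cmd
import json


class ScraperShell(cmd.Cmd):
    """Hypothetical stand-in for the project's interactive shell."""

    prompt = "(scraper) "

    def __init__(self, json_targets, **kwargs):
        super().__init__(**kwargs)
        # Maps a file path to the in-memory dict that belongs in that file.
        self.json_targets = json_targets

    def save_json_files(self):
        """Write every non-empty dict to its file; return the paths written."""
        written = []
        for path, data in self.json_targets.items():
            if data:  # skip empty dicts so a good file isn't replaced by {}
                with open(path, "w", encoding="utf-8") as fh:
                    json.dump(data, fh, indent=2)
                written.append(path)
        return written

    def do_exit(self, arg):
        """Save the JSON files, then leave the shell."""
        self.save_json_files()
        return True  # a true return value stops cmdloop()

    do_EOF = do_exit  # Ctrl+D takes the same path as `exit`
```

A keyboard interrupt does not go through `do_exit`, so the call to `cmdloop()` would still need its own `try`/`finally` (or similar) around it to cover that case.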

The lazy is taking over. I'm closing this. I hope the problem doesn't pop up again.