philippta / flyscrape

Flyscrape is a command-line web scraping tool designed for those without advanced programming skills.
https://flyscrape.com
Mozilla Public License 2.0
1.02k stars, 29 forks

cache: file, CPU and Memory usage #66

Open dynabler opened 4 months ago

dynabler commented 4 months ago

CPU

Steps taken:

  1. Run Flyscrape with the cache: file option.
  2. Re-run Flyscrape with the cache: file option, reusing the cache from the previous successful scrape.
  3. Abort the run (power failure, system failure).
  4. The cache is now left unclosed (amazon.cache, amazon.cache-shm, amazon.cache-wal).
  5. Re-run Flyscrape with the unclosed cache file.
  6. CPU peaks at 100% per core. htop -d 0 shows a constant 395% CPU usage (meaning 95-100% per core) and 1.23 GB memory usage. Memory never exceeds 1.5 GB.
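To check whether a cache was left unclosed before re-running, something like the following Python sketch could help. It assumes the cache is a SQLite database in write-ahead-log mode, which the -shm and -wal suffixes suggest but which this report does not confirm. The function names are made up for illustration, and it is safer to try this on a copy of the cache file.

```python
import sqlite3
from pathlib import Path


def leftover_sidecars(cache_path):
    """Return the -wal/-shm files that currently exist next to cache_path."""
    base = Path(cache_path)
    candidates = [base.with_name(base.name + suffix) for suffix in ("-wal", "-shm")]
    return [p for p in candidates if p.exists()]


def checkpoint(cache_path):
    """Ask SQLite to write the WAL back into the main file and truncate it.

    Assumption: cache_path is a SQLite database in WAL mode.
    Returns one row: (busy, wal_frames, checkpointed_frames).
    """
    conn = sqlite3.connect(str(cache_path))
    try:
        return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    finally:
        conn.close()
```

If leftover_sidecars("amazon.cache") returns a non-empty list while no Flyscrape process is running, the previous run most likely did not shut down cleanly. Whether the leftover files are related to the CPU usage described above is not established here.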

Is it worth looking into better CPU management when the cache file was left open?

A few moments later: I don't think the CPU behavior above is a problem. I will test again and update the info. The memory info below is correct, so that's worth a look.

Memory

When Flyscrape re-runs with a cache: file from a previous successful scraping session (cache file closed properly), one core is at 100% CPU and the load on the other cores keeps shifting. Memory usage never exceeds 1.5 GB. Tested with cache files ranging from less than 1 GB to 10 GB.
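For anyone who wants to reproduce the memory figure, here is a rough Python sketch that runs a command and reports the peak resident memory of child processes. It is not how the numbers above were taken (those came from htop), the function name is made up, and the resource module only exists on Unix-like systems.

```python
import resource
import subprocess
import sys


def peak_rss_mib(command):
    """Run command to completion and return peak child memory in MiB.

    Note: RUSAGE_CHILDREN covers every child this interpreter has waited
    for so far, so call this from a fresh Python process per measurement.
    """
    subprocess.run(command, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor
```

A call could look like peak_rss_mib(["flyscrape", "run", "amazon.js"]), where amazon.js is an assumed script name matching the amazon.cache file mentioned above.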

Upside: Flyscrape can run on older hardware where memory is limited, and less memory needs to be bought. Downside: on newer hardware, available resources remain unused.