vezaynk / Sitemap-Generator-Crawler

PHP script to recursively crawl websites and generate a sitemap. Zero dependencies.
https://www.bbss.dev
MIT License

Switch from arrays to hashtables #68

Closed: vezaynk closed this issue 6 years ago

vezaynk commented 6 years ago

The current implementation stores values in arrays and iterates over them to search for a value. On large sites such as codinghorror this becomes slow, because every lookup is O(n). Hashtables offer exactly the same functionality with O(1) average-case lookups.
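As a rough illustration of the difference (a minimal sketch, not the actual code from sitemap.php; the names `$scannedList`, `$scannedSet`, `seenInList` and `seenInSet` are hypothetical), the list version searches with `in_array`, while the hashtable version stores each URL as an array key and checks it with `isset`:

```php
<?php
// Hypothetical helper names; they are illustrative and not taken from sitemap.php.

// List approach: in_array() walks the array, so each check is O(n).
function seenInList(array $list, string $url): bool
{
    return in_array($url, $list, true);
}

// Hashtable approach: the URL is the array key, so isset() is a
// hash lookup, O(1) on average.
function seenInSet(array $set, string $url): bool
{
    return isset($set[$url]);
}

// Example input with one duplicate URL.
$urls = ['https://example.com/', 'https://example.com/a', 'https://example.com/'];

$scannedList = [];
$scannedSet = [];
foreach ($urls as $url) {
    if (!seenInList($scannedList, $url)) {
        $scannedList[] = $url;      // append as a value
    }
    if (!seenInSet($scannedSet, $url)) {
        $scannedSet[$url] = true;   // store as a key
    }
}

// Both approaches end up with the same unique URLs in the same order.
var_dump($scannedList === array_keys($scannedSet)); // bool(true)
```

One caveat with the key-based approach: PHP casts purely numeric string keys to integers, which does not matter for URLs but would for other kinds of values.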

Before:

▶ time php sitemap.php site=https://blog.codinghorror.com
 [+] Sitemap has been generated in 746.65 seconds and saved to sitemap.xml
 [+] Scanned a total of 3137 pages and indexed 1704 pages.
 [+] Operation Completed
php sitemap.php site=https://blog.codinghorror.com  6.04s user 1.42s system 0% cpu 12:26.71 total

After:

▶ time php sitemap.php site=https://blog.codinghorror.com
 [+] Sitemap has been generated in 570.19 seconds and saved to sitemap.xml
 [+] Scanned a total of 3137 pages and indexed 1704 pages.
 [+] Operation Completed
php sitemap.php site=https://blog.codinghorror.com  4.67s user 1.45s system 1% cpu 9:30.23 total

On the server, the performance difference is negligible.


vezaynk commented 6 years ago

Pushed in 4b0a38f75eae88bf03cccf98b34e643c8403ea7b