hanover-computing / canonicize-url

Get a stable, canonical version of any URL, with DNS and HTTPS checks, redirects, tracker stripping, and canonical link extraction!
GNU Lesser General Public License v3.0

Consider switching over the regex calculations (or any other CPU-intensive tasks) to worker threads #5

Open JaneJeon opened 2 years ago

JaneJeon commented 2 years ago

https://nodejs.org/api/worker_threads.html

But first, we need to benchmark and profile our existing code to make sure we're not optimizing prematurely, and to figure out which other parts block the event loop with intensive computation.
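
As a starting point for that measurement, here is a minimal timing sketch using `performance.now()` from `node:perf_hooks`. The `stripTrackers` function and `TRACKER_PATTERNS` list are hypothetical stand-ins for illustration, not the library's actual code or pattern set, and the iteration count is arbitrary.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical patterns for illustration only; the real list is much longer.
const TRACKER_PATTERNS = [/^utm_/i, /^fbclid$/i, /^gclid$/i];

// Hypothetical stand-in for the tracker-stripping step.
function stripTrackers(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first so we never delete while iterating the live params.
  for (const key of [...url.searchParams.keys()]) {
    if (TRACKER_PATTERNS.some((re) => re.test(key))) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

// Run `fn(input)` synchronously `iterations` times and report the
// total and mean wall-clock cost in milliseconds.
function bench(fn, input, iterations = 10000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn(input);
  const totalMs = performance.now() - start;
  return { totalMs, meanMs: totalMs / iterations };
}

const result = bench(
  stripTrackers,
  'https://example.com/?utm_source=x&id=1&fbclid=abc'
);
console.log(result);
```

If the mean cost per call turns out to be small compared to the network round trips, moving this work off the main thread is unlikely to pay for the message-passing overhead.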

For now, the two obvious culprits look to be the numerous regex matches in strip-trackers, and the parsing and matching of HTML with cheerio during canonicization. Both are currently heavily cached.
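
For reference, a rough sketch of what offloading the regex matching to a worker could look like with `node:worker_threads`. Everything here is an assumption for illustration: `createStripper`, the message shape, and the tracker patterns are hypothetical, and the worker body is inlined as a string (via `eval: true`) only to keep the example in one file; a real implementation would more likely use a separate worker file.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body, inlined as a string so the sketch is self-contained.
// The patterns are hypothetical, not the library's real list.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  const TRACKER_PATTERNS = [/^utm_/i, /^fbclid$/i, /^gclid$/i];
  parentPort.on('message', ({ id, url }) => {
    const parsed = new URL(url);
    for (const key of [...parsed.searchParams.keys()]) {
      if (TRACKER_PATTERNS.some((re) => re.test(key))) {
        parsed.searchParams.delete(key);
      }
    }
    parentPort.postMessage({ id, url: parsed.toString() });
  });
`;

// Hypothetical helper: wraps one worker and matches replies to requests by id.
function createStripper() {
  const worker = new Worker(workerSource, { eval: true });
  const pending = new Map();
  let nextId = 0;

  worker.on('message', ({ id, url }) => {
    const entry = pending.get(id);
    if (entry) {
      pending.delete(id);
      entry.resolve(url);
    }
  });

  // An uncaught error in the worker (e.g. an invalid URL) rejects all callers.
  worker.on('error', (err) => {
    for (const { reject } of pending.values()) reject(err);
    pending.clear();
  });

  return {
    strip(url) {
      return new Promise((resolve, reject) => {
        const id = nextId++;
        pending.set(id, { resolve, reject });
        worker.postMessage({ id, url });
      });
    },
    close() {
      return worker.terminate();
    },
  };
}

async function demo() {
  const stripper = createStripper();
  try {
    console.log(await stripper.strip('https://example.com/?utm_source=x&id=1'));
  } finally {
    await stripper.close();
  }
}

demo();
```

Each call pays for a round trip of structured-clone messages, so this only helps if the per-call regex cost on cache misses is noticeably larger than that overhead.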