Open thiswillbeyourgithub opened 1 year ago
Actually, I started this project almost 9 years ago, in late 2014 (see my first commit), when there were only a few open-source extractors and they didn't perform well.
One reason to write it from scratch is flexibility and customizability: I can tune the parameters so that it works better for HN posts. One example is the HN comments page, which appears frequently on the front page, but most extractors do not get the right content from it.
I'll try some of the modern ones later, thanks.
Interesting, thanks.
Hi,
I read this page from your doc the other day and was wondering.
Why not just use article extractors made in the past? There is even a GitHub tag for some of them there.
Just wondering, hope you don't mind.