Closed: olivierobert closed this issue 3 years ago
Yeah, why not? In fact, `crawly` uses `httpoison` as the HTTP client under the hood, so the layer we're using to call HTTPoison functions is `crawly`. But that has a potential problem: you don't have control over HTTPoison responses because `crawly` is doing everything for you.
For example, you cannot pattern match an error response from HTTPoison like `{:error, :nxdomain}`, because `crawly` assumes that you always have an internet connection or that the domain to crawl will always be up.
So, since that case is not pattern matched inside the `crawly` implementation, you will receive an exception instead of an error message. I think it will be fixed at some point, or maybe I should try to contribute and open a PR for that. The `crawly` community and its creator are very friendly.
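To illustrate, calling HTTPoison directly would let us handle those error tuples ourselves. A minimal sketch (the `Fetcher` module name and error handling here are just an example, not the project's code):

```elixir
defmodule Fetcher do
  # Fetch a page and surface transport errors as values instead of exceptions.
  def fetch(url) do
    case HTTPoison.get(url) do
      {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
        {:ok, body}

      {:ok, %HTTPoison.Response{status_code: status}} ->
        {:error, {:unexpected_status, status}}

      # DNS resolution failed: no internet connection, or the domain does not exist.
      {:error, %HTTPoison.Error{reason: :nxdomain}} ->
        {:error, :nxdomain}

      {:error, %HTTPoison.Error{reason: reason}} ->
        {:error, reason}
    end
  end
end
```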
We agree that `crawly` is not a bad choice, since it could be useful if you want to support more crawling functionality. `crawly` is a great library, but I agree with you that HTTPoison is a good option for this use case, parsing the body with `floki`.
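Something along these lines, for example (a rough sketch; the module name and the CSS selector are placeholders for illustration):

```elixir
defmodule GoogleScraper.Search do
  # Fetch a page with HTTPoison and extract its links with Floki.
  def result_links(url) do
    with {:ok, %HTTPoison.Response{status_code: 200, body: body}} <- HTTPoison.get(url),
         {:ok, document} <- Floki.parse_document(body) do
      links =
        document
        # Placeholder selector; adjust to the actual markup being scraped.
        |> Floki.find("a")
        |> Floki.attribute("href")

      {:ok, links}
    end
  end
end
```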
My question is: do you want me to open a PR with a different implementation (using `httpoison`, for example)? If I were working on a real project, I'd definitely do it.
Thank you for detailing that `httpoison` is used under the hood by `crawly`. So in effect, this dependency has a larger implementation surface than `httpoison`, i.e., it can do more things.

From my perspective, however, this extra power is not used at the moment. So, in line with picking the right tool for the job, I find that `crawly` is like a hammer while a screwdriver is needed (sorry for the average metaphor 😅). I guess I also come from the Elixir mindset of limiting dependencies as much as possible. Were it in a Ruby/Rails environment, it might not cause such a fuss.
There is no need to make a fix for this change.
`GoogleScraper` makes use of the package `crawly`. Upon checking the documentation for the package, it seems to be a great solution to deeply crawl an entire website. However, the current usage seems limited to making a request to a single page. The current implementation does not make use of `Crawly.Spider`, for instance 🤔

Why not use other, more popular HTTP libraries such as `httpoison` or `tesla`?
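For context, a deep crawl with `crawly` would normally implement the `Crawly.Spider` behaviour, roughly like this (a sketch only; the module name, start URL, and selectors are placeholders, not the project's actual code):

```elixir
defmodule GoogleScraper.Spider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url(), do: "https://www.google.com"

  @impl Crawly.Spider
  def init(), do: [start_urls: ["https://www.google.com/search?q=elixir"]]

  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    # Placeholder selector: extract result titles from the page.
    items =
      document
      |> Floki.find("h3")
      |> Enum.map(&%{title: Floki.text(&1)})

    # An empty request list enqueues no follow-up pages; a real deep
    # crawl would return newly discovered URLs here.
    %Crawly.ParsedItem{items: items, requests: []}
  end
end
```

Since the current code only fetches a single page, none of this spider machinery gets exercised.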