olivierobert opened this issue 3 years ago
There is a reason for it. For example, if you search for something like jsdfg0isdgijseg09sdfgoidsfg9, Google returns no results, and the #result-stats element won't exist in the HTML. With no results, the result page can't be parsed. Try that word in your browser to see what I mean.
What I was basically trying to do was support that edge case.
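Concretely, the edge case looks something like this. A minimal sketch, assuming the HTML is parsed with Floki; the function name and return values here are illustrative, not necessarily what the project does:

```elixir
# Sketch only: for a gibberish query, Google's result page has no
# #result-stats element, so the selector matches nothing.
defp parse_total_results(document) do
  case Floki.find(document, "#result-stats") do
    [] -> nil                          # element missing: no results to parse
    [stats | _] -> Floki.text(stats)   # e.g. "About 1,230,000 results"
  end
end
```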
That said, I could have commented that part of the code, leaving a note for better understanding. Since it is a private function, I won't document it with a @doc attribute (Elixir discards @doc on private functions anyway).
One of the things I like about Elixir is its simplicity and cleanliness, so I share the idea that the code itself should be the best documentation. If the code speaks for itself, that's a good sign; if you need to comment it, maybe it's too complex. Still, there are cases where a comment is relevant for the readers of that code.
Well understood on the case the implementation tries to catch; it can indeed happen. From my perspective, beyond whether to document this case or not, what is more surprising is why this check is attached to `total_results`, since the same situation could also arise for `total_links`.
Maybe something like the following would be more explicit:
```elixir
with true <- valid_search_result_page(document),
     total_results <- parse_total_results(document),
     total_links <- parse_links(document) do
  # ...
else
  false -> # ...
end
```
The check is more explicit, and there is room to handle each parsing error separately.
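For instance, one way to expand that sketch with per-step error handling; the `{:ok, _}`/`{:error, _}` contracts and error atoms here are my assumptions, not the project's current API:

```elixir
with true <- valid_search_result_page(document),
     {:ok, total_results} <- parse_total_results(document),
     {:ok, total_links} <- parse_links(document) do
  {:ok, %{total_results: total_results, total_links: total_links}}
else
  # page is not a parseable search result page
  false -> {:error, :invalid_search_result_page}
  # a specific parsing step failed
  {:error, reason} -> {:error, reason}
end
```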
Oh yes, you are right! I didn't see it that way. I think yours is a better approach, because even when there are no results you can still have links. In that case the number of results would be zero, but the total links would not be.
When I was working on the project I never thought of that; now I see it. Thank you!
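So, with the with-based flow above, such a page would produce something like this (illustrative values only):

```elixir
{:ok, %{total_results: 0, total_links: 7}}
```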
The code below seems to imply that if no data is found for `total_results`, no scraping info is returned at all:
https://github.com/hudsonbay/google_scraper_live_view/blob/22ee211c8df4d802fe0879824e005b05ebb4aa95/lib/google_scraper.ex#L66-L79
Why make the control flow work this way? Wouldn't it be better to return scraping information as long as the Google search result page is parseable?
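For reference, the pattern in question is roughly the following; this is a reconstruction from the description above, not a verbatim copy of the linked lines, and the helper names are assumptions:

```elixir
# Reconstructed sketch: scraping info is built only when total_results parses;
# otherwise nothing is returned at all.
case parse_total_results(document) do
  nil -> nil
  total_results -> %{total_results: total_results, total_links: parse_links(document)}
end
```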