Open Alalalalaki opened 3 years ago
I have the same problem. It is strange, because I have seen similar code run on YouTube with the expected results, but that is not what I get. To help, I am copying my code here so you can reproduce the problem (these are cells from a Jupyter notebook). I also printed the results from BeautifulSoup.
from requests_html import HTMLSession, HTML
doc = '<div class="class1">text1</div><div class="class2">text2</div><div class="class3">text3</div><div class="class4">text4</div>'
html = HTML(html=doc)
for cl in ['class1', 'class2', 'class3', 'class4']:
    print(html.find('div.' + cl, first=True).html)
    print(html.find('div.' + cl, first=True).text)
    print('-' * 100)
I recently ran "conda update --all" and then found that HTML parsing in requests-html started behaving abnormally. In particular, the object returned by html.find() still contains the entire content of the HTML, e.g. if a = html.find("something", first=True), then a.text still shows all the text of the page.
I then created a clean environment with only requests-html installed, and it works correctly there. So I guess the cause might be that a recently updated version of some other package in my main environment conflicts with HTML parsing in requests-html. But I have no idea how this would happen or which package might be the problematic one.
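One way to narrow this down might be to compare installed package versions between the clean environment and the main one. The following is only a minimal sketch: the package list in SUSPECTS is an assumption (a guess at parsing-related packages, not a confirmed list of culprits), and installed_versions and diff_versions are hypothetical helper names, not part of requests-html.

```python
from importlib import metadata

# Assumed list of packages worth comparing; adjust it to your environments.
SUSPECTS = ["requests-html", "lxml", "pyquery", "beautifulsoup4", "cssselect"]


def installed_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(working, broken):
    """Return {name: (working version, broken version)} for packages that differ."""
    names = sorted(set(working) | set(broken))
    return {
        name: (working.get(name), broken.get(name))
        for name in names
        if working.get(name) != broken.get(name)
    }


if __name__ == "__main__":
    # Run this once in each environment, then pass the two printed
    # dictionaries to diff_versions() to see which packages differ.
    print(installed_versions(SUSPECTS))
```

Any package that shows up in the diff would be a candidate to pin back to the version from the working environment, one at a time, to see whether the parsing problem goes away.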
Any suggestions would be appreciated.