grangier / python-goose

HTML Content / Article Extractor, web scraping lib in Python
Apache License 2.0
3.98k stars, 787 forks

li tags in html not extracted #263

Open sparvind2000 opened 7 years ago

sparvind2000 commented 7 years ago

Please check the following page: http://www.hiewatch.com/news/trump-transition-team-hears-interoperability-pitch. The 4 points listed in the body of the article (the `li` items) are missing from the extracted text.
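One way to confirm this kind of symptom is to compare the `li` items in the page's HTML against whatever text the extractor returned. The sketch below is only an illustration and uses nothing but the standard library. The names `ListItemCollector` and `missing_list_items` are hypothetical helpers, not part of python-goose, and the sample HTML is made up rather than taken from the page above.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the visible text of each <li> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []       # finished <li> texts, in document order
        self._buffer = None   # text fragments of the <li> currently open

    def _flush(self):
        # Close the current item, collapsing runs of whitespace.
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()     # a new <li> implicitly ends the previous one
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def missing_list_items(html, extracted_text):
    """Return the <li> texts in `html` that do not appear in `extracted_text`."""
    collector = ListItemCollector()
    collector.feed(html)
    collector.close()
    normalized = " ".join(extracted_text.split())
    return [item for item in collector.items if item not in normalized]


if __name__ == "__main__":
    # Made-up sample input, for illustration only.
    sample_html = "<p>Intro</p><ul><li>First <b>point</b></li><li>Second point</li></ul>"
    sample_extracted = "Intro\n\nFirst point"
    print(missing_list_items(sample_html, sample_extracted))
```

In practice `extracted_text` would be the extractor's output for the page (for python-goose, the article's `cleaned_text`), and a non-empty result would list the bullet points that were dropped.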

nhat2008 commented 6 years ago

This happens because python-goose relies on many hardcoded values in its classes and functions. Take a close look at those functions and consider changing some of the values, for example in cleaners.py.

barrust commented 6 years ago

This was looked into and resolved in goose3, the Python 3 port of the library (which is also maintained).

Full Disclosure: I help maintain goose3