hernamesbarbara / table2csv

Extract data from an HTML table and store results to a csv file.

html5lib #3

Open jackpolymath opened 8 years ago

jackpolymath commented 8 years ago

Thanks for this code. I had a problem using "html.parser" on the results pages of http://www.freepatentsonline.com/search.html. When you run a search on that site, it returns a table of results, and html.parser seems to break that results table (the one selected with --nth=2). On my own machine I changed the get_soup function to use the "html5lib" parser, and your code then worked correctly.

I'll leave it to you to change your own GitHub code. Maybe add a second parameter (e.g. --parser=html5lib), or make html5lib a requirement of tf1.py and stop hard-coding the parser. When no parser is named, BeautifulSoup uses the best one installed, ranking lxml first, then html5lib, then html.parser. Thanks again.
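A minimal sketch of the suggested --parser option, assuming a get_soup that receives the page's HTML as a string. The get_soup signature, the choose_parser helper, the positional url argument and the default for --nth are hypothetical and not taken from table2csv. --parser is the flag proposed in this issue, not an existing option. The sketch falls back to html.parser when the requested package is missing, because BeautifulSoup raises FeatureNotFound for a parser it cannot find.

```python
import argparse
import importlib.util

# Parser names BeautifulSoup accepts for HTML. "html.parser" ships with Python;
# the other two need their third-party packages installed.
PARSER_MODULES = {"html.parser": None, "html5lib": "html5lib", "lxml": "lxml"}


def choose_parser(requested: str = "html5lib") -> str:
    """Return `requested` if its backing package is installed, else "html.parser"."""
    if requested not in PARSER_MODULES:
        raise ValueError(f"unknown parser: {requested!r}")
    module = PARSER_MODULES[requested]
    if module is None or importlib.util.find_spec(module) is not None:
        return requested
    return "html.parser"


def get_soup(html: str, parser: str = "html5lib"):
    """Hypothetical get_soup: parse `html` with the chosen BeautifulSoup parser."""
    # Imported here so the helpers above can be used without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(html, choose_parser(parser))


def build_arg_parser() -> argparse.ArgumentParser:
    """Command-line options; everything except --nth and --parser is assumed."""
    ap = argparse.ArgumentParser(description="Extract an HTML table to CSV.")
    ap.add_argument("url")
    ap.add_argument("--nth", type=int, default=1, help="which table on the page")
    ap.add_argument(
        "--parser",
        choices=sorted(PARSER_MODULES),
        default="html5lib",
        help="BeautifulSoup parser to use (falls back to html.parser if missing)",
    )
    return ap
```

With this in place, the failing search page would be run with `--nth=2 --parser=html5lib` after `pip install html5lib`. The silent fallback is a design choice; raising an error instead would make a missing html5lib more obvious to the user.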