Open EMegamanu opened 10 years ago
There was a massive change in the HTML markup on schema.org.
http://schema.org/docs/schema_org_rdfa.html is the canonical schema used to generate all the type and property pages on schema.org. Maybe you could scrape that one instead, if it works for your use case (either by scraping the HTML directly, or by parsing the RDFa into RDF and generating the CSV from there).
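To make the "scrape the HTML" option concrete, here is a minimal sketch using only the Python standard library. It assumes each class in that file is an element carrying `typeof="rdfs:Class"` and a `resource` attribute, with child elements marked `property="rdfs:label"` and `property="rdfs:comment"`; that layout is an assumption and should be checked against the real file. `SAMPLE`, `RdfaClassParser` and `classes_to_csv` are hypothetical names, not part of the existing scraper.

```python
import csv
import io
from html.parser import HTMLParser

# Inline stand-in for the real file; the attribute layout is an assumption.
SAMPLE = """
<div typeof="rdfs:Class" resource="http://schema.org/Thing">
  <span property="rdfs:label">Thing</span>
  <span property="rdfs:comment">The most generic type of item.</span>
</div>
<div typeof="rdfs:Class" resource="http://schema.org/Person">
  <span property="rdfs:label">Person</span>
  <span property="rdfs:comment">A person (alive, dead, undead, or fictional).</span>
</div>
"""


class RdfaClassParser(HTMLParser):
    """Collect id, label and comment for every rdfs:Class element."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per class found
        self._current = None  # row currently being filled
        self._field = None    # "label" or "comment" while inside such an element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("typeof") == "rdfs:Class":
            self._current = {"id": attrs.get("resource", ""), "label": "", "comment": ""}
            self.rows.append(self._current)
        elif self._current is not None and attrs.get("property") in ("rdfs:label", "rdfs:comment"):
            self._field = attrs["property"].split(":")[1]

    def handle_endtag(self, tag):
        # Any closing tag ends text capture; nested markup inside a
        # comment would therefore be cut short in this sketch.
        self._field = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data


def classes_to_csv(html):
    """Return CSV text with one row per class found in the given HTML."""
    parser = RdfaClassParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "label", "comment"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(classes_to_csv(SAMPLE))
```

For the second option (RDFa into RDF, then CSV), a real RDFa parser would be more robust than matching attributes by hand like this.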
It seems to work, but the generated files contain only the header (labels) line.
Did I miss something?
Scraping Schema.org classes and properties into CSV files does not work at this time.
I got the following stack trace:

    $> python scrape_csv.py classes.csv properties.csv
    Traceback (most recent call last):
      File "scrape_csv.py", line 12, in <module>
        types = schema_scraper.get_all_types()
      File "/Users/emmanuel/Downloads/schema-org-rdf-master/scrapers/schema_scraper.py", line 20, in get_all_types
        types[id] = get_type_details(base_url + id)
      File "/Users/emmanuel/Downloads/schema-org-rdf-master/scrapers/schema_scraper.py", line 49, in get_type_details
        id = ancestor_links[-1].text_content()
    IndexError: list index out of range
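
The IndexError is raised by `ancestor_links[-1]` on an empty list, which is presumably what happens when the selector in `get_type_details` no longer matches anything on the page. A minimal sketch of a guard for that case follows; `last_link_text` and `FakeLink` are hypothetical names used only for illustration, and the only thing assumed about the real link objects is the `text_content()` method seen in the trace.

```python
def last_link_text(links):
    """Return the stripped text of the last link, or None if nothing matched."""
    if not links:
        # An empty result usually means the page markup has changed.
        return None
    return links[-1].text_content().strip()


class FakeLink:
    """Stand-in for a parsed link element, exposing only text_content()."""

    def __init__(self, text):
        self._text = text

    def text_content(self):
        return self._text


if __name__ == "__main__":
    print(last_link_text([]))                     # no match
    print(last_link_text([FakeLink(" Thing ")]))  # one match
```

Returning `None` lets the caller skip the page or report which URL failed instead of crashing, but it does not fix the selector itself.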