thequbit / BarkingOwl

scalable web scraper framework for finding documents on websites.
GNU General Public License v3.0
19 stars, 7 forks

Follow <embed> tags as <a> tags #27

Closed: thequbit closed this issue 9 years ago

thequbit commented 10 years ago

Noticed that it may be useful to follow <embed> tags the way <a> tags are followed. For <a> tags we follow the href attribute; for <embed> tags we would follow the src attribute.

This should be exposed as an optional boolean value passed to the BarkingOwl scraper when it is initialized.
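
To make the request concrete, here is a rough sketch of what such a switch might look like. It uses only Python's standard-library `html.parser` and is not BarkingOwl's actual code; the `LinkCollector` class, the `collect_links` helper, and the `follow_embed` option name are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <a href> targets and, optionally, <embed src> targets."""

    def __init__(self, follow_embed=False):
        super().__init__()
        # Hypothetical option name; the issue only asks for "a boolean".
        self.follow_embed = follow_embed
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif self.follow_embed and tag == "embed" and attrs.get("src"):
            self.links.append(attrs["src"])


def collect_links(html, follow_embed=False):
    """Return link targets found in html, in document order."""
    parser = LinkCollector(follow_embed=follow_embed)
    parser.feed(html)
    parser.close()
    return parser.links
```

With the flag left at its default, only `href` values from `<a>` tags come back; setting `follow_embed=True` adds `src` values from `<embed>` tags as well.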

thequbit commented 9 years ago

Added support for:

tag_types = [
    ('a', 'href'),
    ('img', 'src'),
    ('link', 'href'),
    ('object', 'data'),
    ('source', 'src'),
    ('script', 'src'),
    ('embed', 'src'),
    ('iframe', 'src'),
]

Added in 0.5.2
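
For anyone curious how a (tag, attribute) table like this can drive link extraction, below is a minimal sketch. It is an assumption-laden illustration built on the standard library, not the code that shipped in 0.5.2: `TagLinkParser`, `extract_urls`, and the `base_url` parameter are hypothetical names, and resolving relative URLs with `urljoin` is my own addition.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Same (tag, attribute) pairs as listed in the comment above.
TAG_TYPES = [
    ('a', 'href'),
    ('img', 'src'),
    ('link', 'href'),
    ('object', 'data'),
    ('source', 'src'),
    ('script', 'src'),
    ('embed', 'src'),
    ('iframe', 'src'),
]


class TagLinkParser(HTMLParser):
    """Collect URLs from every tag listed in a (tag, attribute) table."""

    def __init__(self, base_url, tag_types=TAG_TYPES):
        super().__init__()
        self.base_url = base_url
        self.attr_for_tag = dict(tag_types)  # e.g. {'a': 'href', ...}
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.attr_for_tag.get(tag)
        if wanted is None:
            return  # tag is not in the table, ignore it
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url, tag_types=TAG_TYPES):
    """Return absolute URLs found in html, in document order."""
    parser = TagLinkParser(base_url, tag_types)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Keeping the pairs in a single table means supporting another tag is a one-line change rather than a new branch in the parser.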