postmodern spidr issues

postmodern / spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

MIT License

798 stars, 109 forks

issues

Add `# frozen_string_literal: true` comments to all files

#89 postmodern closed 6 months ago
0
Switch to using `require_relative` to improve load-times

#88 postmodern closed 6 months ago
0
Add Ruby 3.2 to the CI matrix. Update checkout action version.

#87 petergoldstein closed 1 year ago
0
Switch to `Addressable::URI` for URI parsing

#86 postmodern opened 1 year ago
0
fix: use the correct status code for timedout

#85 davidsauntson closed 2 years ago
1
How to control the depth of crawling?

#82 masterbo98 closed 2 years ago
1
Support passing a URI as a proxy setting

#81 postmodern closed 2 years ago
0
Write specs for Agent.domain

#80 postmodern closed 2 years ago
1
Add spec for `Spidr::Agent.host`

#79 postmodern closed 2 years ago
0
Add spec for `Spidr::Agent.site`

#78 postmodern closed 2 years ago
0
Add spec for `Spidr::Agent.start_at`

#77 postmodern closed 2 years ago
0
Switch to using async-http

#76 postmodern opened 2 years ago
1
Switch to Ruby 2.x keyword arguments

#75 postmodern closed 2 years ago
0
Figure out why specs are failing only on JRuby?

#74 postmodern closed 3 years ago
1
fixed typo in proxy_spec

#73 andydna closed 2 years ago
0
Add Logging

#72 postmodern opened 4 years ago
0
Thank you

#71 thegreyfellow closed 6 months ago
1
Doc: Syntax highlight code blocks

#70 vfonic closed 5 years ago
4
Sitemap XML support

#69 buren opened 5 years ago
2
Simple Command Line Interface (CLI)

#68 buren opened 6 years ago
3
Add support for img/@src

#67 lasssim closed 6 years ago
1
path conflicts with opaque (URI::InvalidURIError)

#66 mustiikhalil closed 6 years ago
3
Check for opaque part of URI before attempting to set the path

#65 kyaroch closed 6 years ago
1
`ignore_links` not working.

#64 vwochnik closed 6 years ago
4
Lambda based delay

#63 dcadenas closed 3 years ago
1
Add low-level HTTP request methods

#62 postmodern opened 7 years ago
0
Add ignore_paths and ignore_paths_like

#61 postmodern opened 7 years ago
0
unable to ignore links

#60 vanegomez closed 7 years ago
4
Limit crawl to links matching pattern

#59 bricemaurin closed 7 years ago
3
Respect base tags

#58 ericmason opened 7 years ago
0
Page#to_absolute raises URI::InvalidURIError: path conflicts with opaque

#57 buren closed 6 years ago
7
Following redirects

#56 ZackMattor opened 7 years ago
4
Remove unused variable from example code

#55 tricknotes closed 7 years ago
0
Use Travis' new container-based infrastructure

#54 tricknotes closed 5 years ago
0
Session handling

#53 heavysixer closed 7 years ago
1
Fix warning instance variable @robots not initialized

#52 spk closed 7 years ago
2
Fix shadowing outer local variable - key

#51 spk closed 7 years ago
0
Remove assigned but unused variable host and port

#50 spk closed 7 years ago
0
Skip processing of pages

#49 darkcode85 closed 7 years ago
1
Hack solution until https://github.com/bblimke/webmock/issues/642 is resolved

#48 JoshCheek closed 7 years ago
1
Add default_headers option

#47 maccman closed 8 years ago
1
Crawling a specific page

#46 justaj closed 8 years ago
2
/../foo expands to just "foo"

#45 postmodern closed 8 years ago
0
Use webmock and to_rack in specs

#44 postmodern closed 2 years ago
1
Is there a way to set Accept-Encoding headers?

#43 robfuller closed 2 years ago
9
How can I 'ignore everything except' a set of links

#42 DHarls17 closed 8 years ago
4
Is it possible to display only part of a spidered URL?

#41 DHarls17 closed 8 years ago
3
Anyway to limit the total number of pages crawled or shutdown the crawler after some criteria?

#40 samur-vonq closed 8 years ago
2
Adds optionable support for obeying robots.txt

#39 buren closed 8 years ago
4
how to login via submit a form

#38 loyalpartner closed 8 years ago
1