jamesturk/spatula
A modern Python library for writing maintainable web scrapers.
https://jamesturk.github.io/spatula/
MIT License
244 stars, 11 forks
issues
lxml is limited to 4.X
#39
showerst
closed
3 months ago
1
WIP: add some test_utils to experiment with
#38
jamesturk
opened
1 year ago
0
Improve (& document) scraper testing workflow
#37
jamesturk
opened
1 year ago
2
Bug Report: PDF scraping
#36
jamesturk
opened
1 year ago
0
Retries - Bad Pattern
#35
styledev
closed
2 years ago
1
Recursive dependency fetching
#34
chriszs
closed
2 years ago
1
add configurable default settings
#33
jamesturk
opened
2 years ago
0
should retry
#32
jamesturk
closed
2 years ago
0
should_retry
#31
jamesturk
closed
2 years ago
1
How to handle optional element Error
#29
ghbook
closed
2 years ago
3
allow test to traverse subpages, #25
#28
jamesturk
closed
3 years ago
0
output formats
#27
jamesturk
closed
3 years ago
0
added verify argument to URL()
#26
morden35
closed
3 years ago
0
improved spatula test recommendations
#25
jamesturk
closed
3 years ago
0
built-in support/examples for scraping ASP.net pages
#24
jamesturk
opened
3 years ago
1
Specify unique id
#23
magick93
opened
3 years ago
3
docs links broken in README.md
#21
lo5an
closed
3 years ago
1
multi-first-page scrapes
#20
jamesturk
opened
3 years ago
1
decide how input values work on first page
#19
jamesturk
opened
3 years ago
0
add configurable output options
#18
jamesturk
closed
3 years ago
0
unset name change
#17
jamesturk
closed
3 years ago
0
add caching options
#16
jamesturk
closed
3 years ago
0
yoyodyne
#15
jamesturk
closed
3 years ago
0
scouts
#14
jamesturk
closed
3 years ago
0
page responses
#13
jamesturk
closed
3 years ago
0
branching/differentiation among scrape path
#12
jamesturk
closed
3 years ago
0
pagination & related improvements for bill scraping
#11
jamesturk
closed
3 years ago
0
document Page.dependencies and related stuff
#10
jamesturk
closed
3 years ago
0
improve testing
#9
jamesturk
closed
3 years ago
1
more documentation
#8
jamesturk
opened
3 years ago
4
plan for using spatula as a library only
#7
jamesturk
closed
3 years ago
0
finish writing scrape CLI
#6
jamesturk
closed
3 years ago
0
finish writing tutorial
#5
jamesturk
closed
3 years ago
0
add scrapeshell options to spatula shell
#4
jamesturk
closed
3 years ago
0
add logging interface
#3
jamesturk
closed
3 years ago
0
Add (or document) logging
#2
mileswwatkins
closed
6 years ago
1
Add CSV "Page" type
#1
divergentdave
closed
7 years ago
3