scrapy
/
scrapely
A pure-python HTML screen-scraping library
1.86k
stars
315
forks
issues
formatting cleanups
#72
kmike
closed
9 years ago
1
README fixes
#71
kmike
closed
9 years ago
0
Drop Python 2.6 support
#70
kmike
closed
8 years ago
1
remove most Scrapy mentions from the README
#69
kmike
closed
9 years ago
2
Obtaining sectioned article text
#68
Shadonar
opened
9 years ago
1
Add python 3 support
#67
ruairif
closed
9 years ago
7
Does the order of annotations matter - Weird output
#66
dav009
opened
9 years ago
0
Mdr extractor
#65
tpeng
opened
9 years ago
0
allow using multiple extractors in TemplatePageExtractor
#64
tpeng
closed
9 years ago
0
PEP8 compliance
#63
cyberplant
opened
9 years ago
0
Fixes #61
#62
cyberplant
closed
9 years ago
0
Random failing doctests
#61
dangra
closed
9 years ago
0
move htmlpage to Page
#60
tpeng
closed
9 years ago
2
fix trailing spaces
#59
tpeng
closed
9 years ago
2
Mining listing data
#58
tpeng
closed
9 years ago
2
Is this still an active project?
#57
chachra
closed
9 years ago
0
Some usability improvements for the cmdline tool
#56
eliasdorneles
closed
9 years ago
3
HTML page containing more than one entity. How to annotate?
#55
bitliner
opened
10 years ago
0
What do you mean by "The training implementation is currently very simple and is only provided for references purposes, to make it easier to test Scrapely and play with it."?
#54
bitliner
opened
10 years ago
0
Can I train the scraper on multiple pages so given a certain page it chooses automatically the template?
#53
bitliner
opened
10 years ago
1
benchmarks?
#52
mezuqu
closed
8 years ago
1
Python 3 support
#51
mattdbr
closed
9 years ago
1
allow to copy/deepcopy HtmlPageParsedRegion
#50
kalessin
closed
10 years ago
2
ZeroDivisionError when training with zero-length data
#49
haywhisksoftware
opened
10 years ago
4
allow using html tag attributes in similar_region
#48
tpeng
closed
9 years ago
1
allow passing a different subsequence method in similar_region
#47
tpeng
closed
10 years ago
0
[MRG] scrapely.tool: add support for non-ascii <text> and <data> arguments
#46
kmike
closed
10 years ago
0
support CJK string annotation; print readably CJK string in scrapely.tool's output
#45
xyb
opened
10 years ago
6
Update README.rst
#44
decause
closed
10 years ago
0
Provide method for parsing HTML that has already been downloaded by external libraries.
#43
louist87
closed
10 years ago
1
Simplifying the test_extraction code and a few clean ups
#42
AlexRiina
closed
10 years ago
1
fixed a link to the ubuntu packages in the README
#41
vad
closed
11 years ago
0
tool.parse_criteria normalizes whitespace
#40
dpnova
closed
11 years ago
1
How to use html data instead of direct URLs
#39
mejo
closed
10 years ago
3
README Usage (command line tool) correction
#38
smartexpert
closed
11 years ago
0
correctly handle tag name replacement when replaced tags are not closed.
#37
kalessin
closed
11 years ago
0
refactor text extractor and ignore xml declarations
#36
kalessin
closed
11 years ago
0
possible to pass scrapy response object to scrapely?
#35
ghost
closed
11 years ago
1
l
#34
ghost
closed
10 years ago
9
don't force to raise exception when an ignored region is not inside the
#33
kalessin
closed
11 years ago
0
allow to disable application of extra required attributes for testing purposes
#32
kalessin
closed
11 years ago
0
Fix extraction when two immediately consecutive annotations share a token
#31
kalessin
closed
11 years ago
0
correctly extract regions that follow more than one consecutive miss
#30
kalessin
closed
11 years ago
0
Attribute with name "content"
#29
kalessin
closed
11 years ago
0
problem with bad encoding and BOM?
#28
jperelli
opened
11 years ago
2
Improve similarity algorithm to make usage of the extracted data per region
#27
kalessin
closed
11 years ago
0
avoid exception when instantiating htmlregion if parent htmlpage has empty body. Added test.
#26
kalessin
closed
11 years ago
0
safehtml: enclose table elements between table/tbody when they are not present, added tests (#24)
#25
kalessin
closed
11 years ago
2
safehtml should ensure tabular content safety
#24
omab
closed
6 years ago
3
Correct example at README.rst
#23
lmorillas
closed
12 years ago
0