thequbit/BarkingOwl
Scalable web scraper framework for finding documents on websites.
GNU General Public License v3.0
19 stars, 7 forks
Issues
#50 Styling, spelling correction (prashantkumarved, opened 4 years ago, 0 comments)
#49 Improve Update README.md (sandhyayadav0711, opened 4 years ago, 0 comments)
#48 Bump requests from 2.5.1 to 2.20.0 (dependabot[bot], opened 4 years ago, 0 comments)
#47 Convert to Python3 (ralic, opened 7 years ago, 0 comments)
#46 Handle spaces in URL names (thequbit, opened 9 years ago, 0 comments)
#45 Handle relative links better (thequbit, closed 9 years ago, 1 comment)
#44 Add a Gitter chat badge to README.md (gitter-badger, closed 9 years ago, 0 comments)
#43 Validate url_data within dispatcher (thequbit, opened 9 years ago, 0 comments)
#42 Add ability to run barkingowl_scraper.py and barkingowl_dispatcher.py as daemons (thequbit, closed 9 years ago, 1 comment)
#41 Include what the error was with the URL within the bad_urls list (thequbit, opened 9 years ago, 0 comments)
#40 If file type comes back null, try again with a bigger header size (thequbit, closed 9 years ago, 1 comment)
#39 Provide more feedback for invalid url_data (ralphbean, closed 9 years ago, 1 comment)
#38 Fix SyntaxError (ralphbean, closed 9 years ago, 0 comments)
#37 setup.py: backingowl => barkingowl (msabramo, closed 9 years ago, 1 comment)
#36 Scraper appears to lose connection randomly (thequbit, closed 9 years ago, 2 comments)
#35 Add exception case for "mailto:" URL (thequbit, closed 10 years ago, 1 comment)
#34 Broadcast to check if UUID is already in use (thequbit, closed 9 years ago, 1 comment)
#33 Allow for time masking (thequbit, opened 10 years ago, 2 comments)
#32 Allow for configurable scraper sleep time (thequbit, closed 9 years ago, 1 comment)
#31 Allow for setting custom UserAgent field (thequbit, closed 9 years ago, 1 comment)
#30 Include the title of the page the document was found on (thequbit, closed 9 years ago, 1 comment)
#29 Busy flag inconsistently being set back to false on completion (thequbit, closed 9 years ago, 1 comment)
#28 Allow for 'not' operation for document types (thequbit, opened 10 years ago, 0 comments)
#27 Follow <embed> tags as <a> tags (thequbit, closed 9 years ago, 1 comment)
#26 Neither the document converter nor the document processor is a part of the message bus (thequbit, closed 9 years ago, 1 comment)
#25 Scrapers do not pull down next URL correctly (thequbit, closed 9 years ago, 1 comment)
#24 Added Dispatcher and Scraper packages (citruspi, closed 10 years ago, 0 comments)
#23 Packaging via setup.py (citruspi, closed 10 years ago, 0 comments)
#22 Removed tools (except for GlobalShutdown) (citruspi, closed 10 years ago, 0 comments)
#21 Tools cleanup (citruspi, closed 10 years ago, 0 comments)
#20 Changed the import method (citruspi, closed 10 years ago, 0 comments)
#19 Removed controller from BarkingOwl (citruspi, closed 10 years ago, 0 comments)
#18 Allow for wildcard in magic match (thequbit, closed 9 years ago, 2 comments)
#17 Add logging to scrapers (thequbit, opened 10 years ago, 0 comments)
#16 Sanity check all inputs on URLs within controller website (thequbit, closed 10 years ago, 1 comment)
#15 Sanity check all inputs on URLs within controller Flask app (thequbit, closed 10 years ago, 1 comment)
#14 Add support for 'Root URL' for each 'Target URL' (thequbit, closed 9 years ago, 2 comments)
#13 Review scraper and dispatcher exit (thequbit, closed 9 years ago, 1 comment)
#12 Add ability to delete URLs from dispatcher database (thequbit, closed 10 years ago, 1 comment)
#11 Add enable/disable feature to URLs in dispatcher database (thequbit, closed 10 years ago, 1 comment)
#10 Complete Flask-based web control app (thequbit, closed 10 years ago, 1 comment)
#9 Create a scraper control tool (thequbit, closed 10 years ago, 1 comment)
#8 Error check against a bad date format in CreationDate (thequbit, closed 10 years ago, 2 comments)
#7 Dispatcher is throwing out data blindly ... might want to rethink that (thequbit, closed 10 years ago, 2 comments)
#6 Dispatch work via 0mq rather than threading (thequbit, closed 11 years ago, 1 comment)
#5 Actually look at robots.txt (thequbit, opened 11 years ago, 0 comments)
#4 doctext encoding issue (thequbit, closed 10 years ago, 2 comments)
#3 Add hacking instructions (ralphbean, opened 11 years ago, 4 comments)
#2 Phrase detect (thequbit, closed 10 years ago, 1 comment)
#1 Add end time to scraps table (thequbit, closed 10 years ago, 1 comment)