indrajithi
/
tiny-web-crawler
A simple and easy to use web crawler for Python
MIT License
55
stars
11
forks
issues
Use settings classes
#48
Mews
closed
5 days ago
0
Deleted src/tiny_web_crawler/spider.py
#47
Mews
closed
6 days ago
0
Use one or more `Options` classes
#46
Mews
closed
5 days ago
1
Respect robots txt
#45
Mews
closed
6 days ago
0
Make `Spider` importable from root module
#44
Mews
closed
1 week ago
0
Feature: Logging
#43
Mews
closed
1 week ago
2
Respect robots.txt when crawling when set as True
#42
indrajithi
closed
6 days ago
1
Can't run tests on local machine
#41
Mews
closed
1 week ago
1
Add `internal_links_only` and `external_links_only` options
#40
Mews
closed
1 week ago
0
Implement a retry mechanism for transient errors
#39
indrajithi
opened
1 week ago
0
Implement logging
#38
indrajithi
closed
1 week ago
1
Crawl depth per domain
#37
indrajithi
opened
1 week ago
0
Refactored spider class into modules
#34
indrajithi
closed
1 week ago
0
Removed unused `main` function
#33
Mews
closed
1 week ago
2
Coverage report in readme
#32
Mews
closed
1 week ago
3
Fix small typo in verbose print statement
#31
Mews
closed
1 week ago
3
Display coverage percentage in readme
#30
Mews
closed
1 week ago
2
Increased test coverage
#29
Mews
closed
1 week ago
2
Test coverage above 80%
#28
Mews
closed
1 week ago
0
What is the `main` function for?
#27
Mews
closed
1 week ago
2
Make `Spider` importable from main module
#26
Mews
closed
1 week ago
1
Set tests stage to push
#25
Mews
closed
1 week ago
0
First Major Release v1.0.0
#24
indrajithi
opened
1 week ago
3
`poetry install --with dev` doesn't install pre-commit hooks
#23
Mews
opened
1 week ago
6
Add mypy and pytest to pre-commit
#22
Mews
closed
1 week ago
0
Fix `crawl_result` type hint
#21
Mews
opened
1 week ago
1
Add mypy to pre-commit hooks
#20
Mews
closed
1 week ago
6
Added option to include page body in crawl results
#19
Mews
closed
1 week ago
7
Docs: Auto generate documentation
#18
indrajithi
opened
1 week ago
0
Housekeeping: Refactor the code base to a more Modular and Extensible Architecture
#17
indrajithi
closed
1 week ago
0
Url regex matching
#16
Mews
closed
1 week ago
2
Feature/refactor spider class
#15
indrajithi
closed
1 week ago
0
Add a flag to crawl only the root website
#14
devavinothm
closed
1 week ago
2
Feature: Support for regular expression pattern for url crawling
#13
indrajithi
closed
1 week ago
5
Feature: Add a feature to only crawl the given list of urls
#12
indrajithi
opened
1 week ago
5
Feature: Support flag to crawl only the root website. Do not hop to external links
#11
indrajithi
closed
1 week ago
10
Feature: Support for crawling dynamic javascript heavy site
#10
indrajithi
opened
1 week ago
5
Housekeeping: Refactor the Spider class to reduce the max args. Use a dataclass
#9
indrajithi
closed
1 week ago
1
Feature: Add option to return the crawled website body in the response
#8
indrajithi
closed
1 week ago
3
add support for concurrent workers, custom delay and optional verbose
#7
indrajithi
closed
1 week ago
0
use poetry to publish
#6
indrajithi
closed
1 week ago
0
Feature/test workflow
#5
indrajithi
closed
1 week ago
0
python publish workflow
#4
indrajithi
closed
1 week ago
0
Create python-publish.yml
#3
indrajithi
closed
1 week ago
0
Update pylint.yml
#2
indrajithi
closed
1 week ago
0
Feature: Basic features and package the project
#1
indrajithi
closed
1 week ago
0