nietaki/crawlie
A simple Elixir library for writing decently-performing crawlers with minimum effort.
MIT License
89 stars
11 forks
issues
Misc doc changes
#40
kianmeng
opened
3 years ago
1
Rate Limiting
#39
mikhailbot
closed
2 years ago
0
Modernize libraries and code to match current versions
#38
axelson
opened
5 years ago
1
Not compatible with the latest GenStage
#37
axelson
opened
5 years ago
0
minor edits
#36
RichMorin
opened
5 years ago
1
Making crawlie work with GenStage and Flow 0.12.0
#35
nietaki
closed
7 years ago
1
Update README.md
#34
loongmxbt
opened
7 years ago
1
Create some sort of CONTRIBUTING.md file
#33
nietaki
opened
7 years ago
0
Adding content-type(s) to Response struct. Closes #28.
#32
nietaki
closed
7 years ago
1
Make it possible to pass some information from the parent page
#31
nietaki
opened
7 years ago
0
Stats tracking
#30
nietaki
closed
7 years ago
4
Allowing skipping pages in ParserLogic.parse
#29
nietaki
closed
7 years ago
1
Add content_type and content_type_simple to the Response struct
#28
nietaki
closed
7 years ago
0
Allow for ParserLogic.parse to skip a page instead of just parsing
#27
nietaki
closed
7 years ago
1
Provide the option of tracking crawling statistics
#26
nietaki
closed
7 years ago
2
remove duplicate "crawling finished" debug messages
#25
nietaki
closed
7 years ago
1
moving the visited check to when pages are added. Closes #22
#24
nietaki
closed
7 years ago
1
Remove `initial` from the UrlManager State
#23
nietaki
closed
7 years ago
0
Do not add duplicate uris to the UrlManager State
#22
nietaki
closed
7 years ago
0
Moving to using URI.t in both Page and the HTTP Client
#21
nietaki
closed
7 years ago
1
Rename `extract_links` to `extract_uris`
#20
nietaki
closed
7 years ago
0
Move to using URI.t instead of strings for urls
#19
nietaki
closed
7 years ago
0
Updating GenStage and Flow to 0.11.x. Closes #17
#18
nietaki
closed
7 years ago
1
Update GenStage to 0.11
#17
nietaki
closed
7 years ago
0
Adding the Crawlie.Response struct
#16
nietaki
closed
7 years ago
2
Make Elixir Syntax Highlighting work
#15
tazsingh
closed
7 years ago
2
Add a simple usage example to the README
#14
nietaki
opened
7 years ago
0
Moving from heap to a priority queue for storing discovered pages.
#13
nietaki
closed
7 years ago
1
Have crawlie operate in the library's supervision tree instead of the caller's
#12
nietaki
closed
7 years ago
2
Tracking in-flight urls in UrlManager instead of relying on timeouts
#11
nietaki
closed
7 years ago
1
Links with depth over "max_depth" don't get sent to the Manager anymore.
#10
nietaki
closed
7 years ago
1
Replace the heap with a priority queue
#9
nietaki
closed
7 years ago
0
Tune the Flow parameters
#8
nietaki
closed
7 years ago
1
Signal completion of the fetches to the UrlManager instead of relying on timeouts to wrap it up.
#7
nietaki
closed
7 years ago
0
Pass more response data to the parser logic
#6
nietaki
closed
7 years ago
0
limiting urls to a domain
#5
nietaki
closed
7 years ago
2
Don't send links that are too deep back to the `UrlManager`
#4
nietaki
closed
7 years ago
3
Eliminating duplicate urls
#3
nietaki
closed
7 years ago
0
Fix the `:url_manager_timeout` logic
#2
nietaki
closed
7 years ago
2
Merge the option with defaults inside Crawlie.crawl
#1
nietaki
closed
7 years ago
0