nietaki / crawlie

A simple Elixir library for writing decently-performing crawlers with minimum effort.

MIT License

89 stars 11 forks

issues

Misc doc changes

#40 kianmeng opened 3 years ago
1
Rate Limiting

#39 mikhailbot closed 2 years ago
0
Modernize libraries and code to match current versions

#38 axelson opened 5 years ago
1
Not compatible with the latest GenStage

#37 axelson opened 5 years ago
0
minor edits

#36 RichMorin opened 5 years ago
1
Making crawlie work with GenStage and Flow 0.12.0

#35 nietaki closed 7 years ago
1
Update README.md

#34 loongmxbt opened 7 years ago
1
Create some sort of CONTRIBUTING.md file

#33 nietaki opened 7 years ago
0
Adding content-type(s) to Response struct. Closes #28.

#32 nietaki closed 7 years ago
1
Make it possible to pass some information from the parent page

#31 nietaki opened 7 years ago
0
Stats tracking

#30 nietaki closed 7 years ago
4
Allowing skipping pages in ParserLogic.parse

#29 nietaki closed 7 years ago
1
Add content_type and content_type_simple to the Response struct

#28 nietaki closed 7 years ago
0
Allow for ParserLogic.parse to skip a page instead of just parsing

#27 nietaki closed 7 years ago
1
Provide the option of tracking crawling statistics

#26 nietaki closed 7 years ago
2
remove duplicate "crawling finished" debug messages

#25 nietaki closed 7 years ago
1
moving the visited check to when pages are added. Closes #22

#24 nietaki closed 7 years ago
1
Remove `initial` from the UrlManager State

#23 nietaki closed 7 years ago
0
Do not add duplicate uris to the UrlManager State

#22 nietaki closed 7 years ago
0
Moving to using URI.t in both Page and the HTTP Client

#21 nietaki closed 7 years ago
1
Rename `extract_links` to `extract_uris`

#20 nietaki closed 7 years ago
0
Move to using URI.t instead of strings for urls

#19 nietaki closed 7 years ago
0
Updating GenStage and Flow to 0.11.x. Closes #17

#18 nietaki closed 7 years ago
1
Update GenStage to 0.11

#17 nietaki closed 7 years ago
0
Adding the Crawlie.Response struct

#16 nietaki closed 7 years ago
2
Make Elixir Syntax Highlighting work

#15 tazsingh closed 7 years ago
2
Add a simple usage example to the README

#14 nietaki opened 7 years ago
0
Moving from heap to a priority queue for storing discovered pages.

#13 nietaki closed 7 years ago
1
Have crawlie operate in the library's supervision tree instead of the caller's

#12 nietaki closed 7 years ago
2
Tracking in-flight urls in UrlManager instead of relying on timeouts

#11 nietaki closed 7 years ago
1
Links with depth over "max_depth" don't get sent to the Manager anymore.

#10 nietaki closed 7 years ago
1
Replace the heap with a priority queue

#9 nietaki closed 7 years ago
0
Tune the Flow parameters

#8 nietaki closed 7 years ago
1
Signal completion of the fetches to the UrlManager instead of relying on timeouts to wrap it up.

#7 nietaki closed 7 years ago
0
Pass more response data to the parser logic

#6 nietaki closed 7 years ago
0
limiting urls to a domain

#5 nietaki closed 7 years ago
2
Don't send links that are too deep back to the `UrlManager`

#4 nietaki closed 7 years ago
3
Eliminating duplicate urls

#3 nietaki closed 7 years ago
0
Fix the `:url_manager_timeout` logic

#2 nietaki closed 7 years ago
2
Merge the option with defaults inside Crawlie.crawl

#1 nietaki closed 7 years ago
0