internetarchive/Zeno
State-of-the-art web crawler 🔱
GNU Affero General Public License v3.0
83 stars
11 forks
issues
Fix item panic when item fail at preprocessing
#172
CorentinB
closed
1 day ago
0
Add log config support
#171
equals215
closed
1 day ago
1
Add pipeline control mechanism to instantiate, pause/resume and stop
#170
equals215
closed
1 day ago
3
Handle non-UTF8 HTML pages
#169
CorentinB
opened
6 days ago
0
Report local crawls on HQ delete
#168
CorentinB
opened
6 days ago
0
Handle assets redirects
#167
CorentinB
opened
6 days ago
0
Zeno v2
#166
equals215
opened
1 week ago
0
Normalize URLs in Preprocessor
#165
willmhowes
opened
1 week ago
2
Handle infinite redirects
#164
CorentinB
opened
1 week ago
0
Accept a configuration file passed at runtime
#163
willmhowes
opened
1 week ago
0
End to End Testing
#162
willmhowes
opened
1 week ago
0
PDF extractor
#161
CorentinB
opened
1 week ago
0
Ebook extractor
#160
CorentinB
opened
1 week ago
0
RSS extractor
#159
CorentinB
opened
1 week ago
0
Add --hq-batch-concurrency
#158
CorentinB
closed
2 weeks ago
0
Better xml parsing
#157
yzqzss
closed
2 weeks ago
0
Update to latest version of gocrawlhq
#156
NGTmeaty
closed
3 weeks ago
0
Implement XML parsing using stdlib
#155
HarshNarayanJha
closed
2 weeks ago
4
Add Amazon S3 extractor
#153
CorentinB
closed
1 month ago
0
Add support for ina.fr videos
#152
CorentinB
closed
1 month ago
0
Reevaluate host-based concurrent crawling limits
#151
NGTmeaty
opened
1 month ago
0
Fix Reddit support
#150
CorentinB
closed
2 months ago
0
Add --disable-ipv4 & --disable-ipv6
#149
CorentinB
closed
2 months ago
0
URL fixes
#148
NGTmeaty
closed
2 months ago
0
Invalid logging level
#147
CorentinB
opened
2 months ago
0
Zeno doesn't start with get list and an empty line in list
#146
CorentinB
opened
2 months ago
2
Fix: cookie named incorrectly.
#145
NGTmeaty
closed
2 months ago
0
fix: incorrect crawl speed unpausing behavior
#144
NGTmeaty
closed
2 months ago
0
Add some live logging of the HQ consumer behavior
#143
CorentinB
closed
2 months ago
0
fix: `HQSeencheckURL()` incorrectly distinguishes between new and old URL
#142
yzqzss
closed
2 months ago
0
chore: compile regexs only once
#141
yzqzss
closed
3 months ago
1
Fix worker state panic when current item or URL is nil
#140
CorentinB
closed
3 months ago
0
Fix seencheck re-implementation with current queue
#139
CorentinB
closed
3 months ago
0
Add XML extractor tests
#138
CorentinB
closed
3 months ago
0
Add custom code for Reddit archiving
#137
CorentinB
closed
3 months ago
0
Allow specifying proxies per TLD
#136
CorentinB
opened
3 months ago
2
Allow specifying an URL for `get list` to use a remote list
#135
CorentinB
opened
3 months ago
0
fix: recursion present in Telegram URLs
#134
NGTmeaty
closed
3 months ago
0
Add job and WARC prefix to pyroscope tags for better search
#133
NGTmeaty
closed
3 months ago
0
Panic on /workers access
#132
CorentinB
closed
3 months ago
5
Verify & test our XML extraction in the context of sitemaps
#131
CorentinB
closed
2 weeks ago
0
fix: ensure there is no infinite recursion of URLs
#130
NGTmeaty
closed
3 months ago
0
Add additional tests and validate current URL behavior
#129
NGTmeaty
closed
3 months ago
0
Fix the handover bypass when seeds are loaded from list and handover is disabled
#128
equals215
closed
3 months ago
0
Add Pyroscope profiling support
#127
NGTmeaty
closed
3 months ago
0
Add proper YouTube archiving via YT-DLP
#126
CorentinB
closed
2 months ago
0
Correct seedList enqueuing process
#125
equals215
closed
3 months ago
0
Ingest seeds before starting workers
#124
CorentinB
closed
3 months ago
0
Send on closed channel panic
#123
CorentinB
opened
3 months ago
2
[STALE] Split Zeno in smaller packages with a better structure
#122
equals215
closed
1 week ago
3