openzim/warc2zim
Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0
40 stars, 5 forks
issues
Fix handling of <script> type in HTML documents
#292
benoit74
closed
3 weeks ago
2
mesquartierschinois: related articles / articles similaires are broken
#291
benoit74
opened
1 month ago
2
Add PDFs to suggestions list
#290
benoit74
opened
1 month ago
5
PDF content is not indexed in full text search
#289
benoit74
opened
1 month ago
2
Items which are expected to be automatically downloaded don't
#288
benoit74
opened
1 month ago
2
JS URL rewriting: take into account that url might be an object
#287
benoit74
closed
1 month ago
2
Support Python 3.12
#286
kelson42
closed
1 month ago
1
Fix fuzzy rule for Youtube thumbnails in JS
#285
benoit74
closed
1 month ago
3
Share FuzzyRules test data between JS and Python
#284
benoit74
opened
1 month ago
0
BCD tables in developer.mozilla.org are broken again
#283
benoit74
closed
3 weeks ago
4
Properly detect nested redirection loops
#282
benoit74
closed
1 month ago
1
Redirection loops are still leading to dead loops
#281
benoit74
closed
1 month ago
1
Merge warc2zim2 branch into main
#280
benoit74
closed
1 month ago
0
Do not rewrite URLs composed of just a fragment
#279
benoit74
closed
1 month ago
1
Avoid and detect redirection loops
#278
benoit74
closed
1 month ago
1
Do not rewrite href containing only a fragment
#277
benoit74
closed
1 month ago
1
Dynamic URL rewriting: rewrite only when ZIM path exists
#276
benoit74
opened
1 month ago
4
Raise warnings when there is a conflict of http/https and/or ports and/or ...
#275
benoit74
opened
1 month ago
0
Fix youtube thumbnails
#274
benoit74
closed
1 month ago
4
Add failure thresholds for missing links
#273
benoit74
opened
1 month ago
1
Add support for web workers on crawled websites
#272
benoit74
closed
1 week ago
1
Add support for real fuzzy matching
#271
benoit74
opened
1 month ago
1
Add support for onxxx HTML events
#270
benoit74
closed
1 month ago
1
Document scraper capabilities and limitations
#269
benoit74
closed
1 month ago
0
Fix path typing in `WARCPayloadItem` and `Rewriter`
#268
benoit74
closed
1 month ago
0
ZIM Tags passed as single string with values separated by semicolons
#267
benoit74
closed
1 month ago
3
Validate ZIM metadata early
#266
benoit74
closed
1 month ago
0
Move from multiple `--tags` flags to one single `--tags` flag
#265
benoit74
closed
1 month ago
2
Fix inclusion of custom CSS in warc2zim2
#264
benoit74
closed
1 month ago
0
Custom CSS URL is not rewritten
#263
kelson42
closed
1 month ago
4
Youtube video not displaying placeholder image and not playing
#262
kelson42
closed
1 month ago
3
Unsupported upper-case chars in hostname
#261
kelson42
closed
1 month ago
1
Revisit decoding of documents from binary to string
#260
benoit74
closed
1 month ago
0
Document and test limitations around processing of HTML attributes
#259
benoit74
closed
1 month ago
0
Quote path passed to dynamic JS rewriting for prefix computation
#258
benoit74
closed
1 month ago
0
Dynamic URL rewriting in Wombat is not working on URLs with Unicode characters
#257
benoit74
closed
1 month ago
1
Log unexpected WARC record status codes as warnings
#256
benoit74
closed
1 month ago
0
Consider WARC record with status code 0 as normal
#255
benoit74
closed
1 month ago
1
Processing of WARC records with HTTP status code 0
#254
benoit74
closed
1 month ago
2
Charset declared in HTML documents is not rewritten
#253
benoit74
closed
2 weeks ago
0
Simplify debugging of WARC record processing issues
#252
benoit74
closed
1 month ago
3
Properly notify JS module location when a base href is configured
#251
benoit74
closed
1 month ago
0
base href is not properly handled for notification of JS modules
#250
benoit74
closed
1 month ago
1
No more failure due to url rewriting issue
#249
benoit74
closed
1 month ago
1
Enhance ability to debug WARC item conversion issues
#248
benoit74
closed
1 month ago
1
Warc2zim2 is not sufficiently resilient to static URL rewriting issues
#247
benoit74
closed
1 month ago
3
Warc2zim hung forever
#246
benoit74
closed
1 month ago
2
Add a contrib script to test/debug HTML rewriting
#245
benoit74
closed
1 month ago
0
Handle HTTP return codes properly
#244
benoit74
closed
1 month ago
0
Handle base href
#243
benoit74
closed
1 month ago
1