oscar-project/ungoliant
:spider: The pipeline for the OSCAR corpus
https://oscar-corpus.com
Apache License 2.0
162 stars, 14 forks
Issues (sorted by newest)
#132 [BUG] download malfunctioning (kargaranamir, closed 7 months ago, 1 comment)
#131 [BUG] corrupt deflate stream (kargaranamir, opened 7 months ago, 0 comments)
#130 [BUG] UnexpectedEof While running Ungoliant Pipeline (nattkorat, closed 9 months ago, 3 comments)
#129 Updated dependencies (pjox, opened 11 months ago, 0 comments)
#128 chore(deps): bump rustix from 0.37.19 to 0.37.25 (dependabot[bot], opened 1 year ago, 0 comments)
#127 chore(deps): bump webpki from 0.22.0 to 0.22.2 (dependabot[bot], opened 1 year ago, 1 comment)
#126 chore(deps): bump sha2 from 0.9.9 to 0.10.8 (dependabot[bot], opened 1 year ago, 1 comment)
#125 lint fixes (chris-ha458, opened 1 year ago, 0 comments)
#124 Optional corpus checksum + have language folders rather than flat files (Uinelj, closed 1 year ago, 1 comment)
#123 [Feature request] Add larger timeouts between 503 retries for CC download (Uinelj, opened 1 year ago, 0 comments)
#122 chore(deps): bump rustls-webpki from 0.100.1 to 0.100.2 (dependabot[bot], opened 1 year ago, 1 comment)
#121 chore(deps): bump sha2 from 0.9.9 to 0.10.7 (dependabot[bot], closed 1 year ago, 2 comments)
#120 chore(deps): bump oscar-io from 0.2.4 to 0.4.0 (dependabot[bot], opened 1 year ago, 0 comments)
#119 chore(deps): bump env_logger from 0.8.4 to 0.9.3 (dependabot[bot], opened 1 year ago, 1 comment)
#118 chore(deps): bump sha-1 from 0.9.8 to 0.10.1 (dependabot[bot], opened 1 year ago, 1 comment)
#117 chore(deps): bump log from 0.4.18 to 0.4.20 (dependabot[bot], opened 1 year ago, 1 comment)
#116 Add Dockerfile (Uinelj, opened 1 year ago, 2 comments)
#115 ci: add cargo dist CI file and Cargo.toml configuration (Uinelj, closed 1 year ago, 1 comment)
#114 Add splitting and compression at ungoliant runtime (Uinelj, closed 1 year ago, 1 comment)
#113 chore(deps): bump tokio-util from 0.6.10 to 0.7.8 (dependabot[bot], closed 1 year ago, 1 comment)
#112 chore(deps): bump tokio from 1.28.2 to 1.29.1 (dependabot[bot], closed 1 year ago, 1 comment)
#110 WIP 3.0.0 (Uinelj, opened 1 year ago, 1 comment)
#109 Change sha256 to sha384 (chris-ha458, closed 1 year ago, 2 comments)
#108 [Feature request] Secure against length extension attacks (chris-ha458, opened 1 year ago, 4 comments)
#107 Add information about other langID models (Uinelj, closed 1 year ago, 3 comments)
#106 [Feature request] Document how to set fasttext model (chris-ha458, opened 1 year ago, 2 comments)
#105 Download errors with Ungoliant (pjox, closed 1 year ago, 1 comment)
#104 Fix spelling in readme (Force1ess, closed 1 year ago, 2 comments)
#103 Fix language tags on NLLB model (Uinelj, closed 1 year ago, 1 comment)
#102 chore(deps): bump csv from 1.2.0 to 1.2.2 (dependabot[bot], closed 1 year ago, 1 comment)
#101 chore(deps): bump h2 from 0.3.15 to 0.3.17 (dependabot[bot], closed 1 year ago, 2 comments)
#100 chore(deps): bump tokio-util from 0.6.10 to 0.7.7 (dependabot[bot], closed 1 year ago, 1 comment)
#99 chore(deps): bump csv from 1.2.0 to 1.2.1 (dependabot[bot], closed 1 year ago, 1 comment)
#98 chore(deps): bump futures-util from 0.3.26 to 0.3.28 (dependabot[bot], closed 1 year ago, 2 comments)
#97 chore(deps): bump futures-core from 0.3.26 to 0.3.28 (dependabot[bot], closed 1 year ago, 1 comment)
#96 chore(deps): bump tokio from 1.25.0 to 1.26.0 (dependabot[bot], closed 1 year ago, 2 comments)
#95 fix "No language tag found" on NLLB tags (Uinelj, closed 1 year ago, 1 comment)
#94 [BUG] Error when downloading full CC snapshot (ngan-nt, closed 1 year ago, 4 comments)
#93 [BUG] Deduplication with Ungoliant (Hammamwa47, opened 1 year ago, 1 comment)
#92 Automatically add binaries on releases (Uinelj, opened 1 year ago, 0 comments)
#91 Improve coverage (Uinelj, closed 1 year ago, 1 comment)
#90 fix tarpaulin CI (Uinelj, closed 1 year ago, 1 comment)
#89 chore(deps): bump tokio from 1.17.0 to 1.18.5 (dependabot[bot], closed 1 year ago, 1 comment)
#88 Option to keep documents that can't be identified (Uinelj, opened 1 year ago, 1 comment)
#87 test: uncomment tests and add one more (Uinelj, closed 1 year ago, 0 comments)
#86 Move TLSH out of annotations (Uinelj, closed 1 year ago, 1 comment)
#85 Change `annotation` to `quality_warnings` (Uinelj, closed 1 year ago, 1 comment)
#84 chore(deps): bump bumpalo from 3.9.1 to 3.12.0 (dependabot[bot], closed 1 year ago, 2 comments)
#83 Move IO out of Ungoliant (Uinelj, closed 1 year ago, 1 comment)
#82 refactor: remove old pipelines, old io code and old langtags (Uinelj, closed 1 year ago, 1 comment)