hplt-project/OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
https://pypi.org/project/opuscleaner/
46 stars
13 forks
issues
Move sample command to opuscleaner-datasets
#162
jelmervdl
closed
1 month ago
0
Build frontend on install
#161
jelmervdl
closed
1 month ago
0
Build alphabet support from CLDR data
#160
gregtatum
opened
2 months ago
0
Possible unpredicted behaviour
#159
rggdmonk
opened
2 months ago
0
Normalize text before running alphabet regexes
#158
gregtatum
closed
2 months ago
0
Add alphabet support for bs, id, sr, tr, vi
#157
gregtatum
closed
2 months ago
0
Laser filter error: ValueError: could not convert string to float: b''
#156
eu9ene
opened
6 months ago
1
ValueError: large.bin has wrong file format!
#155
eu9ene
opened
6 months ago
0
Support LASER 2/3
#154
eu9ene
opened
6 months ago
0
SSL cert issue on MacOS Sonoma
#153
marcelo-at-medialocate
opened
6 months ago
0
opuscleaner-clean hangs on error
#152
eu9ene
opened
8 months ago
0
One universal filter configuration to run on all datasets sequentially
#151
mzeidhassan
opened
8 months ago
2
opuscleaner-clean fails
#150
bhaddow
closed
1 month ago
7
Using the fix-quotes filter, and viewing the changes, makes it look as though the file has been replaced.
#149
bhaddow
opened
9 months ago
1
Automatically derive filters based on a clean sample provided by the user.
#148
PinzhenChen
opened
9 months ago
5
add a separator between selected filters and the filter pool
#147
PinzhenChen
opened
9 months ago
0
Should OpusCleaner have the notion of a "project"?
#146
bhaddow
opened
9 months ago
3
Fails to install requirements-all.txt on Python 3.10
#145
bhaddow
opened
9 months ago
7
There should be sensible defaults for filters wherever possible
#144
bhaddow
opened
9 months ago
1
Using the "detokenizer" filter rule gives an error
#143
bhaddow
opened
9 months ago
1
The configuration of data searching and downloading directories is not linked
#142
bhaddow
opened
9 months ago
1
Support monolingual datasets
#141
jelmervdl
opened
9 months ago
0
PyPi releases
#140
eu9ene
closed
9 months ago
3
Monolingual download not working
#139
lukasweymann
closed
9 months ago
5
Using OpusCleaner with local dataset
#138
mzeidhassan
closed
9 months ago
4
Fixing remove_empty_lines
#137
jindrahelcl
closed
9 months ago
0
tiny refactor in opuscleaner.datasets
#136
jindrahelcl
closed
9 months ago
0
Show the download percentage in the UI
#135
gregtatum
opened
11 months ago
0
Cutting off internet during download leaves the download in a broken state.
#134
gregtatum
opened
11 months ago
2
Normalize language tags
#133
gregtatum
opened
11 months ago
0
num_mismatch discards some useful entries
#132
gregtatum
opened
11 months ago
5
delete redundant def, import instead
#131
jindrahelcl
closed
11 months ago
0
Refactor filters as transformers & scorers
#130
jelmervdl
opened
11 months ago
0
Configure the diff view to select diffs between different steps
#129
jindrahelcl
opened
11 months ago
0
Whitespace normalization filter
#128
jindrahelcl
closed
11 months ago
10
Tooltip that says which filter did it
#127
jindrahelcl
opened
11 months ago
1
Revert "Update remove_empty_lines.json"
#126
jelmervdl
closed
11 months ago
5
Update remove_empty_lines.json
#125
jindrahelcl
closed
11 months ago
0
add version somewhere in the frontend
#124
jindrahelcl
closed
11 months ago
0
Improve num_mismatch filter
#123
jelmervdl
closed
11 months ago
0
num_mismatch fails on CCMatrix
#122
eu9ene
closed
6 months ago
8
Show overlap scores when downloading datasets
#121
jelmervdl
opened
1 year ago
0
Add missing docs & types and `--time` support
#120
jelmervdl
closed
1 year ago
0
Add license
#119
kpu
opened
1 year ago
0
Extract all (two…) files from the zip archive in parallel
#118
jelmervdl
closed
1 year ago
0
Cancelable parallel
#117
jelmervdl
closed
1 year ago
1
opuscleaner-clean hangs on CCMatrix
#116
eu9ene
closed
1 year ago
6
FileNotFoundError: [WinError 2] The system cannot find the file specified
#115
tomsbergmanis
closed
1 year ago
3
Update UI
#114
jelmervdl
opened
1 year ago
0
LASER threshold broken?
#113
kpu
closed
1 year ago
1