hplt-project/OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
https://pypi.org/project/opuscleaner/
46 stars, 13 forks
Issues
#112 bicleaner-ai? (kpu; opened 1 year ago; 0 comments)
#111 A simple length ratio filter shouldn't require manually updating pyhash submodules (kpu; closed 1 year ago; 2 comments)
#110 shuffle data (kpu; opened 1 year ago; 2 comments)
#109 My browser is going slow (kpu; opened 1 year ago; 1 comment)
#108 Why do I select cleanliness when I'm not looking at the data? (kpu; closed 1 year ago; 1 comment)
#107 Add notes (kpu; opened 1 year ago; 0 comments)
#106 opuscleaner-clean returns status code 0 if one of the processes fail (eu9ene; closed 1 year ago; 0 comments)
#105 How to use opuscleaner-clean with stdin? (eu9ene; closed 1 year ago; 1 comment)
#104 Feature request: Do not rerun the whole processing pipeline when adding filters in the web frontend (varisd; closed 11 months ago; 1 comment)
#103 Any plans to share the filter configs? (eu9ene; opened 1 year ago; 5 comments)
#102 Setting languages for individual rules is redundant (eu9ene; closed 1 year ago; 2 comments)
#101 Support a hierarchy of filters (eu9ene; opened 1 year ago; 1 comment)
#100 Fix mismatching of sentence final punctuation (XapaJIaMnu; closed 11 months ago; 1 comment)
#99 langid vs fasttext_filter (kpu; opened 1 year ago; 1 comment)
#98 How does the num_mismatch filter tokenize? (kpu; closed 11 months ago; 2 comments)
#97 How do I say this dataset is so garbage it shouldn't be used? (kpu; opened 1 year ago; 2 comments)
#96 What does the pie "show dataset statistics" do? (kpu; opened 1 year ago; 1 comment)
#95 Clicking "Datalor" logo from "import dataset page" results in empty folder even though some have been downloaded (kpu; opened 1 year ago; 0 comments)
#94 Color coding download states (jindrahelcl; closed 1 year ago; 0 comments)
#93 Download all of them! (kpu; closed 1 year ago; 0 comments)
#92 Modify URL to reflect language selection (kpu; closed 1 year ago; 0 comments)
#91 No way to retry failed download (kpu; closed 1 year ago; 0 comments)
#90 Error when filters is an empty array (jindrahelcl; closed 1 year ago; 2 comments)
#89 num_mismatch filter does not recognize equivalent numbers (jindrahelcl; closed 1 year ago; 0 comments)
#88 Downloads keep running after interrupt (jindrahelcl; closed 1 year ago; 2 comments)
#87 Move filters to their own packages (jelmervdl; opened 1 year ago; 0 comments)
#86 Monolingual dataset support (jelmervdl; closed 1 year ago; 0 comments)
#85 Add support for monolingual datasets (jelmervdl; closed 1 year ago; 0 comments)
#84 CLI to deal with data categories (jelmervdl; opened 1 year ago; 2 comments)
#83 Boolean filter properties are always `true` (jelmervdl; closed 1 year ago; 2 comments)
#82 Deal better with datasets that use `\r\n` as line separator (jelmervdl; closed 1 year ago; 0 comments)
#81 OpusCleaner leaves `\r` in output? (jelmervdl; closed 1 year ago; 4 comments)
#80 Missing minor installation instructions in README (tollefj; closed 1 year ago; 0 comments)
#79 OpusCleaner get stuck on ASGI error when adding filters while sample is still loading (jelmervdl; closed 11 months ago; 1 comment)
#78 Interface sorts language columns on full name, backend on language code (jelmervdl; closed 1 year ago; 1 comment)
#77 Undo/redo (jelmervdl; opened 1 year ago; 0 comments)
#76 Store download urls in opuscleaner configuration files (jelmervdl; opened 1 year ago; 0 comments)
#75 Add instructions for running it without installing it (XapaJIaMnu; opened 1 year ago; 1 comment)
#74 Add tests (jelmervdl; opened 1 year ago; 0 comments)
#73 Opuscleaner rebrand & packaging (jelmervdl; closed 1 year ago; 0 comments)
#72 Corpus analytics (jelmervdl; opened 1 year ago; 0 comments)
#71 Add a deduplicator (XapaJIaMnu; closed 1 year ago; 1 comment)
#70 Special quote filter for CCMatrix and CCAligned (XapaJIaMnu; opened 1 year ago; 0 comments)
#69 [Discussion] Filter that checks for numerical sequences? (XapaJIaMnu; closed 1 year ago; 3 comments)
#68 Do not assume stderr output is always an error (XapaJIaMnu; closed 1 year ago; 2 comments)
#67 Rebrand to OpusCleaner (written `opuscleaner` in code) (jelmervdl; closed 1 year ago; 0 comments)
#66 Docker container (jelmervdl; opened 1 year ago; 0 comments)
#65 run.py fails on with exported json when unpacking filenames (XapaJIaMnu; closed 1 year ago; 3 comments)
#64 `net::ERR_INCOMPLETE_CHUNKED_ENCODING` in Chrome (jelmervdl; closed 1 year ago; 0 comments)
#63 Dataset categories (XapaJIaMnu; opened 1 year ago; 0 comments)