gordicaleksa/Open-NLLB
Effort to open-source NLLB checkpoints.
MIT License
419 stars, 37 forks
issues
#30 Deprecated flores200 links are replaced - by ezzini, opened 4 months ago, 0 comments
#29 Provide context for input and output - by joiemoie, opened 9 months ago, 0 comments
#28 Standard Moroccan Tamazight is mislabeled - by MedAymenF, opened 9 months ago, 0 comments
#27 Creation of a small model file for a few languages - by dlippold, opened 11 months ago, 0 comments
#26 Fix documentation broken link - by jordimas, opened 11 months ago, 0 comments
#25 download allenai nllb mined bitext - by vienneraphael, opened 1 year ago, 2 comments
#24 feat:Downloading mined bitext - by vienneraphael, closed 1 year ago, 0 comments
#23 MinHash: benchmark memory, speed and accuracy with varying r and b - by vienneraphael, opened 1 year ago, 1 comment
#22 Weird line length spikes in Serbian, Croatian, Bosnian (data analysis task) - by gordicaleksa, closed 1 year ago, 4 comments
#21 [Future - outside current project scope] non-English LLMs (Serbian LLM, etc.) - by gordicaleksa, opened 1 year ago, 0 comments
#20 [Future - outside current project scope] 7B lang-family-specific Open-NLLB checkpoint - by gordicaleksa, opened 1 year ago, 0 comments
#19 [Data] Acquire additional high-quality (non-public) parallel corpora for HBS - by gordicaleksa, opened 1 year ago, 0 comments
#18 [Modeling] Release a 615M English -> HBS Open-NLLB checkpoint - by gordicaleksa, opened 1 year ago, 0 comments
#17 [Modeling] Release a 3.3B Open-NLLB checkpoint (~202 languages) - by gordicaleksa, opened 1 year ago, 0 comments
#16 [Modeling] Release a 1.3B Slavic languages Open-NLLB checkpoint - by gordicaleksa, opened 1 year ago, 0 comments
#15 [Modeling] Release a 615M HBS (Croatian, Bosnian, Serbian) Open-NLLB checkpoint - by gordicaleksa, opened 1 year ago, 0 comments
#14 Get a compute grant - by gordicaleksa, opened 1 year ago, 0 comments
#13 Estimate the necessary compute and number of GPUs for Open-NLLB effort - by gordicaleksa, opened 1 year ago, 0 comments
#12 Understand how to do 4-stage curriculum learning from the paper - by gordicaleksa, opened 1 year ago, 0 comments
#11 Setup a pipeline for mined data (use Allen AI's OSS dataset replication) - by gordicaleksa, opened 1 year ago, 0 comments
#10 Obtain high quality Serbian parallel corpus (currently 0 support in our public bi-text) - by gordicaleksa, opened 1 year ago, 0 comments
#9 fixed: type hinting issue in download_parallel_corpora.py - by lavaman131, closed 1 year ago, 1 comment
#8 Choosing the LID model - by vienneraphael, opened 1 year ago, 0 comments
#7 LID model peak probabilities - by vienneraphael, opened 1 year ago, 0 comments
#6 Native language visualizations - by vienneraphael, opened 1 year ago, 2 comments
#5 Spanish and Guarani filtering - by vienneraphael, closed 1 year ago, 1 comment
#4 sub-batches creation - by vienneraphael, closed 1 year ago, 1 comment
#3 Hydra pickle issue in generate_multi.py - by vienneraphael, opened 1 year ago, 0 comments
#2 Reduce peak memory when using FSDP on 2+ GPUs - by gordicaleksa, opened 1 year ago, 0 comments
#1 added Apex pre-install instructions - by lavaman131, closed 1 year ago, 0 comments