gordicaleksa Open-NLLB issues

Effort to open-source NLLB checkpoints.

MIT License

419 stars 37 forks

issues

Deprecated flores200 links are replaced

#30 ezzini opened 4 months ago
0
Provide context for input and output

#29 joiemoie opened 9 months ago
0
Standard Moroccan Tamazight is mislabeled

#28 MedAymenF opened 9 months ago
0
Creation of a small model file for a few languages

#27 dlippold opened 11 months ago
0
Fix documentation broken link

#26 jordimas opened 11 months ago
0
download allenai nllb mined bitext

#25 vienneraphael opened 1 year ago
2
feat: Downloading mined bitext

#24 vienneraphael closed 1 year ago
0
MinHash: benchmark memory, speed and accuracy with varying r and b

#23 vienneraphael opened 1 year ago
1
Weird line length spikes in Serbian, Croatian, Bosnian (data analysis task)

#22 gordicaleksa closed 1 year ago
4
[Future - outside current project scope] non-English LLMs (Serbian LLM, etc.)

#21 gordicaleksa opened 1 year ago
0
[Future - outside current project scope] 7B lang-family-specific Open-NLLB checkpoint

#20 gordicaleksa opened 1 year ago
0
[Data] Acquire additional high-quality (non-public) parallel corpora for HBS

#19 gordicaleksa opened 1 year ago
0
[Modeling] Release a 615M English -> HBS Open-NLLB checkpoint

#18 gordicaleksa opened 1 year ago
0
[Modeling] Release a 3.3B Open-NLLB checkpoint (~202 languages)

#17 gordicaleksa opened 1 year ago
0
[Modeling] Release a 1.3B Slavic languages Open-NLLB checkpoint

#16 gordicaleksa opened 1 year ago
0
[Modeling] Release a 615M HBS (Croatian, Bosnian, Serbian) Open-NLLB checkpoint

#15 gordicaleksa opened 1 year ago
0
Get a compute grant

#14 gordicaleksa opened 1 year ago
0
Estimate the necessary compute and number of GPUs for Open-NLLB effort

#13 gordicaleksa opened 1 year ago
0
Understand how to do 4-stage curriculum learning from the paper

#12 gordicaleksa opened 1 year ago
0
Setup a pipeline for mined data (use Allen AI's OSS dataset replication)

#11 gordicaleksa opened 1 year ago
0
Obtain high quality Serbian parallel corpus (currently 0 support in our public bi-text)

#10 gordicaleksa opened 1 year ago
0
fixed: type hinting issue in download_parallel_corpora.py

#9 lavaman131 closed 1 year ago
1
Choosing the LID model

#8 vienneraphael opened 1 year ago
0
LID model peak probabilities

#7 vienneraphael opened 1 year ago
0
Native language visualizations

#6 vienneraphael opened 1 year ago
2
Spanish and Guarani filtering

#5 vienneraphael closed 1 year ago
1
sub-batches creation

#4 vienneraphael closed 1 year ago
1
Hydra pickle issue in generate_multi.py

#3 vienneraphael opened 1 year ago
0
Reduce peak memory when using FSDP on 2+ GPUs

#2 gordicaleksa opened 1 year ago
0
added Apex pre-install instructions

#1 lavaman131 closed 1 year ago
0