oscar-project / ungoliant

:spider: The pipeline for the OSCAR corpus
https://oscar-corpus.com
Apache License 2.0

Fix language tags on NLLB model #103

Closed. Uinelj closed this 1 year ago

Uinelj commented 1 year ago

NLLB tags use underscores as separators (such as fra_Latn), which breaks parsing with oxilangtag, since it expects hyphen-separated subtags. This PR adds a conversion step that turns tags like fra_Latn into fra-Latn.
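
To illustrate the kind of conversion described above, here is a minimal sketch using only the Rust standard library. It is not the code from this PR: the function name `nllb_tag_to_hyphenated` is hypothetical, and the validation rules (a 3-letter lowercase language code followed by a 4-letter script code) are an assumption about what NLLB labels look like.

```rust
/// Hypothetical helper (not taken from the PR): rewrites an NLLB-style
/// label such as `fra_Latn` into the hyphenated form `fra-Latn`.
///
/// Returns `None` when the input does not look like `<language>_<Script>`,
/// so callers can decide how to handle unexpected labels.
pub fn nllb_tag_to_hyphenated(tag: &str) -> Option<String> {
    // Split on the first underscore only: language on the left, script on the right.
    let (lang, script) = tag.split_once('_')?;

    // Assumption: the language part is a 3-letter lowercase ASCII code.
    let lang_ok = lang.len() == 3 && lang.chars().all(|c| c.is_ascii_lowercase());
    // Assumption: the script part is a 4-letter ASCII code such as `Latn`.
    let script_ok = script.len() == 4 && script.chars().all(|c| c.is_ascii_alphabetic());

    if lang_ok && script_ok {
        // Join the two parts with a hyphen instead of an underscore.
        Some(format!("{}-{}", lang, script))
    } else {
        None
    }
}
```

With this sketch, `nllb_tag_to_hyphenated("fra_Latn")` returns `Some("fra-Latn".to_string())`, while an input that is already hyphenated or does not match the assumed shape returns `None`. Returning an `Option` rather than blindly replacing every underscore keeps malformed labels from being passed silently to a downstream tag parser.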

codecov[bot] commented 1 year ago

Codecov Report

Merging #103 (8b25ab5) into main (dd71d13) will increase coverage by 0.71%. The diff coverage is 83.33%.

@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
+ Coverage   47.79%   48.50%   +0.71%     
==========================================
  Files          22       22              
  Lines        1132     1138       +6     
==========================================
+ Hits          541      552      +11     
+ Misses        591      586       -5     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/identifiers/model.rs | 26.41% <0.00%> (ø) | |
| src/identifiers/tag_convert.rs | 98.81% <100.00%> (+0.02%) | :arrow_up: |

... and 2 files with indirect coverage changes