stweil closed this 7 months ago.
@amitdo, @egorpugin, do you agree with the suggested renaming for all Tesseract repositories?
@AlexanderP, the planned renaming would also affect Debian and other distributions.
Yes, I agree.
I disagree with doing it this way (as a rename). Users have known frk for a long time, despite its unfortunate naming. Since there is no actual Frankish model it could conflict with, in the interest of not breaking things for users, I suggest simply adding an alias instead of deleting the old name.
@bertsky, how would you add an alias?
I don't think many users will be affected by the renaming. German Fraktur is not relevant for most users of Tesseract, and those who use it either depend on tagged models, which continue to provide frk, or can fix their workflow with a trivial update.
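To illustrate how small such a "trivial update" could be, here is a minimal sketch of a hypothetical helper (`migrate_lang_spec` and `LEGACY_ALIASES` are illustrative names, not part of Tesseract) that rewrites a language string as passed to Tesseract's `-l` option, where several models are joined with `+`. It assumes frk to deu_latf is the only rename to handle.

```python
# Hypothetical helper: rewrite a Tesseract "-l" language string so that the
# legacy model name "frk" is replaced by its new name "deu_latf".
# Tesseract joins several models with "+", e.g. "frk+eng".
LEGACY_ALIASES = {"frk": "deu_latf"}  # assumption: the only rename discussed here


def migrate_lang_spec(spec: str) -> str:
    """Return the language spec with legacy model names replaced."""
    parts = spec.split("+")
    return "+".join(LEGACY_ALIASES.get(part, part) for part in parts)


if __name__ == "__main__":
    print(migrate_lang_spec("frk+eng"))  # deu_latf+eng
```

Names that are not in the alias table pass through unchanged, so the helper is safe to apply to every existing invocation.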
> how would you add an alias?
I guess the simplest way would be to symlink deu_latf to frk in the repo.
I agree, but maybe we should set up a symlink first? Do we have version tags similar to those in the tesseract repo? For example, keep a link until the Tesseract 6 release, then remove the symlink and rename.
If this is too much of a burden, just rename as this PR does. That is fine.
> I guess the simplest way would be to symlink deu_latf to frk in the repo.

> I agree, but maybe to set up a symlink first?
I created the symlink from the old frk to the new deu_latf for tessdata_fast and added a note there in the README. That should be sufficient for distributions and typical users who were always encouraged to use tessdata_fast.
Advanced users who want to run training won't have big problems replacing frk with deu_latf.
> I created the symlink from the old frk to the new deu_latf for tessdata_fast and added a note there in the README. That should be sufficient for distributions and typical users who were always encouraged to use tessdata_fast.
I cannot see frk anymore. And the branch of this PR is already gone!
What you describe is the wrong direction for the link. I wrote *from* deu_latf *to* frk because that's how the old URLs would still work. With the symlink, you can browse the file in the GitHub UI and reference it in a checkout, but you cannot download it directly.
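The direction matters because of how a symlink behaves in a checkout versus a raw download. A minimal sketch of the direction proposed here, assuming a POSIX filesystem (`os.symlink` may need extra privileges on Windows) and using a placeholder file instead of a real model: frk.traineddata stays a regular file and deu_latf.traineddata points at it.

```python
import os
import tempfile

# Sketch of the proposed link direction (assumption: POSIX filesystem).
# frk.traineddata stays a regular file, so old download URLs keep serving
# the real model; deu_latf.traineddata is a symlink pointing at it.
with tempfile.TemporaryDirectory() as tessdata:
    real = os.path.join(tessdata, "frk.traineddata")
    alias = os.path.join(tessdata, "deu_latf.traineddata")

    with open(real, "wb") as f:
        f.write(b"placeholder model bytes")  # stand-in for the real model

    # Relative target, as a symlink committed to a git repository would use.
    os.symlink("frk.traineddata", alias)

    # In a checkout, opening the alias follows the link to the real data.
    with open(alias, "rb") as f:
        via_alias = f.read()

    # Git stores a symlink as a blob containing only its target path; this
    # string, not the model data, is what a raw download of the link returns.
    link_target = os.readlink(alias)

print(via_alias)    # b'placeholder model bytes'
print(link_target)  # frk.traineddata
```

With the links reversed (frk.traineddata as the symlink), a raw download of the old frk URL would return only the target path text, which is why the direction chosen determines which name keeps working for direct downloads.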
I know that you suggested a symbolic link in the different direction, but we want to promote the new name as the standard, not the old one.
> I know that you suggested a symbolic link in the different direction, but we want to promote the new name as the standard, not the old one.
The whole point of having the symlink is to keep the old URLs working. It's not about promoting anything. In your direction, there is no point in having the symlink at all.
Downloading from the main branch is never a good idea unless you are prepared to get different content or changing URLs. Use a tagged release or a branch, for example 4.1.0. https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/4.1.0/frk.traineddata still works. If necessary, we can add more tags or branches.
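As a small sketch of the advice to pin a tag or branch, the hypothetical helper below (`pinned_model_url` is an illustrative name) builds a raw download URL for a fixed ref; the default ref 4.1.0 and the repository tesseract-ocr/tessdata_fast come from the example URL above.

```python
# Hypothetical helper: build a raw.githubusercontent.com URL pinned to a tag
# or branch instead of the moving main branch.
RAW_BASE = "https://raw.githubusercontent.com"


def pinned_model_url(model: str, ref: str = "4.1.0",
                     repo: str = "tesseract-ocr/tessdata_fast") -> str:
    """Return the raw download URL of <model>.traineddata at a fixed ref."""
    return f"{RAW_BASE}/{repo}/{ref}/{model}.traineddata"


url = pinned_model_url("frk")
print(url)
# To download: urllib.request.urlretrieve(url, "frk.traineddata")
```

Because the ref is fixed, the URL keeps pointing at the same content even after files are renamed on the main branch.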
Information on different distributions:
@stweil what made you think that all model names in Tesseract must conform to ISO 639-3 in the first place?
What about ita_old, spa_old, kat_old, chi_tra, chi_sim, chi_sim_vert, jpn_vert, deu_frak, dan_frak and so forth?
IMO all the old names should at least be kept for backwards compatibility.
See related discussion https://github.com/tesseract-ocr/tesseract/issues/4201.
I think we should rename frk to deu_latf not only here, but also in all other Tesseract repositories (langdata, tessdata, tessdata_best, tessdata_fast, tessdoc) because "frk" was never an ISO name.