This repository contains data and tools used to build dictionaries for Portuguese.
The owner, maintainer, and main dev for this repository is @p-goulart. The shell and Perl components may be better explained by @jaumeortola, though.
For contribution guidelines, see CONTRIBUTING.md.
Portuguese has two separate sets of dictionaries, with separate source data and scripts to handle them:
MORFOLOGIK_RULE_PT_*: spelling rules for all Portuguese varieties;
For more in-depth information about each type, check their respective READMEs.
For Legacy Reasons™, this repository is structured as follows:
The folder dict_tools is a Git submodule. In order for it to work as a Python package, you must define
it as a sources root. In PyCharm, you can do this by right-clicking the folder and selecting
Mark Directory as > Sources Root.
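Outside PyCharm, roughly the same effect can be had by putting the folder on Python's import path, either by setting PYTHONPATH before launching Python or from code. The sketch below is only an illustration: it assumes dict_tools sits at the top level of your checkout, and the helper name add_sources_root is hypothetical and not part of this repo.

```python
import sys
from pathlib import Path


def add_sources_root(repo_root: str, folder: str = "dict_tools") -> str:
    """Prepend <repo_root>/<folder> to sys.path, mimicking a PyCharm sources root."""
    root = str((Path(repo_root) / folder).resolve())
    if root not in sys.path:
        # Insert at the front so modules in this folder take precedence
        # over same-named modules found later on the path.
        sys.path.insert(0, root)
    return root
```

Calling add_sources_root(".") from the repository root, before importing anything from the submodule, should be enough for a quick script. For day-to-day work, the IDE setting described above is less fragile.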
The release process is automated upon each new tag pushed to this repo.
As of January 2024, the release only goes as far as deploying the binaries to staging repositories on Sonatype.
In order to actually release the new version, you must log in to Sonatype, navigate to the staging repositories, select
the repository you'd like to deploy, and click Release. This does mean that, for now, only LT members with access
to LT's Sonatype account can actually release new versions.
Soon, this will no longer be the case, and new versions will be released automatically whenever a new tag is pushed to
main. Since there are restrictions on who pushes to main, this should be safe.
These dictionaries use semantic versioning, as they are essentially libraries that can be declared as dependencies by LT.
As of May 2024 (with release v1.0.0), we are using the following versioning scheme:
Note that, in order for LT to actually use the newly released version, you'll need to update the version of the
portuguese-pos-dict dependency in LT's main pom.xml file.
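Since tags double as dictionary versions, it can help to sanity-check a tag before pushing it. The sketch below only checks the general semantic-versioning shape, assuming tags look like the v1.0.0 release mentioned above. It does not encode this repo's rules for when each component is bumped, and the names parse_tag and is_newer are hypothetical.

```python
import re
from typing import NamedTuple

# General semantic-versioning shape with a leading "v", e.g. "v1.0.0".
# Pre-release and build-metadata suffixes are deliberately not handled here.
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_tag(tag: str) -> Version:
    """Return the numeric parts of a tag, or raise ValueError if it is malformed."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def is_newer(candidate: str, current: str) -> bool:
    """True if candidate is strictly greater than current (numeric tuple comparison)."""
    return parse_tag(candidate) > parse_tag(current)
```

Comparing parsed tuples rather than raw strings matters once a component reaches two digits: as plain strings, "v1.10.0" would sort before "v1.9.0".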
If you are not a maintainer but you want to contribute words to the dictionaries, it should be relatively simple. There are many steps, but they are all quite straightforward:
in the main LanguageTool repo, branch out from master and push it (even if it doesn't have any changes yet);
MorfologikPortugueseSpellerRuleTest class;
copy the branch name and add that to the LT_BRANCH variable in the build.yml workflow file in this repo:
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11"]
    env:
      LT_BRANCH: "pt/dict/v015"  ## <- here, set this to the branch you created
branch out of main in this repo, and make your changes;
commit and push your changes, and create a pull request;
the test workflow will run, testing your changes against the LT branch you specified;
if the tests pass and the PR is approved, merge it;
create a new tag for the merged commit, and push it to the repo (the tag will be the new dictionary version, so make sure it adheres to our versioning scheme!);
the release workflow will run, deploying the new version to Sonatype;
log in to Sonatype and release the new version (this part might be automated away in the future);
wait 10-20 minutes: it takes a while for the new version to propagate, so it may not be immediately available to LT;
update the portuguese-pos-dict dependency in LT's pom.xml file to the new version;
push the changes to LT's repo, and wait for the CI to run all the tests;
if everything is green, merge the changes to main in LT's repo; the new version of the dictionaries
should now be available to all LT users!
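The LT_BRANCH step above is an easy one to get wrong by hand, so here is a minimal sketch of doing it programmatically. It assumes the workflow contains exactly one double-quoted LT_BRANCH entry, as in the snippet shown earlier. The function name set_lt_branch and the branch name in the usage note are hypothetical.

```python
import re

# Captures the indentation and key so that only the quoted value is replaced;
# anything after the closing quote (such as a trailing comment) is left intact.
LT_BRANCH_LINE = re.compile(r'^([ \t]*LT_BRANCH:[ \t]*)"[^"]*"', flags=re.MULTILINE)


def set_lt_branch(workflow_text: str, branch: str) -> str:
    """Return workflow_text with the quoted LT_BRANCH value replaced by branch."""
    new_text, count = LT_BRANCH_LINE.subn(
        lambda m: f'{m.group(1)}"{branch}"', workflow_text
    )
    if count != 1:
        # Refuse to guess if the entry is missing or appears more than once.
        raise ValueError(f"expected exactly one LT_BRANCH entry, found {count}")
    return new_text
```

To use it, read the contents of build.yml into a string, call something like set_lt_branch(text, "pt/dict/v016"), and write the result back. Working on the raw text rather than round-tripping through a YAML parser keeps comments and formatting in the workflow file untouched.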