Open gezakerecsenyi opened 11 months ago
What a PR! Thank you so much for your work on this, it's really quite comprehensive. Apologies for my delay in responding to this. I'm very happy anyone would be willing to dig through and update my code so much :)
As you mention, it's a lot of changes in one big chunk, so it's a little hard to review, but glancing through, things generally look better. I appreciate that you extracted parameters into their own file.
My only concern is that there's something nice about leaving the original Glish that closely matches the video. I know it already doesn't quite match exactly in some places, but it might be confusing if all the words are regenerated entirely. Even with all these improvements, I will say the example sentence is still about as tough to understand, and if the output is suddenly totally different, it's hard to use this tool consistently.
I might need to add some kind of versioning system so that baseline improvements can be made while it is still possible to see the "original" Glish.
This fork is quite messy: all of my commits had to be rebased into one as I was hitting GitHub's 100MB file limit with `outputs/random_generated_syllables_with_variations.json` and had to convince it that the file never existed... I would highly suggest re-generating everything yourself to prevent any problems arising from the uploading of generated files to GitHub.

I'll do my best to explain my changes in this PR instead:
- Types are now collected in `types.ts` for consistency. Any instances of strings representing IPA are instead given the `IPA` type, which is synonymous to `string` but glosses differently in an IDE, which I found to greatly reduce confusion as someone starting to work on the codebase anew. A few small variable name inconsistencies and spelling mistakes have additionally been corrected.
- `package.json` now contains `scripts` to run the syllable and word generation.
- Parameters have been extracted into `parameters.ts`. This makes experimenting with options much easier.
- `getRandomSyllableFromPalette` has been improved to optionally perform palette-cleansing to prevent character reuse (similarly to `getRandomSyllable`, though a little more crude to avoid massive performance hits), as well as order enforcement. Both options are attempted in `main.ts` before resorting to the default (i.e. prior) `getRandomSyllableFromPalette` behaviour.
- `scoreForRandomSyllable` has also been improved to provide bonuses for phonemes being in roughly corresponding positions and ordering to the original word, as well as a less severe punishment for substitution of "similar" phonemes (e.g., `best` -> `pest` will be graded higher than `best` -> `mest`). Similarity is defined using a lookup table in `parameters.ts` and is primarily based on voicedness pairs or other taxonomic similarities (e.g. nasality of `m` and `n`), though (especially for vowels) it may be somewhat subjective - as such, I've attempted to keep to the heuristic of using North-American pronunciation as an indicator.
- `findVariants` has been greatly reworked and replaced with `findEnglishVariants`. In terms of identifying endings, it seems to be far more efficient to test the raw English than the IPA, and, since this is information that we have at all points that `findVariants` is called, this should be okay, particularly when combined with more sensitive detection formulae (e.g. testing `create` for `^ing` yields `creating` instead of `*createing`). As such, I've also revamped the specification system for variants, making it easier to add in a whole host of new ones, including comparatives, superlatives, and adverbs; through some syntactic specification, `re*`, `dis*`, `un*`, `pre*`, `post*`, `in*`, and `non*` can all act as prefix matchers.
- Capitalisation of the input is now reflected in the output (`testing` -> `tengst`, `Testing` -> `Tengst`, `testinG` -> `Tengst`, `TESTING` -> `TENGST`). Numbers are left as-is if a word is solely digits, or treated as part of a word if combined with alphabetic characters - but never entirely deleted, as before. Dodgy spacing when using the "Copy monosyllabic" feature of the UI has also been rectified.

Test cases
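
For reference, here is a rough sketch of the capitalisation and number rules described in the change list above. It is written against the four `testing` examples and is not the code from this PR. The names `applyCase`, `translateToken`, and the stand-in `demoTranslate` callback are hypothetical. The rule that any capital letter anywhere in the word capitalises only the first letter of the output is an assumption inferred from `testinG` -> `Tengst`.

```typescript
// Hypothetical helpers for illustration only; the PR's real code may differ.

/** Copy the capitalisation pattern of `original` onto `translated`. */
function applyCase(original: string, translated: string): string {
  // Only letters decide the pattern; digits and punctuation are ignored.
  const letters = original.replace(/[^A-Za-z]/g, "");
  if (letters.length === 0) return translated;

  // Every letter is upper-case: TESTING -> TENGST
  if (letters === letters.toUpperCase()) return translated.toUpperCase();

  // At least one upper-case letter (assumed rule): Testing / testinG -> Tengst
  if (letters !== letters.toLowerCase()) {
    return translated.charAt(0).toUpperCase() + translated.slice(1).toLowerCase();
  }

  // All lower-case: testing -> tengst
  return translated.toLowerCase();
}

/** Translate one whitespace-delimited token, leaving pure numbers untouched. */
function translateToken(token: string, translate: (word: string) => string): string {
  // A token made solely of digits is passed through as-is.
  if (/^\d+$/.test(token)) return token;
  // Mixed tokens are handed to the translator whole, so digits stay part of the word.
  return applyCase(token, translate(token.toLowerCase()));
}

// Stand-in for the real word lookup: it only knows the example word.
const demoTranslate = (word: string): string => (word === "testing" ? "tengst" : word);

const demoOutput = ["testing", "Testing", "testinG", "TESTING", "2024"]
  .map((token) => translateToken(token, demoTranslate))
  .join(" ");
console.log(demoOutput);
```

Lower-casing the token before lookup and restoring case afterwards means the dictionary only ever needs lower-case keys.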
Original
Old Glish
New Glish