lucaswerkmeister / tool-lexeme-forms

Wikidata tool to create lexemes with pre-populated forms (e. g. declensions or conjugations)
https://lexeme-forms.toolforge.org/

Use Wikifunctions Beta for generating default forms #140

Open vrandezo opened 1 year ago

vrandezo commented 1 year ago

Proposal / Feature request

Regular forms can often be generated by a function. Wikifunctions Beta already has a number of functions that in most cases generate the right form for a given lemma. The suggestion is:

  1. to add an optional field on each form that holds the ZID of the function to call to generate a specific form from the first form (i.e. the lemma)
  2. once the first form is entered, to call the given function (either automatically or through an explicit user action) and fill in the forms based on the generated results

A warning that the forms should still be checked would be good, as these functions are not always right.
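A minimal sketch of the two steps above, in Python. The template shape (a `zid` key per form), the helper names, and the ZID `Z12345` are assumptions made for illustration, not part of the tool. The function-call object follows the canonical ZObject form as I understand it (type `Z7`, function in `Z7K1`, first argument keyed `<ZID>K1`). The actual evaluation is injected as a callable so that no particular HTTP API is assumed.

```python
from typing import Callable, List, Optional


def function_call(zid: str, lemma: str) -> dict:
    """Build a function-call ZObject (type Z7) with the lemma as the only argument."""
    return {"Z1K1": "Z7", "Z7K1": zid, f"{zid}K1": lemma}


def generate_forms(
    lemma: str,
    forms: List[dict],
    evaluate: Callable[[dict], str],
) -> List[Optional[str]]:
    """Return one entry per form: the lemma for the first form, the
    evaluated result for forms with a 'zid', and None for the rest."""
    results: List[Optional[str]] = []
    for index, form in enumerate(forms):
        zid = form.get("zid")
        if index == 0:
            results.append(lemma)  # the first form is the lemma itself
        elif zid is None:
            results.append(None)  # no function configured: left to the user
        else:
            results.append(evaluate(function_call(zid, lemma)))
    return results


# Hypothetical template; 'Z12345' is a placeholder, not a real function.
FORMS = [
    {"label": "singular"},
    {"label": "plural", "zid": "Z12345"},
    {"label": "some other form"},
]


def fake_evaluate(call: dict) -> str:
    """Stand-in evaluator: appends 's' to the argument instead of calling out."""
    return call[call["Z7K1"] + "K1"] + "s"
```

With the stand-in evaluator, `generate_forms("cat", FORMS, fake_evaluate)` fills only the plural; the generated values would still be shown to the user for checking before submission.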

A JavaScript function that runs in the browser and can call a function on Wikifunctions can be found here:

https://github.com/vrandezo/formcheck/blob/e0aa0e6bca94f35f66877e485df1aea6d529ecbd/index.html#L600

(I wanted to code this up myself and send a pull request, alas, I couldn't get lexeme-forms to run. Sorry)

lucaswerkmeister commented 1 year ago

Wikifunctions Beta already has a number of functions that in most cases generate the right form for a given lemma.

It would be helpful to have some examples for these.

to add an optional field on each form that holds the ZID of the function to call to generate a specific form from the first form (i.e. the lemma)

Would this be part of the templates (hard-coded for each template) or part of the user interface (users can specify arbitrary ZIDs)?

vrandezo commented 1 year ago

Here are a few examples:

The functions would be hardcoded for each template and each form. The end user would not need to see ZIDs nor would they need to select functions. They are created once when the template is created and stored with the template.

I made some slideware to show what I mean: https://docs.google.com/presentation/d/1xSZNm4yaoICtPKX6l_mQwfUodpXTSepdpf8GKtapbDo/edit#slide=id.g1b9b89ec891_0_14

vrandezo commented 1 year ago

Here's an example page of what the templates could look like for German feminine nouns:

https://www.wikidata.org/wiki/Wikidata:Wikidata_Lexeme_Forms/German_with_ZID

vrandezo commented 1 year ago

Does this sound like something you'd want to have in the tool?

lucaswerkmeister commented 1 year ago

Maybe, yes. But at least while it’s targeting the Beta cluster, it should probably be an opt-in setting, which means I should finish the language-preference branch first (which introduces settings for the tool in general).

lucaswerkmeister commented 11 months ago

I guess we can rethink this now that non-Beta Wikifunctions launched :D (and also allows anonymous function calls, because limiting the feature to users who have a certain user right would be a bit annoying)

So: optionally, a non-first¹ form in a template can specify a Wikifunction that’s called whenever the first form¹ changes, to generate the other forms?

I think I’d only want to do this in the JS frontend, so that users can see (and potentially correct) the output before submitting it to the server; we should probably show some kind of loading indicator next to the first form¹ while making the function calls, so the user knows that they should wait instead of starting to enter the next forms.

(Actually, users are probably already waiting for a little bit to see if the duplicate warning shows up or not. I wonder if the duplicate warning should also come with a loading indicator, so that you have an indication when it finished looking for duplicates and none were found 🤔)

¹ note that there’s also a branch (@nikkiwd did you ever get around to testing it?) to make the “lemma” form not necessarily be the first form, in which case the function input should probably also be the lemma, not necessarily the first form?

lucaswerkmeister commented 11 months ago

I wonder what’s better: making the function call API requests from JS or from Python?

I think I’ll try the Python approach first and see how bad it is to have to wait for all the evaluations to finish. (I can test it for one of the really long templates, like Czech adjectives – the functions can be fake.)
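One way to get a feel for the waiting time with fake functions is sketched below. Everything here is an assumption for illustration: the thread does not say whether the proof of concept evaluates functions sequentially or concurrently, and `fake_function`, the 50 ms delay, and the form count are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_function(lemma: str, index: int, delay: float = 0.05) -> str:
    """Stand-in for a remote function call: sleeps, then returns a dummy form."""
    time.sleep(delay)
    return f"{lemma}-{index}"


def compute_all(lemma: str, count: int, max_workers: int = 8) -> List[str]:
    """Evaluate `count` fake functions concurrently, keeping the form order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(lambda i: fake_function(lemma, i), range(count)))


if __name__ == "__main__":
    start = time.perf_counter()
    forms = compute_all("lemma", 40)
    elapsed = time.perf_counter() - start
    print(f"{len(forms)} forms in {elapsed:.2f} s")
```

Setting `max_workers=1` gives the sequential baseline for comparison.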

lucaswerkmeister commented 11 months ago

Pushed a proof-of-concept on the compute branch; I think I’ll iterate on it a bit more tomorrow.

The string identity function is nice for initial testing, but are there any functions on Wikifunctions yet that we could use for real templates?

(Also, what’s a good verb for this feature? I went with “compute” in the above commit, but that was just the first thing that came into my head.)

nikkiwd commented 11 months ago

(Actually, users are probably already waiting for a little bit to see if the duplicate warning shows up or not. I wonder if the duplicate warning should also come with a loading indicator, so that you have an indication when it finished looking for duplicates and none were found 🤔)

It's effectively instantaneous for me, so I would find a loading indicator more annoying than useful. If you really want to add something like that, I think it would make more sense to display the final state (no duplicates found, duplicates were found, or wasn't able to check for duplicates).

¹ note that there’s also a branch (@nikkiwd did you ever get around to testing it?) to make the “lemma” form not necessarily be the first form, in which case the function input should probably also be the lemma, not necessarily the first form?

I didn't 😔 Maybe you can bug me about it again after WikidataCon? 😅

By the way, a couple of years ago I made myself a little browser extension which generates forms - https://github.com/nikkiwd/extension-lexeme-forms. It's essentially the same concept as this, but with the functions in the extension. I never finished adding tests or cleaning up the uncommitted changes and local branches, so I'm still not totally happy with it, but it's been working well enough for my purposes.

The way I did it, I add buttons to the top right. Most of the time there's only one, but sometimes I have multiple for different types of declension. Clicking on a button takes the input of the first field (since that's what the Lexeme Forms tool currently uses as the lemma, or in edit mode it can also find the lemma from the heading) and calls the function associated with that button. The function returns an array of forms and those are used to fill in any gaps in the template (it doesn't overwrite anything). Then I can check it and make any changes that are needed before submitting it.
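The "fill in any gaps, don't overwrite" behaviour described above could look roughly like this in Python. This is a sketch of the idea only, not code from the extension; the name `fill_gaps` is made up.

```python
from typing import List


def fill_gaps(current: List[str], generated: List[str]) -> List[str]:
    """Copy generated forms into empty fields; never overwrite user input."""
    filled = list(current)
    for i, guess in enumerate(generated[: len(filled)]):
        if not filled[i].strip():  # only blank (or whitespace-only) fields
            filled[i] = guess
    return filled
```

For example, `fill_gaps(["ox", "oxen"], ["ox", "oxes"])` keeps the manually entered "oxen", while `fill_gaps(["cat", ""], ["cat", "cats"])` fills in the missing plural.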

Here's what it looks like:

screenshot

For English, it says "guess forms" since there are plenty of irregular plurals it will get wrong, whereas for Esperanto it says "generate forms" because those forms are regular. (Probably a subtle difference but that's why the text isn't the same)

For German, since there are various ways to form plurals, the buttons are labelled based on the suffixes used for the genitive singular and nominative plural forms.

lucaswerkmeister commented 11 months ago

Hm, I see. I was wondering if we could use the Wikilambda function labels for the buttons (so that they can be translated into the user language), but looking at that screenshot I don’t think that would work out… so let’s make it part of the template (in the template language), I guess. Something like:

'template-name': {
    # ...
    'forms': [
        {
            'label': '...',
            'example': '...',
            'grammatical_features_item_ids': [...],
            'wikifunctions': {
                '-s/-n': 'Z12345',
                '-s/-s': 'Z123456',
                # ...
            },
        },
    ],
},

I also wonder whether we can already accommodate non-first forms as the lemma. Currently, you can use advanced mode to create a plurale tantum lexeme with the first plural form as the lemma. You could imagine having a function that generates, say, the genitive plural from the nominative plural – but this would presumably be a different function than the one that generates the genitive plural from the nominative singular. So the specification would be some sort of set of wikifunctions specifying the input forms for them… on second thought, let’s leave that out for now and start with the simpler case ^^
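For the simpler case, a sketch of how the proposed `'wikifunctions'` mapping might be consumed: collect the button labels across all forms, then look up which function each form would use for the chosen button. The helper names are hypothetical, the form labels are invented, and the ZIDs are the placeholders from the snippet above.

```python
from typing import List, Optional


def button_labels(template: dict) -> List[str]:
    """Distinct 'wikifunctions' keys across all forms, in first-seen order."""
    labels: List[str] = []
    for form in template["forms"]:
        for label in form.get("wikifunctions", {}):
            if label not in labels:
                labels.append(label)
    return labels


def functions_for_button(template: dict, label: str) -> List[Optional[str]]:
    """Per form, the ZID to call for the chosen button, or None if there is none."""
    return [
        form.get("wikifunctions", {}).get(label)
        for form in template["forms"]
    ]


# Hypothetical template with placeholder ZIDs.
TEMPLATE = {
    "forms": [
        {"label": "first form"},  # the lemma, no function needed
        {"label": "second form",
         "wikifunctions": {"-s/-n": "Z12345", "-s/-s": "Z123456"}},
        {"label": "third form",
         "wikifunctions": {"-s/-n": "Z12345"}},
    ],
}
```

Here `button_labels(TEMPLATE)` yields one button per declension pattern, and a form that has no function for the chosen pattern is simply left for the user to fill in.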

lucaswerkmeister commented 11 months ago

Alright, I added the functions for Croatian nouns, and doing all the function calls is quite a bit faster on Toolforge than on local development, so I think the reduction in runtime from doing all the calls in Python is real – I’ll keep this approach then. (Also, with the button as suggested by @nikkiwd rather than the automatic action I had in mind, I think it’s more acceptable to have the user wait until the results are there – they’re waiting for the result of an action they explicitly initiated.)

lucaswerkmeister commented 11 months ago

An experimental version of this is now deployed (opt in by creating this Wikifunctions user JS page); see the documentation, and the announcement for some next steps.