Closed DeleMike closed 1 month ago
Hi @andrewtavis, I found a bug while trying to get total lexemes, and I saw that it is connected to the language_metadata.json
file. I have worked on an initial fix, which I will open as a PR soon so that you can see my reasoning on how I propose we fix it.
Can you assign this issue to me?
Behavior
Summary
When users provide a non-existent language or data type to the total command, the system incorrectly returns a number of lexemes. This leads to confusion and undermines the user experience.
Steps to Reproduce
Example
Initially, when we ran this command:
scribe-data t -lang Latin
we had: [screenshot of output not available]
After we added some print statements, we saw that the language_filter was not updating the language parameter, hence giving a wrong result.
You can see the same thing for French:
scribe-data t -lang French
and we had: [screenshot of output not available]
This shows inconsistent behaviour.
Expected Behavior
The command should validate the provided language and data type. If either does not exist, the system should gracefully return without executing the query and also suggest to the user what they can do to resolve it.
Root Cause
The current implementation lacks validation checks for the existence of input languages and data types in the metadata files. Specifically, the language_metadata.json file plays a crucial role in this issue. It serves as the authoritative source for valid languages and their corresponding QIDs. When a user inputs a language or data type that is not present in this file, the CLI does not recognize it as invalid and proceeds with the query. This oversight results in misleading output and a poor user experience.
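To illustrate the kind of check that is missing, here is a minimal sketch (not the actual Scribe-Data code). It assumes the metadata can be read into a dict keyed by lowercase language name with a "qid" field; the real structure of language_metadata.json may differ. The names LANGUAGE_METADATA, DATA_TYPES, validate_choice and resolve_language_qid are hypothetical and only used for illustration.

```python
import difflib

# Hypothetical stand-in for the contents of language_metadata.json;
# the real file's structure and entries may differ.
LANGUAGE_METADATA = {
    "english": {"qid": "Q1860"},
    "french": {"qid": "Q150"},
    "german": {"qid": "Q188"},
}

# Hypothetical set of supported data types.
DATA_TYPES = {"nouns", "verbs", "adjectives"}


def validate_choice(value, valid_options, kind):
    """Return the normalized value, or raise ValueError with a suggestion."""
    normalized = value.strip().lower()
    if normalized in valid_options:
        return normalized
    # Offer the closest valid option, if any is reasonably similar.
    close = difflib.get_close_matches(normalized, sorted(valid_options), n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    raise ValueError(f"Unknown {kind} '{value}'.{hint}")


def resolve_language_qid(language, data_type=None):
    """Validate both inputs before any query would be run."""
    lang_key = validate_choice(language, LANGUAGE_METADATA, "language")
    if data_type is not None:
        validate_choice(data_type, DATA_TYPES, "data type")
    # A real implementation would run the lexeme query here;
    # this sketch only returns the QID that the query would use.
    return LANGUAGE_METADATA[lang_key]["qid"]
```

With a check like this, an unrecognized input raises an error (optionally with a suggestion) before any query runs, instead of silently falling back to a default language.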
Proposed Solution
Related Issues
This issue is closely related to #295, as it also has to do with the CLI.
Contribution
I would love to work on this and collaborate on implementing the improvement.