scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction

GNU General Public License v3.0


Add CLI functionality to suggest an included language/data type on a potential misspelling #341

Closed andrewtavis closed 1 month ago

andrewtavis commented 1 month ago

Terms

[X] I have searched open and closed feature requests
[X] I agree to follow Scribe-Data's Code of Conduct

Description

This issue would add functionality to the CLI that checks an invalid language or data type and returns a suggestion to the user in case there's a similar one that Scribe-Data does support. For example, if a user types:

scribe-data get -lang Englishh -dt nounss

... then the error message saying that these are not included would suggest English and nouns, respectively.

Contribution

Happy to support with this and review when a PR is open! 🚀

KesharwaniArpita commented 1 month ago

@andrewtavis can you assign me this issue?

andrewtavis commented 1 month ago

Sounds great, @KesharwaniArpita! Let us know if you need any support :)

andrewtavis commented 1 month ago

Hey @KesharwaniArpita 👋 Looking at it, this should definitely go within validate_language_and_data_type once #337 is merged in :) I'll try to get that in later on this evening!

KesharwaniArpita commented 1 month ago

Great!!!

KesharwaniArpita commented 1 month ago

Thanks for the heads up, @andrewtavis. I still need to look up some things, though. I'll try to resolve this issue soon. 😅

andrewtavis commented 1 month ago

#337 has now been merged :) Pinging @catreedle about this as well. I think we should use `validate_language_and_data_type` for this and add functionality to it to check and make suggestions if something is close. Another thing, maybe `validate_language_and_data_type` should move to `cli_utils.py` and also be used for other commands? Feedback welcome!

DeleMike commented 1 month ago

> #337 has now been merged :) Pinging @catreedle about this as well. I think we should use `validate_language_and_data_type` for this and add functionality to it to check and make suggestions if something is close. Another thing, maybe `validate_language_and_data_type` should move to `cli_utils.py` and also be used for other commands? Feedback welcome!

Just a comment, @andrewtavis: I believe moving this to `cli_utils.py` will keep things organized, since it is a utility used by the CLI 😅 And you are right, we can extend `validate_language_and_data_type` to perform this new operation.

KesharwaniArpita commented 1 month ago

> I think we should use `validate_language_and_data_type` for this and add functionality to it to check and make suggestions if something is close.

Hi @andrewtavis, you were right about it. Currently, when users input an invalid language or data type in the CLI (e.g., `scribe-data get -lang Englishh -dt nounss`), the system raises a `ValueError` indicating that the total number of lexemes could not be found, without offering any suggestions for correction. We can modify `validate_language_and_data_type` to suggest the closest valid language or data type when an invalid input is detected, using a string similarity function such as `difflib.get_close_matches` to find the best match for the user's input. If a close match is found, the error message will include the suggestion; otherwise, it will simply state that the input is invalid. This will improve the user experience by giving more helpful feedback on errors. Does this make sense?
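To make the proposal concrete, here is a minimal sketch of the idea using `difflib.get_close_matches` from the standard library. It is not the code merged for this issue: the supported-value lists, the `suggest` helper, the `validate_with_suggestions` function (a simplified stand-in for `validate_language_and_data_type`), and the `0.7` cutoff are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical stand-ins for the languages and data types Scribe-Data supports.
SUPPORTED_LANGUAGES = ["english", "french", "german", "spanish"]
SUPPORTED_DATA_TYPES = ["nouns", "verbs", "prepositions"]


def suggest(value, valid_options):
    """Return the closest valid option to value, or None if nothing is close."""
    # cutoff=0.7 is an assumed similarity threshold, not a project setting.
    matches = get_close_matches(value.lower(), valid_options, n=1, cutoff=0.7)
    return matches[0] if matches else None


def validate_with_suggestions(language, data_type):
    """Raise ValueError for invalid inputs, adding a suggestion when one exists."""
    errors = []
    for label, value, options in (
        ("language", language, SUPPORTED_LANGUAGES),
        ("data type", data_type, SUPPORTED_DATA_TYPES),
    ):
        if value.lower() in options:
            continue  # Input is valid, nothing to report.
        message = f"Invalid {label} '{value}'."
        closest = suggest(value, options)
        if closest:
            message += f" Did you mean '{closest}'?"
        errors.append(message)
    if errors:
        raise ValueError(" ".join(errors))
```

With these assumed lists, calling `validate_with_suggestions("Englishh", "nounss")` raises a `ValueError` whose message is `Invalid language 'Englishh'. Did you mean 'english'? Invalid data type 'nounss'. Did you mean 'nouns'?`, while an input with no close match only gets the "Invalid ..." part.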

andrewtavis commented 1 month ago

Sounds good to me, @KesharwaniArpita! I'll take a look at it later :)

DeleMike commented 1 month ago

Hey @KesharwaniArpita, you can try the suggestions in this Stack Overflow post

andrewtavis commented 1 month ago

Closed by #344 :) Thanks for the work here @KesharwaniArpita and for the support @DeleMike!

KesharwaniArpita commented 1 month ago

Thank You Andrew

DeleMike commented 1 month ago

Glad to help! 🙌🏾