hrishikeshrt / PyCDSL

Python Interface to Cologne Digital Sanskrit Lexicon (CDSL)
https://pypi.org/project/PyCDSL/
Other

Search across multiple dictionaries or all dictionaries #14

Closed: drdhaval2785 closed this issue 2 years ago

drdhaval2785 commented 2 years ago

Description

  1. Users often want to see the entries for a given word in ALL the dictionaries installed on their system. Give them an "ALL" option in addition to the existing MW, MWE, AP90, etc.

  2. In addition, give them an option to pass a comma-separated list of dictionaries. If I pass MW,AP90, I should get entries from those two dictionaries.
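Both requests could be handled by a single small parser that turns the user's string into a list of dictionary IDs. This is a hypothetical sketch, not part of PyCDSL: `parse_dict_ids` is an illustrative name, and matching IDs case-insensitively while silently dropping unknown IDs are assumptions.

```python
# Hypothetical helper (not part of PyCDSL): resolve "ALL" or a
# comma-separated list such as "MW,AP90" against the installed dictionaries.
from typing import Iterable, List


def parse_dict_ids(spec: str, available: Iterable[str]) -> List[str]:
    available = list(available)
    spec = spec.strip()
    if spec.upper() == "ALL":
        # Every installed dictionary, in its original order
        return available
    # Assumption: IDs are matched case-insensitively
    known = {d.upper(): d for d in available}
    requested = [part.strip().upper() for part in spec.split(",") if part.strip()]
    result: List[str] = []
    for dict_id in requested:
        # Keep only installed IDs; drop duplicates while preserving order
        if dict_id in known and known[dict_id] not in result:
            result.append(known[dict_id])
    return result


installed = ["MW", "MWE", "AP90", "VCP"]
print(parse_dict_ids("ALL", installed))       # ['MW', 'MWE', 'AP90', 'VCP']
print(parse_dict_ids("mw, AP90", installed))  # ['MW', 'AP90']
```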

What I Did

Feature request.
drdhaval2785 commented 2 years ago

@hrishikeshrt Sorry for bombarding you with so many feature requests. I would have loved to do PRs, but my understanding of OOP is quite primitive, and I do not want to break your workflow.

Once again, I express my heartfelt gratitude for making this wrapper around the CDSL data, which makes programmatic access to it very easy. That is precisely why I keep sending so many feature requests: I want this tool to be as mature as it can be.

Once it is mature and v1.0.0 is released, I intend to put a link to it on the CDSL webpage, so that developers can find it easily.

hrishikeshrt commented 2 years ago

@drdhaval2785 Please don't apologize for feature requests! In fact, thank you for taking such a keen interest and testing actively.

Searching across multiple dictionaries is something I had in mind as well. For an end user, it is definitely useful. For programmatic use, I am trying to settle on a good, usable syntax.

hrishikeshrt commented 2 years ago

Commit 97741574e1d75a0a79aca73a75f75d3055541b0a adds search functionality to the CDSLCorpus class.

Help on method search in module pycdsl.pycdsl:

search(pattern, dict_ids=None, input_scheme=None, output_scheme=None, ignore_case=False, limit=None, offset=None, omit_empty=True) method of pycdsl.pycdsl.CDSLCorpus instance
    Search in the dictionary

    Parameters
    ----------
    pattern : str
        Search pattern, may contain wildcards (`*`).
    dict_ids : list or None
        List of dictionary IDs to search in.
        Only the `dict_ids` that exist in `self.dicts` will be used.
        If None, all the dictionaries that have been setup,
        i.e., the dictionaries from `self.dicts` will be used.
        The default is None.
    input_scheme : str or None, optional
        Input transliteration scheme
        If None, `self.input_scheme` will be used.
        The default is None.
    output_scheme : str or None, optional
        Output transliteration scheme
        If None, `self.output_scheme` will be used.
        The default is None.
    ignore_case : bool, optional
        Ignore case while performing lookup.
        The default is False.
    limit : int or None, optional
        Limit the number of search results to `limit`.
        The default is None.
    offset : int or None, optional
        Offset the search results by `offset`.
        The default is None.
    omit_empty : bool, optional
        If True, only the non-empty search results will be included.
        The default is True.

    Returns
    -------
    dict
        Dictionary of (dict_id, list of matching entries)
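To make the documented parameters concrete, here is a toy, in-memory model of the search semantics. It is not PyCDSL's implementation: `toy_search` and `TOY_DICTS` are hypothetical, transliteration and real entry objects are skipped, and two behaviours are assumptions the docstring does not settle: `limit`/`offset` are applied per dictionary, and Python's `fnmatch` is used, which treats `?` and `[...]` as wildcards in addition to `*`.

```python
# Toy model of the documented search semantics (hypothetical, not PyCDSL code).
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Hypothetical headwords per dictionary ID (SLP1-like strings)
TOY_DICTS: Dict[str, List[str]] = {
    "MW": ["rAma", "rAmAyaRa", "deva"],
    "AP90": ["rAma", "rAjan"],
    "VCP": ["deva"],
}


def toy_search(
    pattern: str,
    dict_ids: Optional[List[str]] = None,
    ignore_case: bool = False,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    omit_empty: bool = True,
) -> Dict[str, List[str]]:
    # None means every dictionary that has been set up; unknown IDs are ignored
    if dict_ids is None:
        ids = list(TOY_DICTS)
    else:
        ids = [d for d in dict_ids if d in TOY_DICTS]
    results: Dict[str, List[str]] = {}
    for dict_id in ids:
        words = TOY_DICTS[dict_id]
        if ignore_case:
            matches = [w for w in words if fnmatchcase(w.lower(), pattern.lower())]
        else:
            matches = [w for w in words if fnmatchcase(w, pattern)]
        # Assumption: offset/limit are applied within each dictionary
        start = offset or 0
        end = None if limit is None else start + limit
        matches = matches[start:end]
        if matches or not omit_empty:
            results[dict_id] = matches
    return results


print(toy_search("rA*"))
```

Here `toy_search("rA*")` returns matches from MW and AP90 and leaves out VCP, whose result list is empty; passing `omit_empty=False` would keep `"VCP": []` in the result.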

I am wondering which approach to take for the REPL. Should "use" allow multiple dictionaries to be active at the same time, e.g., "use MW AP90 VCP", so that searches by default happen in those dictionaries?

drdhaval2785 commented 2 years ago

Should use allow multiple dictionaries being used at the same time, such as use MW AP90 VCP and default search happens in these dictionaries?

I agree
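The agreed REPL behaviour could look roughly like this, sketched with Python's standard `cmd` module. `ToyShell`, `AVAILABLE`, and the `active` attribute are hypothetical and do not reflect PyCDSL's actual REPL; accepting "use ALL" is an assumption borrowed from the original feature request.

```python
# Hypothetical REPL sketch (not PyCDSL's REPL): "use" accepts several IDs.
import cmd
from typing import List

AVAILABLE = ["MW", "MWE", "AP90", "VCP"]  # hypothetical installed dictionaries


class ToyShell(cmd.Cmd):
    prompt = "(toy) "

    def __init__(self) -> None:
        super().__init__()
        # Start with every installed dictionary active
        self.active: List[str] = list(AVAILABLE)

    def do_use(self, arg: str) -> None:
        """use MW AP90 VCP  |  use ALL"""
        ids = [a.upper() for a in arg.split()]
        valid = [d for d in ids if d in AVAILABLE]
        if ids == ["ALL"]:
            self.active = list(AVAILABLE)
        elif valid:
            self.active = valid
        else:
            print("No known dictionary IDs given; active set unchanged.")
            return
        print("Active:", ", ".join(self.active))

    def default(self, line: str) -> None:
        # Any input that is not a command is treated as a search pattern
        print(f"Would search {line!r} in: {', '.join(self.active)}")


shell = ToyShell()
shell.onecmd("use MW AP90 VCP")  # prints: Active: MW, AP90, VCP
shell.onecmd("rAma")             # prints: Would search 'rAma' in: MW, AP90, VCP
```

Keeping the active set on the shell instance means a plain search line needs no dictionary argument, which matches the "default search happens in these dictionaries" idea.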

hrishikeshrt commented 2 years ago

Commit 8299e2b841c015b77f1f172fad2ac06fdf877ccc introduces this behaviour. For now, it can be tried by running make install in the repository. I need to add some documentation and do some clean-up, after which I'll release 0.4.0.

hrishikeshrt commented 2 years ago

0.4.0 released (commit f288f1794af38cf3396d31b6b5578318e59e45d5).