pelican-plugins / search

Pelican plugin that adds site search capability
49 stars 9 forks source link

Add settings for Stork input and output options #5

Closed lioman closed 1 year ago

lioman commented 2 years ago

Stork index creation can be configured via additional options. To allow all of them add SEARCH_ADDITIONAL_OPTIONS settings. Content will be added to stork.toml in input block.

Pull Request Checklist

Resolves: #4

lioman commented 2 years ago

Do we need anything to get this merged? I don't want to pin this dependency to git but use a normal version instead

justinmayer commented 2 years ago

Hey Lioman. Anything that adds/changes settings has to be considered a bit more carefully than other contributions, since changing them later can cause problems for people using previous setting names. I am completely swamped at the moment, so all I can promise is that I will review this in depth as soon as I possibly can.

lioman commented 2 years ago

Sure! No problem, thanks for answering. I added as additional parameter. So if you do not add it, nothing changes.

lioman commented 2 years ago

Sorry to ping you again. @justinmayer Is it possible, that you review this? I hate long pending tasks

Kristinita commented 2 years ago

Type: Request :mailbox_with_mail:

1. Request

@lioman , could you make that Pelican users add options in the traditional Pelican format, which is used in other plugins?

Example:

SEARCH_INPUT_OPTIONS = {
    "displayed_results_count": 100
}

2. Argumentation

I store my Pelican settings in .yaml, not .py files. YAML is a specifically configuration file format, and I don’t know how to parse the configuration in Python files directly.

If the settings after your change could be entered in the traditional format, YAML users could add them like this:

SEARCH_INPUT_OPTIONS:
  displayed_results_count: 100

I don’t know, how to convert your chosen option format to YAML.

Thanks.

lioman commented 2 years ago

@Kristinita I can, but this would be way more complicated, as the settings need to be parsed and translated to a toml format. I chose the current because it is the simplest approach. Basically, it is just a string the gets directly injected and for your yaml setting this should be possible too. Have not tested it, but this should be valid

SEARCH_INPUT_OPTIONS: |
    stemming = "German"
    url_prefix = "https://example.com"
Kristinita commented 2 years ago

Type: Another request :mailbox_with_mail:

Thanks, it works for me!

But when I try to change the value of the option displayed_results_count, I get an error:

Exception: Search plugin reported Error: Couldn't read the configuration file: Cannot parse config as TOML. Stork recieved error: unknown field `displayed_results_count`, expected one of `surrounding_word_count`, `base_directory`, `url_prefix`, `title_boost`, `stemming`,
`html_selector`, `exclude_html_selector`, `frontmatter_handling`, `files`, `break_on_file_error`, `srt_config`, `minimum_indexed_substring_length`, `minimum_index_ideographic_substring_length` for key `input` at line 772 column 1`

It seems that due to your solution, it’s solely possible to change the values for the so-called Stork “Input Options”. But Stork configuration isn’t solely “Input Options”.

@lioman , what are your thoughts on making it possible to change the values of another Stork options as well for Pelican users?

Thanks.

lioman commented 2 years ago

I discovered the output options later and then I want to wait till this PR is merged and implement the output options in the same way.

But apparently @justinmayer hasn't found the time yet to review it. I can change it now so that in- and output options are configurable but I want to know if we can do it this way.

Kristinita commented 2 years ago

Type: Another request :mailbox_with_mail:

@lioman, You also use stemming = "German" in your example, but according to Stork stemming documentation I think the stemming value should be generated completely automatically for the Pelican based on user Pelican settings. Example of generated search.toml:

[input]
base_directory = "D:/Kristinita/output"
html_selector = "section"
# [INFO] If “DEFAULT_LANG = 'ru'” in Pelican settings
# https://docs.getpelican.com/en/latest/settings.html#translations
stemming = "Russian"

# [INFO] The article on default language for the project,
# no “stemming_override”
[[input.files]]
path = "Статья-на-русском-языке.html"
url = "/Статья-на-русском-языке.html"
title = "Статья на русском языке"

# [INFO] “Lang: en” metadata in Pelican article or page:
# https://docs.getpelican.com/en/latest/content.html#file-metadata
[[input.files]]
path = "Article-in-English.html"
url = "/Article-in-English.html"
title = "Article in English"
stemming_override = "English"

Thanks.

justinmayer commented 2 years ago

Really sorry for the delay, Lioman. I'll be on vacation until the end of September but will do my utmost to review it upon my return. Many thanks for all your patience! 😅

lioman commented 2 years ago

Really sorry for the delay, Lioman. I'll be on vacation until the end of September but will do my utmost to review it upon my return. Many thanks for all your patience! 😅

No problem - have a nice vacation

lioman commented 2 years ago

Type: Another request 📬

@lioman, You also use stemming = "German" in your example, but according to Stork stemming documentation I think the stemming value should be generated completely automatically for the Pelican based on user Pelican settings.…

I don't get what you say here. but perhaps this is for another issue to automatically detect the language based on file language. This would be another issue/pr and a different discussion. This should not change anything here as this one is just for setting anything, that is not already covered.

I'm even willed to implement it. As this would be nicer, to have this automatically covered and no need to think about it

paulocoutinhox commented 2 years ago

It will be merged?

lioman commented 2 years ago

@justinmayer please review it - it is open so long now

nandac commented 1 year ago

@lioman @justinmayer I would love to see this feature merged and released. Is there any way I may help with this?

lioman commented 1 year ago

@lioman @justinmayer I would love to see this feature merged and released. Is there any way I may help with this?

Me too, waiting a long time for it

justinmayer commented 1 year ago

Hey folks. I'm aware, and I'm doing the best I can to allocate time to review this PR this month. Thanks for your patience.

lioman commented 1 year ago

Thanks @justinmayer ! If I can do anything to help, just call out.

lioman commented 1 year ago

@justinmayer I just resolved the conflicts - hope you find some time to review this

lioman commented 1 year ago

@justinmayer sorry to ping you again. I just want to complete this issue. Do I need to change something or is it fine like it is?

justinmayer commented 1 year ago

First, thanks to @lioman for taking the time to implement and submit this pull request.

My main concern with the implementation in its current form is that I originally added two of the more important "input" options as separate settings. It seems awkward to add the rest of the Stork input options in a separate monolithic setting, with the original two dangling off by themselves. I imagine it would be better to either add all the settings as separate (which seems hard to maintain) or to move the original two separate input settings into a new monolithic setting. The latter seems like the far more sensible approach to me, of course with a deprecation period that maintains backwards compatibility for a suitable length of time.

Perhaps we could start off with new STORK_INPUT_OPTIONS, STORK_OUTPUT_OPTIONS, etc., since these settings are no longer tied to the Search plugin so much as they are to Stork itself. I think they would probably be best as key/value Python dict objects that are then unpacked into the TOML form that Stork expects.

Thoughts?

lioman commented 1 year ago

I deliberately chose to handle those input options differently, as the other two should come from the rest of config. But that was due to my simple approach of using just a plain string. With a python object I think we have more power here and I will implement it with a sensible default, so that nothing changes for the ones without changes. I suggest, to introduce tomli-w as a dependency. (Just learned that tomllib/tomli can only read toml) Is that fine @justinmayer ?

justinmayer commented 1 year ago

Sure, adding a dependency to more easily write TOML should be fine. As far as available options, in addition to https://github.com/hukkin/tomli-w, there is also TOMLkit. I've never used either, so feel free to use whichever seems like the best fit.

lioman commented 1 year ago

@justinmayer I have rewritten the hole stork settings generation with support for input and output options. Additionally, I added some tests, so workflow needs an approval first. Please be so kind and review it soon.

lioman commented 1 year ago

Hi @justinmayer I changed the documentation according to your remarks. If we good to go for a merge I suggest to squash the commits together as there are many - or we decline this one and I open one with this branch: https://github.com/lioman/search/tree/4-add-control-over-input-output-options

justinmayer commented 1 year ago

@lioman: I have taken care of adding the RELEASE.md file and am about to merge this PR, which should trigger a new release.

As a suggestion for the future, I highly recommend doing all work on a feature/topic branch instead of the default main branch, even in the context of your own fork. Doing so really does help make things easier for maintainers. In this case, the release automation logic got confused by the main branch designation (even though it wasn't this repository's main branch) and tried to issue a release upon a push event, even though the PR hadn't been merged yet. Using a feature/topic branch would have prevented that problem.

Thank you for all your work on — and patience with — this pull request. This is a really nice enhancement to the plugin. Much appreciated! ✨