Closed lioman closed 1 year ago
Do we need anything to get this merged? I don't want to pin this dependency to git but use a normal version instead
Hey Lioman. Anything that adds/changes settings has to be considered a bit more carefully than other contributions, since changing them later can cause problems for people using previous setting names. I am completely swamped at the moment, so all I can promise is that I will review this in depth as soon as I possibly can.
Sure! No problem, thanks for answering. I added as additional parameter. So if you do not add it, nothing changes.
Sorry to ping you again. @justinmayer Is it possible, that you review this? I hate long pending tasks
Type: Request :mailbox_with_mail:
@lioman , could you make that Pelican users add options in the traditional Pelican format, which is used in other plugins?
Example:
SEARCH_INPUT_OPTIONS = {
"displayed_results_count": 100
}
I store my Pelican settings in .yaml
, not .py
files. YAML is a specifically configuration file format, and I don’t know how to parse the configuration in Python files directly.
If the settings after your change could be entered in the traditional format, YAML users could add them like this:
SEARCH_INPUT_OPTIONS:
displayed_results_count: 100
I don’t know, how to convert your chosen option format to YAML.
Thanks.
@Kristinita I can, but this would be way more complicated, as the settings need to be parsed and translated to a toml format. I chose the current because it is the simplest approach. Basically, it is just a string the gets directly injected and for your yaml setting this should be possible too. Have not tested it, but this should be valid
SEARCH_INPUT_OPTIONS: |
stemming = "German"
url_prefix = "https://example.com"
Type: Another request :mailbox_with_mail:
Thanks, it works for me!
But when I try to change the value of the option displayed_results_count
, I get an error:
Exception: Search plugin reported Error: Couldn't read the configuration file: Cannot parse config as TOML. Stork recieved error: unknown field `displayed_results_count`, expected one of `surrounding_word_count`, `base_directory`, `url_prefix`, `title_boost`, `stemming`,
`html_selector`, `exclude_html_selector`, `frontmatter_handling`, `files`, `break_on_file_error`, `srt_config`, `minimum_indexed_substring_length`, `minimum_index_ideographic_substring_length` for key `input` at line 772 column 1`
It seems that due to your solution, it’s solely possible to change the values for the so-called Stork “Input Options”. But Stork configuration isn’t solely “Input Options”.
@lioman , what are your thoughts on making it possible to change the values of another Stork options as well for Pelican users?
Thanks.
I discovered the output options later and then I want to wait till this PR is merged and implement the output options in the same way.
But apparently @justinmayer hasn't found the time yet to review it. I can change it now so that in- and output options are configurable but I want to know if we can do it this way.
Type: Another request :mailbox_with_mail:
@lioman, You also use stemming = "German"
in your example, but according to Stork stemming
documentation I think the stemming
value should be generated completely automatically for the Pelican based on user Pelican settings. Example of generated search.toml
:
[input]
base_directory = "D:/Kristinita/output"
html_selector = "section"
# [INFO] If “DEFAULT_LANG = 'ru'” in Pelican settings
# https://docs.getpelican.com/en/latest/settings.html#translations
stemming = "Russian"
# [INFO] The article on default language for the project,
# no “stemming_override”
[[input.files]]
path = "Статья-на-русском-языке.html"
url = "/Статья-на-русском-языке.html"
title = "Статья на русском языке"
# [INFO] “Lang: en” metadata in Pelican article or page:
# https://docs.getpelican.com/en/latest/content.html#file-metadata
[[input.files]]
path = "Article-in-English.html"
url = "/Article-in-English.html"
title = "Article in English"
stemming_override = "English"
Thanks.
Really sorry for the delay, Lioman. I'll be on vacation until the end of September but will do my utmost to review it upon my return. Many thanks for all your patience! 😅
Really sorry for the delay, Lioman. I'll be on vacation until the end of September but will do my utmost to review it upon my return. Many thanks for all your patience! 😅
No problem - have a nice vacation
Type: Another request 📬
@lioman, You also use
stemming = "German"
in your example, but according to Storkstemming
documentation I think thestemming
value should be generated completely automatically for the Pelican based on user Pelican settings.…
I don't get what you say here. but perhaps this is for another issue to automatically detect the language based on file language. This would be another issue/pr and a different discussion. This should not change anything here as this one is just for setting anything, that is not already covered.
I'm even willed to implement it. As this would be nicer, to have this automatically covered and no need to think about it
It will be merged?
@justinmayer please review it - it is open so long now
@lioman @justinmayer I would love to see this feature merged and released. Is there any way I may help with this?
@lioman @justinmayer I would love to see this feature merged and released. Is there any way I may help with this?
Me too, waiting a long time for it
Hey folks. I'm aware, and I'm doing the best I can to allocate time to review this PR this month. Thanks for your patience.
Thanks @justinmayer ! If I can do anything to help, just call out.
@justinmayer I just resolved the conflicts - hope you find some time to review this
@justinmayer sorry to ping you again. I just want to complete this issue. Do I need to change something or is it fine like it is?
First, thanks to @lioman for taking the time to implement and submit this pull request.
My main concern with the implementation in its current form is that I originally added two of the more important "input" options as separate settings. It seems awkward to add the rest of the Stork input options in a separate monolithic setting, with the original two dangling off by themselves. I imagine it would be better to either add all the settings as separate (which seems hard to maintain) or to move the original two separate input settings into a new monolithic setting. The latter seems like the far more sensible approach to me, of course with a deprecation period that maintains backwards compatibility for a suitable length of time.
Perhaps we could start off with new STORK_INPUT_OPTIONS
, STORK_OUTPUT_OPTIONS
, etc., since these settings are no longer tied to the Search plugin so much as they are to Stork itself. I think they would probably be best as key/value Python dict
objects that are then unpacked into the TOML form that Stork expects.
Thoughts?
I deliberately chose to handle those input options differently, as the other two should come from the rest of config. But that was due to my simple approach of using just a plain string. With a python object I think we have more power here and I will implement it with a sensible default, so that nothing changes for the ones without changes.
I suggest, to introduce tomli-w
as a dependency. (Just learned that tomllib
/tomli
can only read toml)
Is that fine @justinmayer ?
Sure, adding a dependency to more easily write TOML should be fine. As far as available options, in addition to https://github.com/hukkin/tomli-w, there is also TOMLkit. I've never used either, so feel free to use whichever seems like the best fit.
@justinmayer I have rewritten the hole stork settings generation with support for input and output options. Additionally, I added some tests, so workflow needs an approval first. Please be so kind and review it soon.
Hi @justinmayer I changed the documentation according to your remarks. If we good to go for a merge I suggest to squash the commits together as there are many - or we decline this one and I open one with this branch: https://github.com/lioman/search/tree/4-add-control-over-input-output-options
@lioman: I have taken care of adding the RELEASE.md
file and am about to merge this PR, which should trigger a new release.
As a suggestion for the future, I highly recommend doing all work on a feature/topic branch instead of the default main
branch, even in the context of your own fork. Doing so really does help make things easier for maintainers. In this case, the release automation logic got confused by the main
branch designation (even though it wasn't this repository's main
branch) and tried to issue a release upon a push event, even though the PR hadn't been merged yet. Using a feature/topic branch would have prevented that problem.
Thank you for all your work on — and patience with — this pull request. This is a really nice enhancement to the plugin. Much appreciated! ✨
Stork index creation can be configured via additional options. To allow all of them add SEARCH_ADDITIONAL_OPTIONS settings. Content will be added to stork.toml in input block.
Pull Request Checklist
Resolves: #4