Closed IMayBeABitShy closed 3 months ago
I will do a full review tomorrow, but it looks like a very good job.
Until the whole review, I can already tell that you will have to use the compute_descriptions
utility function:
default_description
is the site tag lineuser_description
and user_long_description
are the values provided at the CLIThis function takes care of doing some "intelligent" logic based on these three values and we would like to share this logic across all scrapers.
The favicon/Illustration_48x48@1 needed a rework, as it now needs to be specified when calling .config_metadata(). I had to move the conversion for the 48x48 image to the creator setup, but this led to some duplicate code and is a IMO rather ugly solution. Some feedback on how to refactor the code to avoid this would be appreciated.
Agreed. @rgaudin is there any reason to not include the Illustration_96x96_at_1 as an optional in config_metadata
of scraperlib ?
Otherwise LGTM
Agreed. @rgaudin is there any reason to not include the Illustration_96x96_at_1 as an optional in
config_metadata
of scraperlib ?
config_metadata
is for the common metadata that we set in scrapers. Illustration_96x96@1
is not. It can be set later as for any other metadata.
Once we have a strategy or support in readers for other illustrations, we might reconsider but at the moment, that 96x96 illustration is not usable. I was enthusiastic we'd add support in readers along with the libzim feature but it's not considered a priority. I suggest we simply remove it from sotoki 🙁
Should I remove the code for Illustration_96x96@1
? Also, perhaps it would be a good idea to implement shared logic for illustration handling in python-zimscraperlib
? For example, we could write a method to only pass a file path to the zimscraperlib and it could decide which illustrations to include, handling resize and format conversions as necessary.
Should I remove the code for
Illustration_96x96@1
?
Yes
Also, perhaps it would be a good idea to implement shared logic for illustration handling in
python-zimscraperlib
? For example, we could write a method to only pass a file path to the zimscraperlib and it could decide which illustrations to include, handling resize and format conversions as necessary.
There's already something that handles part of this I believe using fpath or URL. Please check it out and open a ticket to improve/replace/add something.
@IMayBeABitShy will you be able to finish this soon or should I take this over? I do not mind, I know this is only a side-project for you, it is just that we plan to make a release soon and if we could have this as well in the release, it would be great. We would like to do the release this week ideally, but we have already waited many months to do a release, so we can wait a bit more if needed.
@IMayBeABitShy will you be able to finish this soon or should I take this over? I do not mind, I know this is only a side-project for you, it is just that we plan to make a release soon and if we could have this as well in the release, it would be great. We would like to do the release this week ideally, but we have already waited many months to do a release, so we can wait a bit more if needed.
Sorry about that, I've forgotten that I hadn't already pushed the last commit and encountered a problem with a library, which prevented me from verifying the changes. I've pushed the last commit, which removes the extra illustration, now.
I've implemented the requested changes.
This is probably wrong, you need 3.3.0 to have https://github.com/openzim/python-scraperlib/pull/126
Correct. Seems like I've missed this during the last update, there wasn't a 3.3.0 version yet when I've opened this PR. Fixed.
(PR depends on https://github.com/openzim/python-scraperlib/pull/126 and will likely need at least one revision due to some feedback being needed)
This PR upgrades
python-scraperlib
to 3.x while also adding a--long-description argument
. This implements #290.Changes:
--long-description
argument (--description
already exists and was not needed to be added). It defaults to the tagline.Description
metadata is now limited in length to conform with python-scraberlibs metadata validation, which enforces a maximum lengthFlavour
andScraper
metadata will now be setTest status:
sotoki --debug --domain="boardgames.stackexchange.com" --mirror="https://org-kiwix-stackexchange.s3.us-west-1.wasabisys.com" --output="./output" --threads="28" --stats-filename="./task_progress.json" --publisher="openZIM"
as well as--description="test description" --long-description="test long description"
.zimcheck
complains about article contents, redundancy and missing links, but I don't think the missing profile pictures are related to the changes.Comments:
Illustration_48x48@1
needed a rework, as it now needs to be specified when calling.config_metadata()
. I had to move the conversion for the 48x48 image to the creator setup, but this led to some duplicate code and is a IMO rather ugly solution. Some feedback on how to refactor the code to avoid this would be appreciated.Edit:
This PR fixes #290