openzim / cms

ZIM file Publishing Platform
https://cms.openzim.org
GNU General Public License v3.0

XML Library Digester #26

Closed rgaudin closed 2 years ago

rgaudin commented 2 years ago

A digester producing a library-formatted XML stream from the Titles of the database.

At this stage, the digester will not filter out any titles.

rgaudin commented 2 years ago

For this, we'll have a digesters.library_xml module that produces a valid Kiwix XML Library similar to the central one.

At this stage, we will simply loop over all Books that have a Title and add one node per Book using a templating system (Jinja2). Templating will be faster than building an XML tree, which we have no other use for.
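To illustrate the templating approach, here is a minimal sketch rather than the actual `digesters.library_xml` code. The `render_library` function, the `<library>` wrapper element, the dict-based book records and the reduced attribute set are all assumptions made for the example. Autoescaping is enabled so that titles containing `&` or quotes still produce well-formed XML.

```python
from jinja2 import Environment

# Hypothetical template: only a handful of the attributes are shown.
# The <library> wrapper element is an assumption for this sketch.
LIBRARY_TEMPLATE = """<library>
{%- for book in books %}
  <book id="{{ book.id }}" title="{{ book.title }}" language="{{ book.language }}" name="{{ book.name }}"/>
{%- endfor %}
</library>
"""


def render_library(books):
    """Render an iterable of book records (dicts here) as one XML string."""
    # autoescape=True escapes &, <, > and quotes inside attribute values.
    env = Environment(autoescape=True)
    return env.from_string(LIBRARY_TEMPLATE).render(books=books)


if __name__ == "__main__":
    print(
        render_library(
            [
                {
                    "id": "0029d997-2d65-ff83-85de-03ce4d5a79c8",
                    "title": "كرة القدم ويكيبيديا",
                    "language": "ara",
                    "name": "wikipedia_ar_football",
                }
            ]
        )
    )
```

Because the template is rendered as one pass over the books, no in-memory XML tree is ever built, which is the point made above.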

Sample XML:

<book
  id="0029d997-2d65-ff83-85de-03ce4d5a79c8"
  path="../var/www/download.kiwix.org/zim/wikipedia/wikipedia_ar_football_nopic_2021-10.zim"
  title="كرة القدم ويكيبيديا"
  description="مجموعة مختارة من مقالات ويكيبيديا على كرة القدم"
  language="ara"
  creator="Wikipedia"
  publisher="Kiwix"
  name="wikipedia_ar_football"
  flavour="nopic"
  tags="wikipedia;_category:wikipedia;_pictures:no;_videos:no;_details:yes;_ftindex:yes"
  faviconMimeType="image/png"
  favicon="iVBORw0KGgRU5ErkJggg=="
  date="2021-10-29"
  url="http://download.kiwix.org/zim/wikipedia/wikipedia_ar_football_nopic_2021-10.zim.meta4"
  articleCount="106250"
  mediaCount="11"
  size="444349"/>

Data would come from:

| XML property | Data from |
| --- | --- |
| id | Book.id |
| path | constructed reusing filename from Book.url |
| title | Title's Title metadata |
| description | Title's Description metadata |
| language | Title.language |
| creator | Title's Creator metadata |
| publisher | Title's Publisher metadata |
| name | Book.name |
| flavour | Book's Flavour metadata |
| tags | Title's Tags ∪ Book's private Tags |
| faviconMimeType | Title's Illustration_48x48 metadata type. We can hard-code that to image/png for now. |
| favicon | Title's Illustration_48x48 metadata |
| date | Book's Date metadata |
| url | constructed from Book.url |
| articleCount | Book.article_count |
| mediaCount | Book.media_count |
| size | Book.size divided by 1024 (XML Library uses KiB) |
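The three derived values in the mapping above (size, tags, path) could be computed along these lines. This is only a sketch: the function names, the default `root` prefix (taken from the sample), the use of integer division, and the assumptions that tags arrive as lists of strings and that `Book.url` is the direct URL of the ZIM file are all hypothetical.

```python
from urllib.parse import urlsplit


def size_in_kib(size_bytes: int) -> int:
    """Convert Book.size (assumed to be in bytes) to KiB, rounding down."""
    return size_bytes // 1024


def merge_tags(title_tags, book_private_tags) -> str:
    """Union of both tag lists, first occurrence wins, joined with ';'."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    merged = dict.fromkeys([*title_tags, *book_private_tags])
    return ";".join(merged)


def library_path(book_url: str, root: str = "../var/www/download.kiwix.org") -> str:
    """Build the path attribute by reusing the path part of the URL.

    Assumes book_url points directly at the ZIM file; root is a
    hypothetical prefix copied from the sample above.
    """
    return root + urlsplit(book_url).path
```

For instance, `size_in_kib(455013376)` gives `444349`, the value used in the sample.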
stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.