hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Automatically add SkoHub blog posts to team site #485

Open acka47 opened 1 year ago

acka47 commented 1 year ago

While working on #484, I noticed that the last three or four posts from the SkoHub blog are missing at http://lobid.org/product/skohub. (I have added the missing presentations with https://github.com/hbz/lobid/pull/484/commits/36d54d29afb35f273876f32dddccacf0d9f91c21, though.) As we will be publishing more frequently in the coming months, we should think about automating the addition of these posts.

This could be implemented by either @sroertgen or @fsteeg, I guess.

sroertgen commented 1 year ago

So I had a first look, and what we could do is fetch the XML feed of each blog and build the publications from there. This would then have to happen on the client side, I guess. Is this the kind of automated addition you have in mind?
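To make the feed idea concrete, here is a minimal sketch of turning feed items into publication entries. It is written in Python for brevity rather than as client-side code, and it assumes an RSS 2.0 layout with `item` elements; an Atom feed would use namespaced `entry` elements instead. The sample feed and the output keys (`title`, `link`, `published`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real run would download each blog's feed instead.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://example.org/</link>
    <item>
      <title>First post</title>
      <link>https://example.org/first-post/</link>
      <pubDate>Mon, 03 Oct 2022 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/second-post/</link>
      <pubDate>Tue, 15 Nov 2022 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> list[dict]:
    """Return one dict per RSS <item>, with hypothetical key names."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


items = parse_items(SAMPLE_FEED)
print([entry["title"] for entry in items])
```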

fsteeg commented 1 year ago

Hm, so I think our goal should be to add files in gatsby/lobid/static/publication so that we have a uniform database. That could happen from within the repo here, as you describe, by fetching the feeds and creating the files for them here (if that's what you mean).

However I'd think the cleanest approach would be to keep the creation of these files out of the scope for this repo, and instead create them elsewhere. Maybe triggered by a GitHub action when we push to the blogs, which then calls some conversion and then pushes the files here? Not sure if that makes sense, just some thoughts.
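The conversion step mentioned here could look roughly like the following sketch, which writes one JSON file per feed item. The record fields (`type`, `name`, `url`, `datePublished`) and the slug-based file names are assumptions for illustration only; the actual shape of the files in gatsby/lobid/static/publication is not shown in this thread and would have to be matched. How the script is triggered (for example by a GitHub action) is left out.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical input, e.g. the result of parsing a blog feed.
SAMPLE_ITEMS = [
    {
        "title": "First post",
        "link": "https://example.org/first-post/",
        "published": "Mon, 03 Oct 2022 00:00:00 GMT",
    },
]


def slugify(title: str) -> str:
    """Lowercase the title and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_record(item: dict) -> dict:
    """Map a feed item to a publication record (field names are assumed)."""
    return {
        "type": "BlogPosting",
        "name": item["title"],
        "url": item["link"],
        "datePublished": item["published"],
    }


def write_publication_files(items: list[dict], target_dir: Path) -> list[Path]:
    """Write one JSON file per item into target_dir and return the paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item in items:
        path = target_dir / f"{slugify(item['title'])}.json"
        text = json.dumps(build_record(item), indent=2, ensure_ascii=False)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


# Demo against a temporary directory instead of the real repository path.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_publication_files(SAMPLE_ITEMS, Path(tmp))
    print([p.name for p in paths])
```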

acka47 commented 1 year ago

I first liked the RSS approach as it may be independent of the actual blog software (we will have to integrate two Gatsby blogs and one Jekyll blog). However, after taking a short look at the RSS XML of the SkoHub blog, I am afraid that the RSS doesn't convey important structured data from the YAML frontmatter like author and tags. Or am I missing something? If the RSS could be tweaked to include this, the approach might work after all; otherwise we will have to fetch the structured data from elsewhere. Also, the HTML of the blog post does not include structured data. I guess this might be configured with Gatsby (a schema.org plugin maybe, see https://snappywebdesign.net/blog/how-to-add-structured-data-to-blog-posts-in-gatsby/). Otherwise we could think about @fsteeg's approach to fetch it/push it directly from the git repo.
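For reference, RSS 2.0 items can carry tags as `category` elements, and author names are commonly added as Dublin Core `dc:creator` elements. The sketch below shows how a consumer could read both, assuming the feed were tweaked to emit them. Whether the feed plugins of our blogs can be configured that way has not been checked here, and the sample item is invented.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace in ElementTree's "{uri}tag" notation.
DC = "{http://purl.org/dc/elements/1.1/}"

# Invented sample: a feed item that includes author and tags.
SAMPLE_FEED_WITH_METADATA = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Example post</title>
      <link>https://example.org/example-post/</link>
      <dc:creator>Jane Doe</dc:creator>
      <category>example-tag</category>
      <category>another-tag</category>
    </item>
  </channel>
</rss>"""


def authors_and_tags(feed_xml: str) -> list[dict]:
    """Collect link, authors and tags for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.iter("item"):
        result.append({
            "link": item.findtext("link", default=""),
            "authors": [e.text for e in item.findall(f"{DC}creator") if e.text],
            "tags": [e.text for e in item.findall("category") if e.text],
        })
    return result


metadata = authors_and_tags(SAMPLE_FEED_WITH_METADATA)
print(metadata)
```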

sroertgen commented 1 year ago

I am afraid that the RSS doesn't convey important structured data from the YAML frontmatter like author and tags

I think this can be configured, e.g. the lobid blog feed also contains author information: https://blog.lobid.org/feed.xml There are no author IDs given; I would have to look into how far this can be configured.

[...] otherwise we will have to fetch the structured data from elsewhere. Also, the HTML of the blog post does not include structured data.

I think this is a good hint. We should add structured data to the blog posts and then we can use the RSS feeds to get the links and from there we get the structured data.
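A minimal sketch of the second half of that idea: once a post's HTML embeds schema.org metadata as JSON-LD, it can be pulled out of the page with the standard library alone. This assumes the blogs would embed the data in a `script` element of type `application/ld+json`; the sample page and its property values are invented, and downloading the page from the link found in the feed is left out.

```python
import json
from html.parser import HTMLParser

# Invented sample page with embedded JSON-LD.
SAMPLE_HTML = """<html><head>
<title>Example post</title>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "BlogPosting",
 "headline": "Example post",
 "author": {"@type": "Person", "name": "Jane Doe"},
 "keywords": ["example-tag"]}
</script>
</head><body><p>Post text</p></body></html>"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of all JSON-LD script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD document found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


docs = extract_jsonld(SAMPLE_HTML)
print(docs[0]["author"]["name"], docs[0]["keywords"])
```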

If you agree, @acka47, I will open issues in our three blog systems (lobid, metafacture, skohub) and add the structured metadata there. Then I will continue on this issue and pull the structured data from there.

@fsteeg I get your point as well: fetching the blog posts dynamically leads to an inconsistent publication database, since not every publication can be found there. However, which approach we take depends on how important it is that this database contains all data. If it is meant to be authoritative, we should switch to an approach where these files are created. If it is enough to have all data on the website, fetching dynamically would be fine (we could also think about adding structured data there about all the publications after they have been fetched).

I'm open to both, though I think the RSS approach is easier to implement.

acka47 commented 1 year ago

the lobid blog feed also contains author information: https://blog.lobid.org/feed.xml There are no author IDs given; I would have to look into how far this can be configured.

There are no IDs in the YAML frontmatter of the lobid blog so this is fine. See e.g. https://github.com/hbz/lobid-blog/blob/master/_posts/2022-08-19-job-projektkoordinatorin.md

If you agree, @acka47, I will open issues in our three blog systems (lobid, metafacture, skohub) and add the structured metadata there. Then I will continue on this issue and pull the structured data from there.

+1. I think only the tags are missing in the lobid blog feed, so there is not much to be done there.

I'm open to both, though I think the RSS approach is easier to implement.

@fsteeg let us know if you still have problems with this approach. Then we should schedule a 30 min meeting to discuss this.

fsteeg commented 1 year ago

I like the idea of using the RSS; my point was more about what we do with it (create JSON files) and where (not in this repo). I don't think it would be a nice solution to create the publication list on https://lobid.org/team both from files and from RSS feeds, if that's the suggestion, since that whole system is based on the files, the queries against the files etc. But maybe it's worth reconsidering that whole 'knowledge graph' approach to the website.

fsteeg commented 1 year ago

Maybe it makes sense to approach this from a different angle: we could set up a new page to list the team publications, which uses the https://lobid.org/team/feed.xml RSS feed, and other feeds like the SkoHub blog, to create a complete list of publications (which we should publish as RSS again).

That way, we basically have two separate things: 1) a list of publications aggregated from different RSS sources and 2) a system to publish JSON files as RSS (our current setup). All sources that already publish RSS could come in via 1), and for all sources that we have no RSS for, we create JSON files in 2).
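Part 1) could be sketched as follows: merge the items of several feeds, drop duplicates by link, sort by date and serialize the result as RSS again. The input is assumed to be already-parsed items with the hypothetical keys `title`, `link` and `published` (an RFC 822 date with an explicit time zone); the sample data, channel title and channel link are invented.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented sample data standing in for two already-parsed feeds.
TEAM_ITEMS = [
    {"title": "Talk A", "link": "https://example.org/talk-a",
     "published": "Mon, 05 Sep 2022 00:00:00 GMT"},
    {"title": "Post B", "link": "https://example.org/post-b",
     "published": "Mon, 03 Oct 2022 00:00:00 GMT"},
]
BLOG_ITEMS = [
    {"title": "Post B", "link": "https://example.org/post-b",
     "published": "Mon, 03 Oct 2022 00:00:00 GMT"},
    {"title": "Post C", "link": "https://example.org/post-c",
     "published": "Tue, 15 Nov 2022 00:00:00 GMT"},
]


def merge_items(*sources: list[dict]) -> list[dict]:
    """Merge item lists, keep the first item per link, newest first."""
    by_link = {}
    for source in sources:
        for item in source:
            by_link.setdefault(item["link"], item)
    return sorted(
        by_link.values(),
        key=lambda item: parsedate_to_datetime(item["published"]),
        reverse=True,
    )


def to_rss(items: list[dict], title: str, link: str) -> str:
    """Serialize the items as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "pubDate").text = item["published"]
    return ET.tostring(rss, encoding="unicode")


merged = merge_items(TEAM_ITEMS, BLOG_ITEMS)
combined_feed = to_rss(merged, "Team publications", "https://example.org/")
print([item["title"] for item in merged])
```

Sources without a feed would still go through the JSON files in 2) and enter the merge via the feed generated from them.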