Website: https://data.sfb1451.de
* Repository overview
The catalog is placed in the =docs= directory (to allow other repository content at the root level) and served to GitHub Pages from there.
Related repositories:
- SFB superdataset: [[https://github.com/sfb1451/all-projects][sfb1451/all-projects]]
- Utilities for handling tabby files: [[https://github.com/sfb1451/tabby-utils][sfb1451/tabby-utils]]
- Custom extractors & translators: [[https://github.com/mslw/datalad-wackyextra][mslw/datalad-wackyextra]]
* Required extensions
The following DataLad extensions are required to generate the catalog:
- [[https://github.com/datalad/datalad-next][DataLad-next]] is used to interact with WebDAV remotes,
- [[https://github.com/datalad/datalad-metalad][DataLad-metalad]] provides metadata extraction,
- [[https://github.com/mslw/datalad-wackyextra][DataLad-wackyextra]] provides custom metadata extractors and translators,
- [[https://github.com/datalad/datalad-catalog][DataLad-catalog]] is used to generate the catalog.
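A minimal installation sketch, assuming a pip-based environment (the first three packages are published on PyPI; datalad-wackyextra is assumed to be installed directly from its GitHub repository):

#+begin_src sh
# Sketch only: install the required extensions into the active environment.
# datalad-next, datalad-metalad, and datalad-catalog come from PyPI;
# datalad-wackyextra is assumed to be pip-installable from GitHub.
pip install datalad-next datalad-metalad datalad-catalog
pip install git+https://github.com/mslw/datalad-wackyextra.git
#+end_src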
* Dataset locations
Project datasets are tracked in the =Projects= superdataset. GitHub mirror: https://github.com/sfb1451/all-projects
** Obtaining a subdataset from Sciebo:
datalad clone -d . webdavs://
(use project names as folder names, as these will be displayed as subdataset names).
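A fuller invocation might look as follows; the WebDAV host, share path, and project name below are hypothetical placeholders, not the actual Sciebo locations:

#+begin_src sh
# Sketch only: clone a project subdataset into the superdataset,
# naming the folder after the project. All bracketed values are
# hypothetical placeholders.
datalad clone -d . "webdavs://<sciebo-host>/<share-path>/<project-name>" <project-name>
#+end_src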
Tip: until [[https://github.com/datalad/datalad-next/issues/108][datalad-next/issues/108]] makes it automatic, the storage remote can be reconfigured with:
git annex initremote my-sciebo-storage --private --sameas
or with clone URL substitution; check:
datalad configuration | grep 'datalad.clone.url-substitute'
for examples of how to alter all such URLs at once.
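As a hedged sketch of the substitution mechanism: a =datalad.clone.url-substitute.<label>= value starts with a delimiter character, followed by a Python-regex match pattern and a replacement. The label and host names below are hypothetical:

#+begin_src sh
# Sketch only: rewrite Sciebo WebDAV URLs at clone time.
# "sciebo" is a hypothetical label; both host names are placeholders.
# The leading comma is the delimiter separating pattern and replacement.
git config --global datalad.clone.url-substitute.sciebo \
    ',^webdavs://old-host/,webdavs://new-host/'
#+end_src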
* Catalog generation
Generation is done incrementally. Utility scripts are provided in the =code= directory; one script usually does one thing.
Scripts have a similar CLI: they need to be pointed to a dataset, a folder to store intermediate metadata, and (optionally) a catalog they will update. See the command line help or source code for usage instructions.
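To illustrate this CLI shape only, a hypothetical invocation might look as follows; the option names are assumptions, not the scripts' actual flags (consult each script's =--help=):

#+begin_src sh
# Sketch only: the common shape of the utility scripts' CLI.
# All option names here are hypothetical; check the script's --help.
python code/extract_selected.py \
    --dataset path/to/project \
    --metadata-dir path/to/intermediate-metadata \
    --catalog path/to/catalog
#+end_src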
Key scripts are:
- =extract_selected.py=: extracts metadata from a project's subdataset. It allows selecting which extractors to use, and can optionally generate a file list, too.
- =extract_project.py=: extracts metadata from a project's superdataset. Runs =project_extract_pipeline.json= and translates the result. The pipeline combines the metalad_core, CFF, and bibliography (ris/nbib/crossref) extractors. It also adds handcrafted metadata, such as funding or keywords.
- =extract_superdataset=: extracts superdataset metadata. Runs the =metalad_core= and =metalad_studyminimeta= extractors.
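For orientation, here is a minimal sketch of the kind of underlying commands these scripts wrap, assuming the metalad and datalad-catalog CLIs; this is not the scripts' exact behavior, and the paths are placeholders:

#+begin_src sh
# Sketch only: extract core metadata from a dataset, then add it to
# the catalog. The scripts above add extractor selection, translation,
# and handcrafted metadata on top of steps like these.
datalad meta-extract -d path/to/project metalad_core > meta/project_core.json
datalad catalog-add -c path/to/catalog -m meta/project_core.json
#+end_src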
Additional scripts are:
- =utils.py=: shared functions or classes.
- =list_files.py=: generates a file listing based on =datalad status=. Deprecated; use =extract_selected.py --files= instead.
- =scrape_projects.py=: archived script that was used to scrape project descriptions.
- =scrape_publications.py=: archived script that was used to scrape project publications.