openlibhums / bepress

A tool for importing / migrating content from Bepress Digital Commons publications into the Janeway journal system
MIT License

Add support for importing content from CSV #2

Closed mauromsl closed 1 year ago

mauromsl commented 3 years ago

As of v1.1, this plugin can import article metadata from Bepress XML metadata files. However, access to those files is restricted to Bepress users who have enabled the AWS backup system, which is not included in the base pricing.

All Bepress users can export their metadata split into two CSV files which, when combined, provide the same metadata as the XML files.
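To illustrate what "combined" could look like, here is a minimal sketch that joins the rows of two CSV files on a shared key column. It assumes the shared column is `context_key` (the key named in the dev plan); the function name `merge_csv_rows`, the other column names and the sample data are made up for illustration and are not taken from a real Bepress export.

```python
import csv
import io


def merge_csv_rows(first_csv, second_csv, key="context_key"):
    """Combine rows from two CSV file objects that share a key column.

    Returns a dict mapping each key value to one merged row (a dict).
    """
    merged = {}
    for row in csv.DictReader(first_csv):
        merged[row[key]] = dict(row)
    for row in csv.DictReader(second_csv):
        # Rows that appear only in the second file are kept, not dropped.
        merged.setdefault(row[key], {}).update(row)
    return merged


# Made-up sample data; real exports will have many more columns.
first = io.StringIO("context_key,title\n101,An Example Article\n")
second = io.StringIO("context_key,publication_date\n101,2019-05-01\n")

records = merge_csv_rows(first, second)
print(records["101"])
```

Keeping rows that exist in only one file is a design choice made here so that incomplete records can be reported to the user instead of silently disappearing.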

We should support importing through these CSV files as it has two major advantages:

Dev Plan

Since the metadata structure is roughly the same between the CSV and XML files, it should be possible to re-use most of the existing business logic.

  1. Split the import interface from the business logic: ensure the current importers do not rely on the incoming XML metadata files, but instead take in parsed input data.
     1.1 Design a metadata schema that can be used to parse the incoming data regardless of whether it comes from XML or CSV.
  2. Add an interface for uploading the 2 CSV files.
     2.1 Files can be correlated by the common key context_key.
     2.2 Issue structure needs to be parsed from a string, rather than from a directory path.
     2.3 Author metadata contained within suffixed columns [...]
  3. Add a CSV parser that parses into the metadata schema defined in 1.1.
  4. Support fetching files from the URL provided in the metadata (the current importer relies on the files provided in the Bepress backup).
  5. Write new tests for:
     5.1 Importing content into Janeway from a sample dataset formatted according to the schema produced in 1.1.
     5.2 Parsing CSV into the new metadata schema.
     5.3 Parsing XML into the new metadata schema.
  6. Write installation and usage instructions into README.md
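
As a rough sketch of steps 1.1, 2.3 and 3, the snippet below defines a small format-neutral record and fills it from one merged CSV row, collecting author data from suffixed columns. Everything named here is an assumption made for illustration: the classes `Author` and `ArticleRecord`, the helper functions, and the column pattern `author<N>_fname` / `author<N>_lname` / `author<N>_email` are hypothetical and should be checked against a real export and the plugin's actual code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Author:
    first_name: str = ""
    last_name: str = ""
    email: str = ""


@dataclass
class ArticleRecord:
    """Format-neutral record that an XML or CSV parser could emit (hypothetical)."""
    context_key: str
    title: str
    authors: list = field(default_factory=list)


# Assumed column naming: author1_fname, author1_lname, author2_email, ...
AUTHOR_COLUMN = re.compile(r"^author(\d+)_(fname|lname|email)$")
FIELD_NAMES = {"fname": "first_name", "lname": "last_name", "email": "email"}


def authors_from_row(row):
    """Group suffixed author columns into Author objects, ordered by index."""
    by_index = {}
    for column, value in row.items():
        match = AUTHOR_COLUMN.match(column)
        if match and value:  # skip unrelated columns and empty cells
            index, suffix = int(match.group(1)), match.group(2)
            author = by_index.setdefault(index, Author())
            setattr(author, FIELD_NAMES[suffix], value.strip())
    return [by_index[i] for i in sorted(by_index)]


def record_from_csv_row(row):
    """Build an ArticleRecord from one merged CSV row (a dict)."""
    return ArticleRecord(
        context_key=row["context_key"],
        title=row.get("title", ""),
        authors=authors_from_row(row),
    )


# Made-up row for illustration only.
row = {
    "context_key": "101",
    "title": "An Example Article",
    "author1_fname": "Ada",
    "author1_lname": "Lovelace",
    "author1_email": "ada@example.org",
    "author2_fname": "Grace",
    "author2_lname": "Hopper",
    "author2_email": "",
}
record = record_from_csv_row(row)
print(record)
```

An XML parser that emits the same `ArticleRecord` objects would let the existing import logic consume either source without knowing which format the data came from.
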
mauromsl commented 1 year ago

Closed by b580f236cdf279ad64ec4b181c51fa35c4032179