microsimulation / ijm

A central place for general issues, documents, scripts and resources for the IJM
https://microsimulation.org/ijm/
MIT License

Displaying articles on the website before a whole issue is finalized #157

Open pbronka opened 1 year ago

pbronka commented 1 year ago

Hi @gnott,

We would like to process and display articles on the website as soon as they are accepted for publication, without waiting for the planned publication date of the next issue. That is, as soon as we accept an article, we would process it and upload the PDF and XML files to the S3 bucket, and we would like to display the article on the website in the "recent research" section without assigning it to any specific issue. Then, when we have enough articles for a complete issue and we reach the planned publication date, we would like to bundle them together to create a new issue (the PDF and XML would be re-generated at this point), which would appear on the website in the usual way.

Can this be accommodated within the current conversion process? If not, what changes should we make?

Hope this is clear, but please let me know if I could provide any additional information.

Thanks!

gnott commented 1 year ago

I think I understand in general, and I like the idea of publishing material early.

Would you assign a permanent article id to the article when it is first published? These are used in the URL and file names.

The website requires an article to be part of an issue otherwise it causes an error, at least it is what I remember, and I think it was related to how the search system works.

Generating JSON from the XML, which is the part I help with, does not require the article to be part of an issue in order for the JSON to be valid. Some data will always be required, though, such as a publication date and a DOI.

I suggest you could take an existing article and use it to try and create a Recent research section, unless you've already begun on it. Will you be doing the changes to the journal code yourself internally?

I can provide a summary of the steps I follow to convert articles if it would help to understand how it might be possible to make it a more automated process and something you can have more control over.

pbronka commented 1 year ago

Thank you very much for this advice @gnott.

> Would you assign a permanent article id to the article when it is first published? These are used in the URL and file names.

Yes, we decide on the permanent article id when we first send it for processing and I think we would be able to continue doing that.

> The website requires an article to be part of an issue otherwise it causes an error, at least it is what I remember, and I think it was related to how the search system works.

> I suggest you could take an existing article and use it to try and create a Recent research section, unless you've already begun on it. Will you be doing the changes to the journal code yourself internally?

Thanks. I will need to look into this part. I might try to implement required changes locally and see how I get on with it.

> I can provide a summary of the steps I follow to convert articles if it would help to understand how it might be possible to make it a more automated process and something you can have more control over.

That could be very helpful for us to better understand the whole process, thanks!

gnott commented 1 year ago

Here are the basic steps I follow. Of course, trouble appears when data is missing or invalid, or when file names are inconsistent.

  1. Extract the zip: Run a shell script to extract the .zip file. The XML is copied to a folder for further processing, and the PDF and figure files are copied into a new folder inside https://github.com/microsimulation/ijm/tree/master/assets/files
  2. Run the journal site locally: docker-compose up. This keeps the IIIF server running so that it can provide the image dimensions of figures
  3. Convert XML to JSON: Using the ijm branch of bot-lax-adaptor (https://github.com/elifesciences/bot-lax-adaptor/tree/ijm) with a specific app.cfg configuration file, run a command to convert XML to JSON, e.g. python src/main.py ijm/ijm-00265.xml > ijm/ijm-00265.xml.json (an error may occur if there is a problem with the conversion)
  4. Validate the JSON: Potential errors in the JSON can be detected at this stage, e.g. python src/validate.py ijm/ijm-00265.xml.json (if the command outputs a validation error, its cause needs to be investigated)
  5. Convert JSON to final JSON: The JSON can be truncated and cleaned up to make it suitable for the IJM site by running https://github.com/elifesciences/bot-lax-adaptor/blob/ijm/src/final_ijm.py, e.g. python src/final_ijm.py ijm/ijm-00265.xml.json > ~/[path_to_repo]/ijm/api/data/articles/00265.json
  6. Add to a collection: Create or modify the collection data file in the IJM project to include the new article, e.g. https://github.com/microsimulation/ijm/blob/master/api/data/collections/15-2.json
  7. Test journal site locally: Now that the article JSON and collection JSON are in place, stop the journal site and start it again (docker-compose up), which will render the new JSON data
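
Since the aim is to make this more automated, steps 3 to 5 could be chained in a small script. The following is only a rough sketch, not the script I actually use: the function names conversion_commands and run_steps are made up for illustration, it assumes it is run from the root of a bot-lax-adaptor checkout on the ijm branch with app.cfg already configured, and site_repo stands for wherever your local copy of the ijm repository lives.

```python
import subprocess
from pathlib import Path


def conversion_commands(article_id, site_repo, work_dir="ijm"):
    """Build the commands for steps 3 to 5 as (argv, output_path) pairs.

    output_path is the file that should receive the command's stdout,
    or None when the command is only run for its exit status.
    site_repo is an assumption: the path to a local checkout of the
    ijm site repository.
    """
    xml = f"{work_dir}/ijm-{article_id}.xml"
    raw_json = f"{xml}.json"
    final_json = f"{site_repo}/api/data/articles/{article_id}.json"
    return [
        # Step 3: convert XML to JSON
        (["python", "src/main.py", xml], raw_json),
        # Step 4: validate the JSON
        (["python", "src/validate.py", raw_json], None),
        # Step 5: convert JSON to the final JSON used by the IJM site
        (["python", "src/final_ijm.py", raw_json], final_json),
    ]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order and stop at the first failure.

    With check=True a non-zero exit status raises CalledProcessError,
    so later steps never run on bad input. Output is captured and only
    written to disk once the command has succeeded, which avoids
    leaving a half-written JSON file behind.
    """
    for argv, output_path in steps:
        result = runner(argv, check=True, capture_output=True, text=True)
        if output_path is not None:
            target = Path(output_path)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(result.stdout)


# Example call (the site_repo value is a placeholder, not a real path):
# run_steps(conversion_commands("00265", "/path/to/ijm"))
```

This assumes that a failed conversion or validation exits with a non-zero status; if a script only prints the error and exits normally, the captured output would need to be inspected instead. Steps 1, 2, 6 and 7 are left out because they depend on the zip contents, the running site and the collection file.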