pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

Create toolset to convert OJS 2.x native import/export XML to OJS 3.x #5067

Open asmecher opened 5 years ago

asmecher commented 5 years ago

Both OJS 2.x and OJS 3.x support XML import/export of published content, but the two formats are incompatible.

Create a tool (e.g. XSL) to transform OJS 2.x exported content to a form that OJS 3.x can import. (Note that neither OJS 2.x nor 3.x XML formats are immutable -- both evolve slightly over time.)

OJS 2.4.8 XML DTD:

OJS 3.1.2 XML schema:

asmecher commented 5 years ago

A sample OJS 2.4.8 export (only containing 2 articles, unfortunately): ojs-2.4.8.xml.txt

A sample OJS 3.1.2 export: ojs-3.1.2.xml.txt

(The two unfortunately aren't identical, but both are typical.)

quoideneuf commented 4 years ago

@asmecher - Here is a work in progress, hope to have the article crosswalk finished soon...

https://github.com/quoideneuf/ojs-2xto3x-xwalk

Marco60428 commented 4 years ago

Hi, I am trying to use your PHP tool to convert XML from OJS 2.4.8 to OJS 3, but I have some problems. The first is the required /vendor/autoload.php: is there an update?

Can you help me, please? Regards, Marc

quoideneuf commented 4 years ago

Hi, @Marco60428 - The tool is intended to be used with the Composer dependency management system. I updated the README, and also modified the main script so that you can run a transform without Composer. Using the test suite still requires Composer.

Marco60428 commented 4 years ago

Hi @quoideneuf, thank you very much! I will try the conversion soon. Meanwhile, I am sending you my very simple OJS 2.4.8 XML file. If you agree and it is possible, could you try to convert it? This is very important to me, to avoid doubts about possible future trouble.

Thank you in advance

issue-269.xml.txt

Marco60428 commented 4 years ago

Hi @quoideneuf, I have some results. I tried to convert the issue-269.xml file I sent yesterday and got this message:

mint@mint:~/Desktop/ojs-2xto3x-xwalk$ php transform.php --xml issue-269.xml --out test.xml --xsl transform.xsl
PHP Fatal error: Uncaught Error: Class 'DOMDocument' not found in /home/mint/Desktop/ojs-2xto3x-xwalk/transform.php:19
Stack trace:
#0 {main}
thrown in /home/mint/Desktop/ojs-2xto3x-xwalk/transform.php on line 19

Can you help, please?

quoideneuf commented 4 years ago

Hi @Marco60428 It looks like maybe you don't have the DOM extension installed, or it is turned off. You should be able to see a 'dom' section in the output of phpinfo() if it is installed correctly:

$ php -r 'phpinfo();' | grep DOM
DOM/XML => enabled
DOM/XML API Version => 20031129

I ran the conversion on your file and the results are attached.

issue-269-converted.xml.txt

Marco60428 commented 4 years ago

Hi @quoideneuf, thank you: the conversion now runs without errors. Unfortunately, OJS 3.1.2.1 rejects the converted files. Is there something I can do?

quoideneuf commented 4 years ago

Hi @Marco60428 - I had to make some changes to the stylesheet to accommodate "issue" exports. Here is an updated conversion. It will import, but most of the data seems to be missing once it does. I can investigate why that is next week.

issue-269-converted.xml.txt

Marco60428 commented 4 years ago

Hi @quoideneuf: thanks for your help: let me know!

Marco60428 commented 4 years ago

Hi, any news? thank you

quoideneuf commented 4 years ago

I have made some further updates here - have you tried transforming your data with the latest version?

https://github.com/quoideneuf/ojs-2xto3x-xwalk

Marco60428 commented 4 years ago

Hi, Sorry but I'm sick at home, I think I got the flu. I hope to try Monday

Marco60428 commented 4 years ago

Hi quoideneuf, now it seems better but I get this message:

Element '{http://pkp.sfu.ca}article': The attribute 'section_ref' is required but missing.

I realized that the OJS 3 XML import does in fact require a journal section, while OJS 2 does not, so the XML file generated by OJS 2 does not contain information about sections. Is that a problem? I hope you can do something. Thanks for your work.

quoideneuf commented 4 years ago

Hi @Marco60428 - yes, you are right that the ojs2 exports are missing section data, which is required for ojs 3 import. You can provide the section_ref as a default when you run the transform. See this file and the readme:

https://github.com/quoideneuf/ojs-2xto3x-xwalk/blob/master/article-defaults-example.txt
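Before running the transform, it may help to sanity-check the defaults file. The following is a minimal Python sketch, not part of the ojs-2xto3x-xwalk tool. It assumes the file holds one `key = value` pair per line (the actual format should be confirmed against the README), and the key names other than `section_ref` are taken from the example posted later in this thread. The function names `parse_defaults` and `check_defaults` are hypothetical.

```python
from typing import Dict, List

# Keys assumed to be numeric, based on the DB errors reported later in this thread.
NUMERIC_KEYS = ("seq", "access_status")
# Keys assumed to be required; only section_ref is confirmed by the thread.
REQUIRED_KEYS = ("section_ref",) + NUMERIC_KEYS


def parse_defaults(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines; lines without '=' are skipped."""
    defaults: Dict[str, str] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        defaults[key.strip()] = value.strip()
    return defaults


def check_defaults(defaults: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if not defaults.get(key):
            problems.append(f"{key} is missing or empty")
    for key in NUMERIC_KEYS:
        value = defaults.get(key, "")
        if value:
            try:
                float(value)
            except ValueError:
                problems.append(f"{key} is not numeric: {value!r}")
    return problems


sample = """section_ref = ART
seq = 1
access_status = 0
"""
print(check_defaults(parse_defaults(sample)))  # prints []
```

A non-empty result points at a default that would likely reach the importer empty or malformed.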

Marco60428 commented 4 years ago

Hi @quoideneuf: done! The OJS 3 import no longer produces immediate errors and finally seems to accept the XML file. However, the import report indicates an error (see the attached file). I don't know what the errors are because I can't find a log file. In the article-defaults.txt file I wrote the following (ART is an existing section of a 2018 test issue):

section_ref = ART
seq = 1
access_status = 0
volume = 1
number = 2
year = 2018

I need your help, again

Appunti01

Marco60428 commented 4 years ago

Hi @quoideneuf: I finally have the error messages from the OJS 3 XML import, for both the article and the issue XML files:

DB Error: Incorrect double value: '' for column annali.published_submissions.seq at row 1

DB Error: Incorrect integer value: '' for column annali.published_submissions.access_status at row 1

Any suggestion, please?

Marco60428 commented 4 years ago

Hi quoideneuf, excuse me for coming back to this topic, but I have no idea how to solve the problem, so I have had to stop my work. Can you help me understand these messages? What can I do? Thank you very much.

quoideneuf commented 4 years ago

Hi @Marco60428 Those errors are stating that the wrong type of data is being inserted into the database. For instance, if a database column is defined as containing a 'double', or 'integer', you will get an error if you try to insert a string like 'abc'.
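In the two errors quoted above, the rejected value is an empty string (`''`), which suggests that `seq` and `access_status` reached the importer with no value. A small pre-import check can flag this in the converted file. The sketch below is a hedged illustration in Python, not part of OJS or the conversion tool. It assumes these values are carried as attributes on `article` elements in the `http://pkp.sfu.ca` namespace (the namespace and `section_ref` come from the validation error in this thread; the other two attribute names are an assumption). The function name `find_empty_attrs` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the validation error quoted in this thread.
PKP_NS = "http://pkp.sfu.ca"
# Attribute names assumed from the errors reported in this thread.
CHECKED_ATTRS = ("section_ref", "seq", "access_status")


def find_empty_attrs(xml_text):
    """Return (article_index, attribute) pairs for missing or empty attributes."""
    root = ET.fromstring(xml_text)
    problems = []
    # root.iter() also yields the root itself, so single-article files are covered.
    for index, article in enumerate(root.iter(f"{{{PKP_NS}}}article")):
        for attr in CHECKED_ATTRS:
            if not article.get(attr, "").strip():
                problems.append((index, attr))
    return problems


# Hypothetical, heavily simplified sample; real exports contain much more.
sample = (
    '<issue xmlns="http://pkp.sfu.ca">'
    '<article section_ref="ART" seq="" access_status="0"/>'
    '<article section_ref="ART" seq="1"/>'
    '</issue>'
)
print(find_empty_attrs(sample))  # prints [(0, 'seq'), (1, 'access_status')]
```

Each reported pair names an article (by position) and an attribute that would likely trigger an "Incorrect double value" or "Incorrect integer value" error on import.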

Marco60428 commented 4 years ago

Hi @quoideneuf, thank you, I solved that problem. The last remaining problem is that the PDF files do not appear in the published article. I found a temporary workaround: in the XML import file I must add the following.

1- At the beginning: _<submissionfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" stage="proof" id="3" xsi:schemaLocation="http://pkp.sfu.ca native.xsd"> (but it seems to be sufficient _)

2- At the end:

1 PDF
<seq>0</seq>    <submission_file_ref id="3" revision="1" />

_ (In an issue xml file the situation is, obviously, more complicated)

A question: is it possible to integrate these into your conversion procedure?

Thanks in advance and happy Christmas.

Marco60428 commented 4 years ago

Hi @quoideneuf, could you please add the change I asked for in my last post? I ask because the alternative would be to edit each file before the import. If necessary, I can explain my request in more detail. I hope you can answer. Thanks for your help. Best regards

ronste commented 1 year ago

Hi @asmecher, @quoideneuf,

Since it was a (minor) topic at our PKP Sprint in Copenhagen and others might be interested too, I am posting an update I created for the XSL transform code originally created by @quoideneuf.

I have to move a 2.4.8 journal to OJS 3.3 and updated the original repo to be compatible with my use case. That said, it is not a comprehensive revision of the XSLT to be fully compatible with OJS 3.3. E.g. I didn't touch the href tag because I don't have those in my journal.

I also fixed some file ID issues I encountered with the original XSLT, i.e. in my fork there are now two XSL files for OJS 3.1 and 3.3, respectively.

In case the original repo is still maintained I am happy to do a PR.

Best wishes, Ronald