Daniel-Mietchen closed this issue 10 years ago
@Daniel-Mietchen the text is now up. https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central
issues on
@wrought jats-to-mediawiki didn't make a references section on this article. @Daniel-Mietchen can you run OAMI for images on this article?
@wrought @Klortho The Olinguito article is not just missing references but half of the article, apparently due to some issue with the header of table 3.
@notconfusing OAMI is not yet set up to import images, but most of the ones from this article are already on Commons, and I will do the rest manually once the text is complete.
@wrought @Klortho Just want to make sure that I am doing this right. The output is the result of these commands. Perhaps I am not doing it right, because I am feeding it only the .nxml file? Am I supposed to give it all the numbered files? What's the difference? Also, a warning appears that an external entity is missing; see:
notconfusing@eigenzorg:~/workspace/JATS-to-Mediawiki$ ls Zookeys_2013_Aug_15_\(324\)_1-83
license.txt ZooKeys-324-001-g010.jpg ZooKeys-324-001-g020.jpg
ZooKeys-324-001-g001.gif ZooKeys-324-001-g011.gif ZooKeys-324-001-g021.gif
ZooKeys-324-001-g001.jpg ZooKeys-324-001-g011.jpg ZooKeys-324-001-g021.jpg
ZooKeys-324-001-g002.gif ZooKeys-324-001-g012.gif ZooKeys-324-001-g022.gif
ZooKeys-324-001-g002.jpg ZooKeys-324-001-g012.jpg ZooKeys-324-001-g022.jpg
ZooKeys-324-001-g003.gif ZooKeys-324-001-g013.gif ZooKeys-324-001-g023.gif
ZooKeys-324-001-g003.jpg ZooKeys-324-001-g013.jpg ZooKeys-324-001-g023.jpg
ZooKeys-324-001-g004.gif ZooKeys-324-001-g014.gif ZooKeys-324-001-g024.gif
ZooKeys-324-001-g004.jpg ZooKeys-324-001-g014.jpg ZooKeys-324-001-g024.jpg
ZooKeys-324-001-g005.gif ZooKeys-324-001-g015.gif ZooKeys-324-001.nxml
ZooKeys-324-001-g005.jpg ZooKeys-324-001-g015.jpg ZooKeys-324-001.pdf
ZooKeys-324-001-g006.gif ZooKeys-324-001-g016.gif zookeys.324.5827-treatment1.xml
ZooKeys-324-001-g006.jpg ZooKeys-324-001-g016.jpg zookeys.324.5827-treatment2.xml
ZooKeys-324-001-g007.gif ZooKeys-324-001-g017.gif zookeys.324.5827-treatment3.xml
ZooKeys-324-001-g007.jpg ZooKeys-324-001-g017.jpg zookeys.324.5827-treatment4.xml
ZooKeys-324-001-g008.gif ZooKeys-324-001-g018.gif zookeys.324.5827-treatment5.xml
ZooKeys-324-001-g008.jpg ZooKeys-324-001-g018.jpg zookeys.324.5827-treatment6.xml
ZooKeys-324-001-g009.gif ZooKeys-324-001-g019.gif zookeys.324.5827-treatment7.xml
ZooKeys-324-001-g009.jpg ZooKeys-324-001-g019.jpg zookeys.324.5827-treatment8.xml
ZooKeys-324-001-g010.gif ZooKeys-324-001-g020.gif
notconfusing@eigenzorg:~/workspace/JATS-to-Mediawiki$ xsltproc jats-to-mediawiki.xsl Zookeys_2013_Aug_15_\(324\)_1-83/ZooKeys-324-001.nxml > ZooKeys-324-001.mw.xml
Zookeys_2013_Aug_15_(324)_1-83/ZooKeys-324-001.nxml:1: warning: failed to load external entity "Zookeys_2013_Aug_15_(324)_1-83/JATS-archivearticle1.dtd"
rnal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd"
This is solved, and yes, that is the right way to convert the XML. The problem was Python's xml.etree.ElementTree not handling <br/>, but now @wrought is converting those into newlines.
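For anyone following along, here is a minimal sketch of one way to turn <br/> elements into newlines with xml.etree.ElementTree. It is not necessarily what @wrought's fix does; the function name `br_to_newlines` and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def br_to_newlines(elem):
    """Return the text content of elem, rendering each <br/> as a newline.

    Hypothetical helper for illustration only; not the actual fix.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "br":
            # A line break carries no text of its own, so emit a newline.
            parts.append("\n")
        else:
            # Recurse so that <br/> nested in inline markup is handled too.
            parts.append(br_to_newlines(child))
        # The tail is the text that follows the child inside the parent.
        parts.append(child.tail or "")
    return "".join(parts)


# Made-up table cell, just to exercise the helper.
cell = ET.fromstring("<td>first line<br/>second <i>line</i></td>")
print(br_to_newlines(cell))
```

The point to watch is that ElementTree stores the text after a <br/> in that element's `tail`, not in the parent's `text`, so code that reads only `elem.text` silently drops everything after the first break.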
So @Daniel-Mietchen, the problematic article is fixed. It is ready for you to upload the images and launch the RfC.
We're making good progress here, but some details still remain to be addressed.
Please import the other articles from the test set in https://github.com/Daniel-Mietchen/OA-signalling/issues/37#issuecomment-42750689 as well.
I'll go through these too and launch the RfC once the majority of the bugs listed at https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central#Bugs are fixed.
I'll refrain from editing the articles' wiki pages manually, so as to avoid overwrites like https://en.wikisource.org/w/index.php?title=Wikisource%3AWikiProject_Open_Access%2FProgrammatic_import_from_PubMed_Central%2FThe_Vpr_protein_from_HIV-1%3A_distinct_roles_along_the_viral_life_cycle&diff=4899177&oldid=4894578 .
@Daniel-Mietchen the rest of the articles in #37 are up for your perusal. We spot-checked them and are reporting some of the bugs we found. For instance, 10.1371/journal.pbio.0020207 displays citations but does not get a Reflist. There are also some more jats-to-mediawiki problems, like breaking on complex elements in tables. Please report the rest.
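The "citations but no Reflist" case could probably be caught automatically during spot checks. A rough sketch, assuming the converted wikitext is available as a string and that a reference list shows up either as {{reflist}} or as a <references/> tag; the function name `missing_reflist` is hypothetical and not part of jats-to-mediawiki.

```python
import re

# Matches <ref> and <ref name="..."> but not <references/>.
REF_TAG = re.compile(r"<ref[\s>]", re.IGNORECASE)
# Matches the {{reflist}} template or a <references/> tag.
REF_LIST = re.compile(r"\{\{\s*reflist\b|<references\s*/?>", re.IGNORECASE)


def missing_reflist(wikitext):
    """Return True if the text has inline citations but no reference list."""
    has_refs = bool(REF_TAG.search(wikitext))
    has_list = bool(REF_LIST.search(wikitext))
    return has_refs and not has_list


# Made-up snippets, just to show the two outcomes.
print(missing_reflist("Some claim.<ref>Source A</ref>"))
print(missing_reflist("Some claim.<ref>Source A</ref>\n{{reflist}}"))
```

This only looks at the wikitext itself, so it would miss cases where the list comes from a transcluded template.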
Posted as "proposal" on the Scriptorium: https://en.wikisource.org/w/index.php?title=Wikisource:Scriptorium&oldid=4925187#Automated_import_of_openly_licensed_scholarly_articles .
About mass-importing full-text OA articles into en.ws