pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

[OJS] 3.0.0 XML export validation message encoding #1956

Closed knjigor closed 7 years ago

knjigor commented 7 years ago

Refer to http://forum.pkp.sfu.ca/t/ojs-3-crossref-error/23770/14?u=knjigor

bozana commented 7 years ago

@knjigor, unfortunately I cannot reproduce it :-( Could you use the browser developer tools (or Firebug for Firefox) to check the response headers when you click the "Download XML" button? Open the tools and the "Network" tab before you click "Download XML", then select the POST request and look at the "Content-Type" header and its "charset". Alternatively, could you insert this code in the function exportXML in the class classes/plugins/PubObjectsExportPlugin.inc.php:

if (!empty($errors)) {
    // Send the charset explicitly so the browser does not have to guess it.
    $charset = Config::getVar('i18n', 'client_charset');
    header('Content-type: text/html; charset=' . $charset);
    echo '<html><body>';
    // Print the validation errors together with the invalid XML.
    $this->displayXMLValidationErrors($errors, $xml);
    fatalError(__('plugins.importexport.common.error.validation'));
    echo '</body></html>';
}

instead of these lines currently there: https://github.com/pkp/ojs/blob/ojs-stable-3_0_0/classes/plugins/PubObjectsExportPlugin.inc.php#L279-L282.

This might solve the problem. Currently the server does not return that header when those validation errors occur, so your browser may be assuming some other charset. Thanks a lot!
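To illustrate why a missing charset matters, here is a minimal Python sketch (not part of OJS). It reads the charset from a Content-Type value using the standard library, then shows what happens when UTF-8 bytes are decoded with a guessed legacy charset. The helper name `charset_from_content_type` and the choice of cp1252 as the browser's guess are assumptions made for the example.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


# A header without a charset parameter leaves the choice to the browser.
print(charset_from_content_type("text/html"))
# A header as the suggested code would send it, assuming client_charset is utf-8.
print(charset_from_content_type("text/html; charset=utf-8"))

# UTF-8 bytes decoded with a guessed legacy charset (cp1252 is an assumption)
# come out garbled, which is the symptom described in this issue.
original = "Igor Lekić"
body = original.encode("utf-8")
garbled = body.decode("cp1252")
print(garbled)
```

With an explicit charset in the header the browser decodes the same bytes as UTF-8 and the names display correctly.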

knjigor commented 7 years ago

Hi Bozana, here are the results from Firebug:

Response headers:

Connection: Keep-Alive
Content-Encoding: gzip
Content-Length: 2835
Content-Type: text/html
Date: Mon, 07 Nov 2016 11:18:04 GMT
Keep-Alive: timeout=5, max=100
Server: Apache/2.4.7 (Ubuntu)
Set-Cookie: OJSSID=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/
Set-Cookie: OJSSID=gscr0h5be0a7s8orqrted755a1; expires=Wed, 07-Dec-2016 11:18:04 GMT; Max-Age=2592000; path=/; domain=godisnjak.ff.uns.ac.rs
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.9-1ubuntu4.20

Request headers:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.5
Cache-Control: max-age=0
Connection: keep-alive
Cookie: _ga=GA1.3.1847372170.1476880530; tk_ni=112017260; OJSSID=gscr0h5be0a7s8orqrted755a1; __utma=18102319.1847372170.1476880530.1478263793.1478263793.1; __utmz=18102319.1478263793.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __atuvc=4%7C44; _gat=1
Host: godisnjak.ff.uns.ac.rs
Referer: http://godisnjak.ff.uns.ac.rs/index.php/gff/management/importexport/plugin/CrossRefExportPlugin
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0

knjigor commented 7 years ago

Changing the code in classes/plugins/PubObjectsExportPlugin.inc.php has fixed the encoding error, but we still get errors in validation and in converting the objects. I think this is network (proxy) related:

Validation errors:

Failed to locate the main schema resource at 'http://www.crossref.org/schema/deposit/crossref4.3.6.xsd'.
Invalid XML:

<?xml version="1.0" encoding="utf-8"?>
<doi_batch xmlns="http://www.crossref.org/schema/4.3.6" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1" xmlns:ai="http://www.crossref.org/AccessIndicators.xsd" version="4.3.6" xsi:schemaLocation="http://www.crossref.org/schema/4.3.6 http://www.crossref.org/schema/deposit/crossref4.3.6.xsd">
  <head>
    <doi_batch_id>_1478518041</doi_batch_id>
    <timestamp>1478518041</timestamp>
    <depositor>
      <depositor_name>Igor Lekić</depositor_name>
      <email_address>crossref@ff.uns.ac.rs</email_address>
    </depositor>
    <registrant>ФИЛОЗОФСКИ ФАКУЛТЕТ НОВИ САД</registrant>
  </head>
  <body>
    <journal>
      <journal_metadata>
        <full_title>Годишњак Филозофског факултета у Новом Саду</full_title>
        <abbrev_title>gff</abbrev_title>
        <issn media_type="electronic">2334-7236</issn>
        <issn media_type="print">0374-0730</issn>
      </journal_metadata>
      <journal_issue>
        <publication_date media_type="online">
          <month>12</month>
          <day>10</day>
          <year>2015</year>
        </publication_date>
        <journal_volume>
          <volume>40</volume>
        </journal_volume>
        <issue>1</issue>
        <doi_data>
          <doi>10.19090/gff.2015.1</doi>
          <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/issue/view/115</resource>
        </doi_data>
      </journal_issue>
      <journal_article xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1" publication_type="full_text" metadata_distribution_opts="any">
        <titles>
          <title>KRITICKÉ ČÍTANIE LITERATÚRY VÍŤAZOSLAVA HRONCA</title>
        </titles>
        <contributors>
          <person_name contributor_role="author" sequence="first">
            <given_name>Adam</given_name>
            <surname>Svetlík</surname>
          </person_name>
        </contributors>
        <jats:abstract xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1">
          <jats:p>V texte sa analyzuje kritické čítanie literatúry Víťazoslava Hronca, ktoré zaberá dôležité miesto nielen v jeho celkovej literárnej činnosti, ale aj v literatúre vojvodinských Slovákov. Prvé literárnokritické intepretácie aktuálnej súdobej slovenskej vojvodinskej básnickej tvorby Hronec publikoval v 60. rokoch 20. storočia a odvtedy prakticky dodnes sa svojím kritickým čítaním literatúry usiluje usmerňovať a skvalitňovať celkovú literárnu tvorbu vojvodinských Slovákov. Dôležité sú zvlášť jeho antológie slovenskej vojvodinskej poézie a obsiahle literárnohistorické predslovy k ním, ktoré možno čítať ako svojrázne dejiny slovenskej vojvodinskej poezie. Základné metodologické východisko Víťazoslava Hronca bolo zo začiatku inšpirované fenomenológiou, no neskôr, najmä v postmodernom tvorivom období na prelome storočia, jeho kritické čítanie literatúry poznačil esejistický výraz a široký kulturologický zorný uhol, v ktorom naplno prišlo k slovu autorova výnimočná erudícia a sluch pre tep doby.</jats:p>
        </jats:abstract>
        <publication_date media_type="online">
          <month>12</month>
          <day>10</day>
          <year>2015</year>
        </publication_date>
        <pages>
          <first_page>207</first_page>
          <other_pages>222</other_pages>
        </pages>
        <doi_data>
          <doi>10.19090/gff.2015.1.207-222</doi>
          <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/article/view/1501</resource>
          <collection property="crawler-based">
            <item crawler="iParadigms">
              <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/article/download/1501/1528</resource>
            </item>
          </collection>
          <collection property="text-mining">
            <item>
              <resource mime_type="application/pdf">http://godisnjak.ff.uns.ac.rs/index.php/gff/article/download/1501/1528</resource>
            </item>
          </collection>
        </doi_data>
      </journal_article>
    </journal>
    <journal>
      <journal_metadata>
        <full_title>Годишњак Филозофског факултета у Новом Саду</full_title>
        <abbrev_title>gff</abbrev_title>
        <issn media_type="electronic">2334-7236</issn>
        <issn media_type="print">0374-0730</issn>
      </journal_metadata>
      <journal_issue>
        <publication_date media_type="online">
          <month>12</month>
          <day>10</day>
          <year>2015</year>
        </publication_date>
        <journal_volume>
          <volume>40</volume>
        </journal_volume>
        <issue>1</issue>
        <doi_data>
          <doi>10.19090/gff.2015.1</doi>
          <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/issue/view/115</resource>
        </doi_data>
      </journal_issue>
      <journal_article xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1" publication_type="full_text" metadata_distribution_opts="any">
        <titles>
          <title>DIREKTNA PARCIJALNA PITANJA IZVEDENA POMOĆU PROSTIH UPITNIH ZAMENICA U FRANCUSKOM JEZIKU I NJIHOVI EKVIVALENTI U SRPSKOM</title>
        </titles>
        <contributors>
          <person_name contributor_role="author" sequence="first">
            <given_name>Nataša</given_name>
            <surname>Radusin-Bardić</surname>
          </person_name>
        </contributors>
        <jats:abstract xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1">
          <jats:p>U francuskom jeziku, direktna parcijalna pitanja izvedena pomoću prostih upitnih zamenica podrazumevaju složen sistem u okviru koga su mnogi elementi podložni varijaciji. Pre svega, odabir prostih upitnih zamenica u konkretnom upitnom iskazu zavisi od toga da li se odnosi na bića (qui) ili na stvari i pojmove (que/qu’, quoi). Kada se odnose na stvari i pojmove, proste upitne zamenice, koje spadaju u nepromenljivu vrstu reči, imaju svoj nenaglašen (que/qu’) i naglašen oblik (quoi), a njihova upotreba zavisi od pozicije proste upitne zamenice u upitnom iskazu, kao i od toga da li je ona upotrebljena samostalno ili ne, uz predlog ili bez njega. Između ostalog, u zavisnosti od funkcije upitne zamenice u rečenici, često postoje dvojni upitni oblici: neprošireni (koji zahtevaju upotrebu inverzije osim ako upitna reč ima funkciju subjekta) i prošireni pomoću upitnog izraza est-ce que, est-ce qui (koji zahtevaju isti red reči kao u izjavnoj rečenici). Najzad, u govornom jeziku, upotreba direktnih parcijalnih pitanja izvedenih pomoću prostih upitnih zamenica pokazuje strukturnu varijabilnost koja nosi različita stilska obeležja. U našem radu, oslanjajući se na relevantnu normativnu i deskriptivnu literaturu, kao i na analizu korpusa, nastojaćemo da utvrdimo koji su oblici, u srpskom jeziku, ekvivalentni navedenim direktnim parcijalnim pitanjima.</jats:p>
        </jats:abstract>
        <publication_date media_type="online">
          <month>12</month>
          <day>10</day>
          <year>2015</year>
        </publication_date>
        <pages>
          <first_page>183</first_page>
          <other_pages>206</other_pages>
        </pages>
        <doi_data>
          <doi>10.19090/gff.2015.1.183-206</doi>
          <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/article/view/1500</resource>
          <collection property="crawler-based">
            <item crawler="iParadigms">
              <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/article/download/1500/1527</resource>
            </item>
          </collection>
          <collection property="text-mining">
            <item>
              <resource mime_type="application/pdf">http://godisnjak.ff.uns.ac.rs/index.php/gff/article/download/1500/1527</resource>
            </item>
          </collection>
        </doi_data>
      </journal_article>
    </journal>
    <journal>
      <journal_metadata>
        <full_title>Годишњак Филозофског факултета у Новом Саду</full_title>
        <abbrev_title>gff</abbrev_title>
        <issn media_type="electronic">2334-7236</issn>
        <issn media_type="print">0374-0730</issn>
      </journal_metadata>
      <journal_issue>
        <publication_date media_type="online">
          <month>12</month>
          <day>10</day>
          <year>2015</year>
        </publication_date>
        <journal_volume>
          <volume>40</volume>
        </journal_volume>
        <issue>1</issue>
        <doi_data>
          <doi>10.19090/gff.2015.1</doi>
          <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/issue/view/115</resource>
        </doi_data>
      </journal_issue>
      <journal_article xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1" publication_type="full_text" metadata_distribution_opts="any">
        <titles>
          <title>ТРИОЛЕТИ МИЛОСАВА ТЕШИЋА</title>
        </titles>
        <contributors>
          <person_name contributor_role="author" sequence="first">
            <given_name>Сања</given_name>
            <surname>Париповић Крчмар</surname>
          </person_name>
        </contributors>
        <jats:abstract xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1">
          <jats:p>У раду се анализују песме Милосава Тешића написане у облику триолета из збирке Прелест севера, Круг рачански, Дунавом и Бубњалица у пчелињаку у односу на тради- цијом утврђену норму сталног облика и невелику групу триолета написаних у српској књижевности 19. века. Његова скупина триолета није пример давнашње теоријске ре- флексије о лаком, допадљивом, шаљивом садржају, већ, пре свега, саображености по- етичком начелу. Ове песме чине једну компактну групу вишеструко компатибилну: тематски, обликовно, метрички, ритмички, римовно, синтаксичком организацијом, што све скупа, песнику усложњава задатак успешне реализације сталног облика ионако захтевних композиционих правила.</jats:p>
        </jats:abstract>
        <publication_date media_type="online">
          <month>12</month>
          <day>10</day>
          <year>2015</year>
        </publication_date>
        <pages>
          <first_page>175</first_page>
          <other_pages>182</other_pages>
        </pages>
        <doi_data>
          <doi>10.19090/gff.2015.1.175-182</doi>
          <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/article/view/1499</resource>
          <collection property="crawler-based">
            <item crawler="iParadigms">
              <resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/article/download/1499/1526</resource>
            </item>
          </collection>
          <collection property="text-mining">
            <item>
              <resource mime_type="application/pdf">http://godisnjak.ff.uns.ac.rs/index.php/gff/article/download/1499/1526</resource>
            </item>
          </collection>
        </doi_data>
      </journal_article>
    </journal>
  </body>
</doi_batch>

Could not convert selected objects.

bozana commented 7 years ago

@knjigor, would it be possible for you to test this: comment out this line https://github.com/pkp/pkp-lib/blob/master/classes/xslt/XMLTypeDescription.inc.php#L126 and then click the "Download XML" button. This should show you the final XML output without the validation messages. When you then look at the page source, is the character encoding correct?

knjigor commented 7 years ago

@bozana Yes, the encoding is correct and I get proper XML. The only "error" I get is "This XML file does not appear to have any style information associated with it. The document tree is shown below." Everything else is OK.

knjigor commented 7 years ago

I tried to deposit the DOI for one article to Crossref and got the notification "Registration successful!". I checked on the Crossref admin site and the deposit was received successfully. Thank you very much for your assistance.

bozana commented 7 years ago

@knjigor, thank you for testing and reporting!!! I will merge the change that fixes the encoding problem when validation errors appear. The schema validation line should not actually stay commented out; that was only a test to double-check that the XML character encoding works fine. I still have to figure out how to resolve that validation error :-( but I will let you know as soon as we have something. Until then, maybe you can leave it commented out and validate your XML files manually before submitting them to Crossref? It is a little more work, but it will ensure that the XML can be deposited at Crossref after submission. Thanks again!
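As a rough idea of what a manual pre-check could look like, here is a minimal Python sketch using only the standard library. It is not full schema validation: it only confirms that the file is well-formed, that the root element is `doi_batch` in the namespace used by the export above, and that each `doi_data` has a non-empty `doi` and `resource`. The function name `precheck` and the choice of checks are assumptions made for illustration, not Crossref rules; validating against the actual crossref4.3.6.xsd would need a schema-aware tool such as xmllint or lxml.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the exported XML in this thread.
CROSSREF_NS = "http://www.crossref.org/schema/4.3.6"


def precheck(xml_bytes):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != f"{{{CROSSREF_NS}}}doi_batch":
        problems.append(f"unexpected root element: {root.tag}")

    # Illustrative check only: every doi_data should carry a doi and a resource.
    for doi_data in root.iter(f"{{{CROSSREF_NS}}}doi_data"):
        for child in ("doi", "resource"):
            node = doi_data.find(f"{{{CROSSREF_NS}}}{child}")
            if node is None or not (node.text or "").strip():
                problems.append(f"doi_data without a {child}")
    return problems


# Hypothetical sample, shaped like a small part of the export above.
sample = (
    '<?xml version="1.0" encoding="utf-8"?>'
    f'<doi_batch xmlns="{CROSSREF_NS}" version="4.3.6"><body><journal>'
    "<journal_issue><doi_data><doi>10.19090/gff.2015.1</doi>"
    "<resource>http://godisnjak.ff.uns.ac.rs/index.php/gff/issue/view/115</resource>"
    "</doi_data></journal_issue></journal></body></doi_batch>"
).encode("utf-8")

print(precheck(sample))
```

Passing the bytes of the downloaded file (for example, read with `open(path, "rb")`) lets the parser honor the encoding declared in the XML declaration.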

knjigor commented 7 years ago

I will leave it commented out. I tested submitting in the test environment (test API) and it worked without errors. Unfortunately I had to submit the DOIs to Crossref already because this is a live journal. After submitting I checked the Crossref admin site and it all went well.

bozana commented 7 years ago

OK, thanks! I hope we can soon figure out the solution for the other problem...

bozana commented 7 years ago

PRs:
master: https://github.com/pkp/ojs/pull/1088
ojs-stable-3_0_0: https://github.com/pkp/ojs/pull/1089