pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0
307 stars 447 forks source link

[OMP] Error in OAI ListRecords with previously published Submission #2033

Closed nils-stefan-weiher closed 7 years ago

nils-stefan-weiher commented 7 years ago

Dear PKP Team,

I found an issue with OAI and ListRecords if there exists a Submission which was previously published but later removed from the catalog, but with the publication formats still marked as "available". I tracked the issue down to a query for the OAI records, the publication formats of the submission will be returned twice, one time from the tombstone entries marking the deletion and another time form the available publication formats, because the query does not correctly deal with a submission which was published and then retracted. A similar issue was mentioned here, but never completely resolved, because publishing the submission in the catalog is only a workaround: http://forum.pkp.sfu.ca/t/omp-error-when-been-harvested-using-oai/7424/9

This is the error message from the logs: PHP Fatal error: Call to a member function getPressId() on a non-object in /web/omp/plugins/metadata/dc11/filter/Dc11 SchemaPublicationFormatAdapter.inc.php on line 76

But the actual issue is because of a missing case in the query that retrieves OAI records. I tracked it down to the OAIDOA, which lists the publications format of the retracted submission twice using this query: https://github.com/pkp/omp/blob/master/classes/oai/omp/OAIDAO.inc.php#L192-L198

I think it is because during "unpublishing" of a submission, the entry in the "published_submissions" table wont get deleted and only the date_published will be set to null. Maybe the issue can be resolved by checking if "date_published" is not null for the submission which the publication formats belong to. Or the publication formats will have to be set to unavailable if the submission is removed from the catalog.

Thanks for your great work, hope the information helps.

Regards, Nils Weiher

asmecher commented 7 years ago

Thanks, @isgrim -- I agree with your diagnosis. I was hoping to get a DB dump from another user with the same issue (http://forum.pkp.sfu.ca/t/omp-oai-pmh-problems/22458/4) but let me try to come up with a patch and if you're able to test it that would be very helpful.

asmecher commented 7 years ago

@isgrim, the quickest solution to this appears to be to make a few previously-optional (LEFT) joins mandatory; see https://github.com/pkp/omp/commit/6067facae4265d621c223e512d9c35908375251a for details. Could you give this a try it and report back whether it works as expected?

nils-stefan-weiher commented 7 years ago

@asmecher Thanks Alec, I will give this a try!

nils-stefan-weiher commented 7 years ago

@asmecher Sorry, this did not help. If the publication format for the unpublished submission is still marked as available, the fatal error during ListRecords still occurs with these changes. If I make the publication format unavailable the ListRecords query is successful but does not contain any tombstones for deleted records.

I checked which query was actually executed, using database debug output from OMP, and ran this query manually on our database.

This was the original query ListRecords triggered, from before your changes:

SELECT COALESCE(dot.date_deleted, ms.last_modified) AS last_modified,
       COALESCE(pf.publication_format_id, dot.data_object_id) AS data_object_id,
       COALESCE(p.press_id, tsop.assoc_id) AS press_id,
       COALESCE(tsos.assoc_id, s.series_id) AS series_id,
       dot.tombstone_id,
       dot.set_spec,
       dot.oai_identifier
FROM mutex m LEFT JOIN publication_formats pf ON (m.i=0)
LEFT JOIN published_submissions ps ON (ps.submission_id = pf.submission_id)
LEFT JOIN submissions ms ON (ms.submission_id = ps.submission_id AND ms.context_id = 6)
LEFT JOIN series s ON (s.series_id = ms.series_id)
LEFT JOIN presses p ON (p.press_id = ms.context_id)
LEFT JOIN data_object_tombstones dot ON (m.i = 1)
LEFT JOIN data_object_tombstone_oai_set_objects tsop ON (tsop.tombstone_id = dot.tombstone_id AND tsop.assoc_type = 512 AND tsop.assoc_id = 6)
LEFT JOIN data_object_tombstone_oai_set_objects tsos ON tsos.assoc_id = null
WHERE ((p.enabled = 1 AND ms.status <> 4 AND pf.is_available = 1) OR dot.data_object_id IS NOT NULL) ORDER BY press_id

And these were the results, here as CSV:

last_modified,data_object_id,press_id,series_id,tombstone_id,set_spec,oai_identifier
"2016-06-10 14:00:58",101,NULL,NULL,60,propylaeum:daidalos,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/101
"2016-02-11 10:02:00",48,NULL,NULL,15,heibooks,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/48
"2016-06-10 14:21:49",94,NULL,NULL,67,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/94
"2016-07-19 14:40:52",159,NULL,NULL,150,heibooks:universitaetsmuseum,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/159
"2016-05-23 15:38:36",77,NULL,NULL,47,arthistoricum:dbae,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/77
"2016-06-10 14:04:22",104,NULL,NULL,61,propylaeum:daidalos,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/104
"2016-09-15 09:51:22",205,NULL,NULL,234,xa:gsa,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/205
"2016-02-17 17:03:00",52,NULL,NULL,16,xa:gsa,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/52
"2016-06-10 14:25:03",95,NULL,NULL,68,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/95
"2016-05-23 15:45:47",73,NULL,NULL,48,arthistoricum:vmps,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/73
"2016-06-10 14:06:54",103,NULL,NULL,62,propylaeum:daidalos,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/103
"2016-02-17 17:03:04",51,NULL,NULL,17,xa:gsa,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/51
"2016-06-06 15:46:47",118,NULL,NULL,56,propylaeum:cms,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/118
"2016-06-10 14:26:52",96,NULL,NULL,69,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/96
"2016-11-14 16:27:41",228,NULL,NULL,330,heibooks:ubhd_schriften,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/228
"2016-05-23 15:48:07",74,NULL,NULL,49,arthistoricum:vmps,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/74
"2014-03-28 09:20:32",2,NULL,NULL,1,propylaeum:Propylaeum,oai:omp.books.ub.uni-heidelberg.de:publicationFormat/2
"2016-06-10 14:08:52",102,NULL,NULL,63,propylaeum:daidalos,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/102
"2016-06-30 12:17:51",138,NULL,NULL,117,xa:gsa,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/138
"2016-02-18 10:17:34",53,NULL,NULL,18,heibooks:universitaetsmuseum,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/53
"2016-06-10 13:46:14",99,NULL,NULL,57,propylaeum:cms,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/99
"2016-06-10 14:28:40",97,NULL,NULL,70,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/97
"2016-05-24 16:30:48",89,NULL,NULL,50,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/89
"2014-09-17 13:14:59",10,NULL,NULL,2,propylaeum:Propylaeum,oai:omp.books.ub.uni-heidelberg.de:publicationFormat/10
"2016-06-10 14:11:45",100,NULL,NULL,64,propylaeum:daidalos,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/100
"2016-06-14 16:13:59",85,NULL,NULL,74,propylaeum,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/85
"2016-06-10 13:49:02",87,NULL,NULL,58,propylaeum:cms,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/87
"2016-06-10 14:34:09",98,NULL,NULL,71,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/98
"2015-10-13 15:17:07",41,NULL,NULL,9,propylaeum:arch-ber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/41
"2016-05-25 14:33:13",90,NULL,NULL,51,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/90
"2014-09-22 16:01:15",13,NULL,NULL,3,propylaeum:Propylaeum,oai:omp.books.ub.uni-heidelberg.de:publicationFormat/13
"2016-06-10 14:17:01",92,NULL,NULL,65,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/92
"2016-05-23 15:07:24",83,NULL,NULL,46,arthistoricum,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/83
"2016-06-10 13:52:01",109,NULL,NULL,59,propylaeum:cms,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/109
"2016-06-10 14:43:20",105,NULL,NULL,72,propylaeum:saa,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/105
"2016-05-25 14:39:50",91,NULL,NULL,52,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/91
"2014-11-13 14:28:07",17,NULL,NULL,4,propylaeum:arch-ber,oai:omp.books.ub.uni-heidelberg.de:publicationFormat/17
"2016-06-10 14:19:17",93,NULL,NULL,66,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/93
"2016-07-19 14:40:52",158,NULL,NULL,149,heibooks:universitaetsmuseum,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/158
"2016-03-17 15:38:16",68,NULL,NULL,29,propylaeum:archber,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/68
"2016-06-24 09:27:04",199,6,NULL,NULL,NULL,NULL
"2016-10-19 08:02:59",203,6,NULL,296,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/203
"2015-06-30 10:35:05",34,6,NULL,NULL,NULL,NULL
"2016-05-31 09:27:01",111,6,NULL,53,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/111
"2016-07-01 10:40:10",149,6,NULL,NULL,NULL,NULL
"2015-07-10 14:32:16",33,6,NULL,5,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/33
"2015-06-30 10:35:05",123,6,NULL,NULL,NULL,NULL
"2016-11-25 12:35:45",230,6,19,NULL,NULL,NULL
"2015-12-08 09:22:35",42,6,13,NULL,NULL,NULL
"2016-06-02 16:13:36",113,6,NULL,55,heiup:hst,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/113
"2016-07-01 11:22:35",151,6,NULL,NULL,NULL,NULL
"2015-10-05 10:25:22",35,6,NULL,6,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/35
"2016-09-20 16:43:48",211,6,NULL,242,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/211
"2015-12-08 09:22:35",115,6,13,NULL,NULL,NULL
"2016-11-25 12:35:45",231,6,19,NULL,NULL,NULL
"2016-10-19 08:02:59",202,6,NULL,295,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/202
"2015-12-08 09:22:35",44,6,13,NULL,NULL,NULL
"2016-07-01 11:22:35",152,6,NULL,NULL,NULL,NULL
"2015-10-05 14:55:36",36,6,NULL,7,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/36
"2016-05-12 10:11:38",106,6,NULL,NULL,NULL,NULL
"2015-12-08 09:22:35",45,6,13,NULL,NULL,NULL
"2016-07-01 11:22:35",188,6,NULL,NULL,NULL,NULL
"2015-10-09 14:56:03",37,6,NULL,8,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/37
"2016-11-16 14:29:24",150,6,NULL,343,heiup,oai:omp.ub.uni-heidelberg.de:publicationFormat/150
"2016-05-12 10:11:38",107,6,NULL,NULL,NULL,NULL
"2016-11-29 08:48:59",149,6,NULL,350,heiup,oai:omp.ub.uni-heidelberg.de:publicationFormat/149
"2015-06-30 10:35:05",122,6,NULL,NULL,NULL,NULL
"2016-07-01 11:22:35",191,6,NULL,NULL,NULL,NULL
"2016-06-24 09:27:04",128,6,NULL,NULL,NULL,NULL
"2016-11-10 11:47:17",223,6,NULL,316,heiup:hst,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/223
"2016-06-24 09:27:04",129,6,NULL,NULL,NULL,NULL
"2016-05-12 10:11:38",194,6,NULL,NULL,NULL,NULL
"2015-12-08 09:59:58",43,6,NULL,10,heiup,oai:omp.archiv.ub.uni-heidelberg.de:publicationFormat/43
"2015-06-30 10:35:05",32,6,NULL,NULL,NULL,NULL
"2016-05-12 10:11:38",119,6,NULL,NULL,NULL,NULL
"2016-06-24 09:27:04",127,6,NULL,NULL,NULL,NULL

With the new changes the results were:

last_modified,data_object_id,press_id,series_id,tombstone_id,set_spec,oai_identifier
"2015-12-08 09:22:35",42,6,13,NULL,NULL,NULL
"2016-11-25 12:35:45",230,6,19,NULL,NULL,NULL
"2016-05-12 10:11:38",119,6,NULL,NULL,NULL,NULL
"2016-06-24 09:27:04",199,6,NULL,NULL,NULL,NULL
"2015-12-08 09:22:35",44,6,13,NULL,NULL,NULL
"2016-11-25 12:35:45",231,6,19,NULL,NULL,NULL
"2016-05-12 10:11:38",194,6,NULL,NULL,NULL,NULL
"2016-07-01 10:40:10",149,6,NULL,NULL,NULL,NULL
"2015-12-08 09:22:35",45,6,13,NULL,NULL,NULL
"2015-06-30 10:35:05",32,6,NULL,NULL,NULL,NULL
"2016-07-01 11:22:35",151,6,NULL,NULL,NULL,NULL
"2015-12-08 09:22:35",115,6,13,NULL,NULL,NULL
"2015-06-30 10:35:05",34,6,NULL,NULL,NULL,NULL
"2016-07-01 11:22:35",152,6,NULL,NULL,NULL,NULL
"2016-06-24 09:27:04",129,6,NULL,NULL,NULL,NULL
"2015-06-30 10:35:05",122,6,NULL,NULL,NULL,NULL
"2016-07-01 11:22:35",188,6,NULL,NULL,NULL,NULL
"2016-05-12 10:11:38",106,6,NULL,NULL,NULL,NULL
"2016-06-24 09:27:04",127,6,NULL,NULL,NULL,NULL
"2015-06-30 10:35:05",123,6,NULL,NULL,NULL,NULL
"2016-07-01 11:22:35",191,6,NULL,NULL,NULL,NULL
"2016-05-12 10:11:38",107,6,NULL,NULL,NULL,NULL
"2016-06-24 09:27:04",128,6,NULL,NULL,NULL,NULL

The row with data_object_id = 149 is from a publication format from the previously published but then retracted submission (submission_id = 122), this should not be returned as it will later lead to a fatal error, because the later query for the published submission data, returns no results:

SELECT  s.*,
        ps.*,
        COALESCE(stl.setting_value, stpl.setting_value) AS series_title,
        COALESCE(sal.setting_value, sapl.setting_value) AS series_abbrev
FROM    submissions s
    JOIN published_submissions ps ON (ps.submission_id = s.submission_id)
    LEFT JOIN series se ON se.series_id = s.series_id
    LEFT JOIN series_settings stpl ON (se.series_id = stpl.series_id AND stpl.setting_name = 'title' AND stpl.locale = 'en_US')
    LEFT JOIN series_settings stl ON (se.series_id = stl.series_id AND stl.setting_name = 'title' AND stl.locale = 'en_US')
    LEFT JOIN series_settings sapl ON (se.series_id = sapl.series_id AND sapl.setting_name = 'abbrev' AND sapl.locale = 'en_US')
    LEFT JOIN series_settings sal ON (se.series_id = sal.series_id AND sal.setting_name = 'abbrev' AND sal.locale = 'en_US')
WHERE   s.submission_id = 122
AND ps.date_published IS NOT NULL  
asmecher commented 7 years ago

Ah, @isgrim, it looks like one more SQL change was required -- try adding https://github.com/asmecher/omp/commit/ebcc8e127ccde54caa56a653822624e48c816995 as well. I've reproduced the problem locally and this additional change appears to get it.

asmecher commented 7 years ago

(Note to self: NOT MERGED pending feedback/testing.)

nils-stefan-weiher commented 7 years ago

Thanks @asmecher . That change seems to have fixed the error. Do you think this fix is safe to use in a production evironment or should we wait a littler longer until more feedback comes?

asmecher commented 7 years ago

I've tested it locally, and your feedback is a good additional check. It's a small, uncomplicated change. Long story short: you're safe to use it in production, and I'll merge it into our git repos for release in whatever OMP comes next. Thanks!