pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0
297 stars 442 forks source link

old-style downloadSuppFile links broken on 3.3.0-11 #8004

Open wobsta opened 2 years ago

wobsta commented 2 years ago

When upgrading our journal instance from ojs 3.2.1-4 to 3.3.0-11, the old-style downloadSuppFile stop working.

To show an example of the problem, consider the two instances ojs.dpg-physik.de (3.2.1-4) and ojstest.dpg-physik.de (3.3.0-11). We are using basic auth "ojs" and "test" on the test instance to prevent crawlers from crawling the test instance.

  1. https://ojs.dpg-physik.de/index.php/phydid-b/article/view/881 and https://ojs:test@ojstest.dpg-physik.de/index.php/phydid-b/article/view/881 are the same article before and after the upgrade. The upgrade is successful without any errors.
  2. The supplement files are accessible in both cases.
  3. On the 3.2.1-4 instance you can also access the supplement files by the old links (used in references etc.), for example https://ojs.dpg-physik.de/index.php/phydid-b/article/downloadSuppFile/881/209
  4. On the 3.3.0-11 instance the same link returns an 404 not found error: https://ojs:test@ojstest.dpg-physik.de/index.php/phydid-b/article/downloadSuppFile/881/209

We would like to keep our old links functional. Any idea how to achieve that?

wobsta commented 2 years ago

I further investigated the issue and noticed, that when I dump the value of $submissionFile->getData('old-supp-id') in https://github.com/pkp/ojs/blob/stable-3_3_0/pages/article/ArticleHandler.inc.php#L381 it is empty for all files. (For the example given above there happen to be 11 files being iterated.)

When I do the same in https://github.com/pkp/ojs/blob/stable-3_2_1/pages/article/ArticleHandler.inc.php#L357 the loop contains again 11 files being checked, but four of them have a non-zero value for $submissionFile->getData('old-supp-id'), resulting in the old supplementary file links to function. So somehow the old-supp-id values are lost ...

asmecher commented 2 years ago

@wobsta, the old-supp-id was probably removed accidentally as part of an internal refactor of our storage code. However, we've been serving HTTP 301 (redirected permanently) responses for requests to downloadSuppFiles URLs since the OJS 3.0 was released in 2016; these were always intended to be temporary redirects and all recipients of HTTP 301 requests should have been updating those URLs as they went.

We haven't formally planned the removal of the downloadSuppFile function but with the 2.x to 3.x upgrade process removed as of 3.3.x, and the feature being broken since 3.3.0, now seems like a good time.

Are you seeing ongoing requests for downloadSuppFiles in cases where HTTP 301s should have been getting served for a prolonged period of time?

wobsta commented 2 years ago

I can confirm that our 3.2 instance returns a permanent redirect for the old-supp-id requests. However, this is just no help to later on remove the old-supp-id functionality, as those old urls happen to be used in other (also old existing) publications to reference supplementary files within our journal but also elsewhere.

I am curious, as the submission_file_settings table still contains the old-supp-id entries after the 3.3 migration (migrated from 3.2). I do understand, that you stop supporting the 2.x migration to 3.3, but you can do so with an intermediate step like by 3.2.1-4. By that you can reach exactly our situation in failing the old-supp-id requests. Maybe it is possible to add back supporting those urls?

asmecher commented 2 years ago

You should be able to stop the old-supp-id entries from getting deleted by modifying lib/pkp/schemas/submissionFile.json, restoring from pre-upgrade backup, and re-running the upgrade to 3.3.x:

diff --git a/schemas/submissionFile.json b/schemas/submissionFile.json
index b31e3121b..3efec27de 100644
--- a/schemas/submissionFile.json
+++ b/schemas/submissionFile.json
@@ -105,6 +105,12 @@
                        "type": "integer",
                        "apiSummary": true
                },
+               "old-supp-id": {
+                       "type": "integer",
+                       "description": "For supplementary files migrated from OJS 2.x, this field contains the old supp file ID. DEPRECATED.",
+                       "apiSummary": false,
+                       "readOnly": true
+               },
                "id": {
                        "type": "integer",
                        "apiSummary": true,

Can you confirm whether that works?

wobsta commented 2 years ago

Unfortunately the patch does not change the situation. I did apply it and rerun the migration of 3.2 to 3.3. The old-supp-id urls remain inoperable, and the database is not different with or without the patch. I may not properly apply the patch in the sense, that it does not influence the schema files (i.e. I am not rebuilding files like xml/schema/submissionFiles.xml – are they auto-generated – how?).

Please note that the old-supp-id occurs in submission_file_settings both on 3.2 and 3.3:

select count(*) from submission_file_settings where setting_name = 'old-supp-id';

It happens to returns 156 here, both on 3.2 and 3.3.

ctgraham commented 2 years ago

This is a bug in the code. Specifically, ArticleHandler::downloadSuppFile expects that the SubmissionFile will have a 'old-supp-id' property accessible via getData(): https://github.com/pkp/ojs/blob/e8b8c0292b133d564a9ddd9f41905ea5d124e90c/pages/article/ArticleHandler.inc.php#L381

but, SubmissionFile was instantiated by SubmissionFileService::getMany(): https://github.com/pkp/ojs/blob/e8b8c0292b133d564a9ddd9f41905ea5d124e90c/pages/article/ArticleHandler.inc.php#L377

which constructs the query without any reference to submission_file_settings: https://github.com/pkp/pkp-lib/blob/902900618f50bc266d9d1afd5c27f88ecb20da79/classes/services/PKPSubmissionFileService.inc.php#L54-L66

For this to work, some combination of the PKPSubmissionFileQueryBuilder::getQuery(), most SubmissionFileService methods, and the PKPSubmissionFileDAO::_fromRow() need to be aware of the submission_file_settings.

Tagging @wopsononock , who is currently looking at this bug.

ctgraham commented 2 years ago

N.b.: we plan to hotfix this temporarily in our instances by cloning PKPSubmissionFileDAO::getByPubId() to a PKPSubmissionFileDAO::getBySetting() and calling that from ArticleFileHandler::downloadSuppFile() until the full solution is spec'd, reviewed, and merged.

ctgraham commented 2 years ago

One more note:

We haven't formally planned the removal of the downloadSuppFile function but with the 2.x to 3.x upgrade process removed as of 3.3.x, and the feature being broken since 3.3.0, now seems like a good time.

Are you seeing ongoing requests for downloadSuppFiles in cases where HTTP 301s should have been getting served for a prolonged period of time?

Yes, we have links to the supplemental files via these legacy URLs in published PDF articles.

wobsta commented 1 year ago

As we want and need to update to 3.3.x soon, we need to decide on how to proceed with the legacy URLs for the supplemental files. Looks like we have several options:

  1. Breaking the legacy URLs and just live with that.
  2. Getting some solution within OJS along the line of what @ctgraham suggested already.
  3. Extracting the legacy URLs and mapping and doing the redirect on the reverse proxy in front of OJS.

As I do not want to go with option 1 and I can not help with option 2 (due to missing PHP knowledge), I am in favor of option 3 right now. We have nginx (and php-fpm) on our production system and could do so easily. We have 156 legacy URLs we could continue to support (also in the long term). But maybe somebody could help us in not going this “out of OJS” solution … thanks for consideration!

wobsta commented 6 months ago

I implemented option 3 using the following sql

select 'map $request_uri $new_uri {' as '';
select 'default "";' as '';
select concat('/index.php/phydid-b/article/downloadSuppFile/',
              submission_files.submission_id, '/', setting_value, ' ',
              '/index.php/phydid-b/article/download/',
              submission_files.submission_id, '/', galley_id, '/', submission_files.submission_file_id, ';') as ''
from submission_files, publication_galleys, submission_file_settings
where setting_name = 'old-supp-id' and
      submission_files.submission_file_id = submission_file_settings.submission_file_id and
      publication_galleys.submission_file_id = submission_file_settings.submission_file_id;
select '}' as '';

The result can be saved as a file and included in the nginx configuration to redirect matched urls by:

 if ($new_uri != "") { rewrite ^(.*)$ $new_uri permanent; }

You need to adjust the urls to your needs and might need a different redirect syntax (like for apache), but it solved my issue.

ctgraham commented 6 months ago

This issue should remain open until resolved in an upcoming release.