pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

[OMP][OJS] remote galleys are not tracked or logged #1314

Closed crism closed 2 years ago

crism commented 8 years ago

A revisit, it seems, to #339…

The new remote content capability (#1123) just creates a link to the remote content; the Web browser never visits the OMP/OJS server to indicate whether that link was followed.

Instead, the link should be back to the server, like any other galley link, but if remote, the server should log it, then send a 302 response to send the browser off in the right direction.
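A minimal sketch of the log-then-redirect flow proposed above, written in Python rather than the PHP used by OJS/OMP. The `Galley` class, the `resolve_galley_request` function, and the in-memory `usage_log` list are hypothetical names for illustration only and are not part of pkp-lib. The redirect status is left as a parameter, since which code is appropriate is a separate question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Galley:
    """Hypothetical stand-in for a galley record."""
    galley_id: int
    remote_url: Optional[str] = None  # set only for remote galleys


def resolve_galley_request(galley, usage_log, redirect_status=302):
    """Record the request first, then redirect (remote) or serve locally."""
    # The server sees every galley request, so the click is always logged.
    usage_log.append({
        "galley_id": galley.galley_id,
        "remote": galley.remote_url is not None,
    })
    if galley.remote_url is not None:
        # Send the browser on to the externally hosted content.
        return redirect_status, {"Location": galley.remote_url}
    # Local galleys are served by the application as usual.
    return 200, {}


# Usage: a remote galley is logged and then redirected.
log = []
status, headers = resolve_galley_request(
    Galley(7, "https://example.org/project"), log
)
```

The point of the design is that the link in the page always targets the OMP/OJS server, so remote and local galleys pass through the same logging step.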

NateWr commented 8 years ago

A 302 response could be sending the wrong message since it's actually the permanent location of the galley.

bozana commented 8 years ago

Just a note for consideration when working on this issue: this change could also mean changes to how remote representations are handled in other parts of the system (e.g. in public identifiers and export plugins).

bozana commented 8 years ago

@crism, the behavior has changed in OJS: remote galleys now first call .../article/view/articleId/galleyId... So is this issue still relevant for OJS?

crism commented 8 years ago

@bozana, if that’s true for OJS, then that suffices; in OMP, it’s still a direct link to the external resource. I haven’t tested OJS yet.

bozana commented 8 years ago

@crism, I checked it. Unfortunately, a few more changes are needed in order to log the remote galley usage event in OJS, see https://github.com/bozana/ojs/commit/5bf616472a70d364b903eab5f7da93d0fdff06db. I am also not sure how these changes would affect the statistics -- e.g. I am using ASSOC_TYPE_GALLEY for remote galleys (as in OJS 2.4.x), whereas in master this type was changed to ASSOC_TYPE_SUBMISSION_FILE for the other types of galleys. Hmmm.... I am not sure how to proceed... I'll see if I can test and understand it...

bozana commented 8 years ago

I have a problem with this issue :-( Earlier we were using the galley assoc type, but in OJS 3 only the submission file assoc type is used, which also requires the format of the file. So I don't know how to proceed correctly with remote galleys, because we don't know the format (or whether the file is fully downloaded -- if that means something for this case and COUNTER). Does anybody have an idea? @asmecher, @ctgraham?

asmecher commented 8 years ago

@bozana, can you describe what you mean about file types? I'm not following. But it is a Friday...

bozana commented 8 years ago

Ah, sorry, I'll try to explain it more clearly: In OJS 2.4.x the galleys were tracked as ASSOC_TYPE_GALLEY, and in OJS 3.0 they are tracked only as ASSOC_TYPE_SUBMISSION_FILE, which requires a file to be there, see https://github.com/pkp/ojs/blob/master/plugins/generic/usageStats/UsageStatsLoader.inc.php#L61-L77. That also makes sense, except for the remote galleys :-( Also, when building the usage event, the size and mime type are needed, see https://github.com/pkp/pkp-lib/blob/master/plugins/generic/usageEvent/PKPUsageEventPlugin.inc.php#L181-L191, but this is maybe less important. Thus, I suppose we cannot use ASSOC_TYPE_SUBMISSION_FILE for remote galleys as we do for all other normal galleys.

We could introduce ASSOC_TYPE_REPRESENTATION for remote galleys again, in the same way it was in OJS 2.4.x, but this type is no longer supported/considered in the reports, e.g. in https://github.com/pkp/ojs/blob/master/plugins/generic/usageStats/UsageStatsReportPlugin.inc.php (see e.g. the functions getObjectTypes and getDefaultReportTemplates). Thus, maybe this type should also be considered in the reports again?

And now that I am thinking about it all, the ASSOC_TYPE_REPRESENTATION entries that are already in the DB from OJS 2.4.x are no longer considered for reports/statistics, which will make the numbers pretty different :-( Thus, those should be migrated to ASSOC_TYPE_SUBMISSION_FILE to fit into the new logic -- I didn't realize it, see the last point/item here https://github.com/pkp/pkp-lib/issues/1840.

The question of how to consider/treat the remote galleys still has to be solved, however. Is this now a little bit more clear? And do you have any idea how to solve the remote galleys? :-)
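To make the dilemma concrete, here is a rough Python sketch (not OJS code) of how a usage event target could be chosen if a representation-level type were kept for remote galleys. The constants are symbolic strings only, and the `Galley` fields and `usage_event_target` function are hypothetical; the real usage event plugin works differently.

```python
from dataclasses import dataclass
from typing import Optional

# Symbolic names only; the numeric values used by OJS are deliberately omitted.
ASSOC_TYPE_SUBMISSION_FILE = "ASSOC_TYPE_SUBMISSION_FILE"
ASSOC_TYPE_REPRESENTATION = "ASSOC_TYPE_REPRESENTATION"


@dataclass
class Galley:
    """Hypothetical galley record: either file-backed or remote."""
    galley_id: int
    file_id: Optional[int] = None
    mime_type: Optional[str] = None
    remote_url: Optional[str] = None


def usage_event_target(galley):
    """Return (assoc_type, assoc_id, mime_type) for a usage event."""
    if galley.file_id is not None:
        # Normal galley: a local file exists, so its format is known.
        return ASSOC_TYPE_SUBMISSION_FILE, galley.file_id, galley.mime_type
    if galley.remote_url is not None:
        # Remote galley: no local file, so the format and size are unknown.
        return ASSOC_TYPE_REPRESENTATION, galley.galley_id, None
    raise ValueError("galley has neither a file nor a remote URL")


# Usage: one file-backed galley and one remote galley.
local_target = usage_event_target(Galley(1, file_id=42, mime_type="application/pdf"))
remote_target = usage_event_target(Galley(2, remote_url="https://example.org/a"))
```

Under this assumption the reports would again have to understand the representation-level type, which is exactly the open question raised above.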

ctgraham commented 8 years ago

I'm going to set aside the core question of how to internally count remote galley usage to focus on COUNTER's considerations.

COUNTER is concerned (for now at least) with discriminating between the formats of the full text article download (pdf vs. html vs. other), but I'm not sure COUNTER would want to include clickthroughs of remote links as full-text downloads. We can't authoritatively say whether the remote link click was a successful download. The download (or not) of the file would ideally be reported by a COUNTER-compliant report on the remote site. On the other hand, the COUNTER Code of Practice does allow for the possibility of "report[ing] on events that do not involve a request to the web server", so maybe good-faith counting of off-server access might be ok?

Question: within OJS 3.x, are we still distinguishing between the "full text" of an article and the "supplementary material" for an article? Or is an article now just a set of submission files? If these are not distinguished, that has implications for COUNTER. If these are distinguished, are remote galleys distinguished along the same lines?

asmecher commented 8 years ago

Remote galleys are not distinguished as primary/secondary (article/supplementary); we can still distinguish files that way.

Am I crazy to suggest that we shouldn't bend over backwards to make statistics behave well with externally-hosted galleys? Good-faith counting may be enough. And I'm not sure how OJS can be expected to know MIME types and file sizes authoritatively; we could perhaps fetch/store something in submission_galley_files, but obviously that data could become outdated without OJS ever knowing. I don't know what the COUNTER requirements for any of this are, though.

ctgraham commented 8 years ago

I can get feedback from Project COUNTER, but the more I think about it the less convinced I am that COUNTER would want to include statistics for these remote files. Since OJS can't control the externally hosted materials, we really can't consider these to be a full member of the published article, can we? Aren't we really describing here relationships between the published article and another material? It seems like a remote galley should be neither primary nor secondary material, but simply related material, though I'm sure some instances are using the link for hosting the primary full text.

asmecher commented 8 years ago

For users who are hosting externally but do have control over their hosting environment and want to unify COUNTER stats, how much of a stretch would it be for them to e.g. merge server access logs for the two platforms and process them together? (Suspicion: not easy. But asking anyway.)

bozana commented 8 years ago

I agree with you both to leave the counting of the remote galleys to the remote system. It sounds logical to me -- OJS would then just be a place for the remote URL, so that the system knows that there is a file belonging to an article and its URL. It would however maybe be good to hear what COUNTER says about it. @ctgraham, what implication does it have for COUNTER if we do not distinguish full text from supplementary/other material? I think it is currently not distinguished in the UsageStatistics plugin, but I will have to double check. Thanks!

crism commented 8 years ago

@asmecher, in our case, we’re publishing born-digital projects as monographs and journal articles. OMP or OJS act as the submission system, project tracker, and catalog/storefront, but the actual content is served out by Scalar or Omeka. We do consider it primary content. We’re not so much worried about COUNTER per se, but just the ability to aggregate statistics in one place. If necessary, we could pull in all the Scalar, Omeka, Pressbooks, CommentPress, Commons in a Box, OMP, and OJS stats and crunch them all together, but it would be a lot easier if the user portal tracked that stuff uniformly.

ctgraham commented 8 years ago

I don't have any problem with OJS counting clickthroughs to remote URLs internally; let's just make sure that these are not included in the COUNTER statistics (unless I hear otherwise from Lorraine at COUNTER). Merging COUNTER reports is not simple, but it's not a new problem either... institutions may have coverage for some titles from multiple vendors. To track total usage for Title A, statistics may need to be consolidated from multiple JR1 reports. Internally, however, it is reasonable for OJS to record HTTP referrals to remote galleys. The question of whether to identify these as fulltext or supplemental then becomes an internal one.

ctgraham commented 8 years ago

Lorraine Estelle from Project COUNTER affirmed that it would be inappropriate for the software to count clickthroughs to remote files in the same way as we're counting downloads (that is, for the JR1 and AR1 reports). They are working on processes and policies for distributed usage logging which may help clarify the general question of consolidating usage tracking.

There is a COUNTER report (DB1), however, which does report on "Result Clicks" and "Record Views" for Databases and which would include both local and external resources. Capturing the clickthroughs of these remote files could help with constructing a DB1 report in the future.

bozana commented 8 years ago

Then the remote galleys cannot be counted by OJS, because OJS now only has COUNTER-based statistics -- the UsageStatistics plugin. Access to the remote galleys is (or can be) logged by the server in the server log files, however, so journals could get statistics from them in a different way, though not consistently with the other OJS internal statistics :-( Any idea/solution for this?

ctgraham commented 8 years ago

We may need to clarify what the "ojs::counter" metric means. To this point, we have been using it to largely mean "these metrics were generated to be compatible with full-text downloads as reported in COUNTER JR1 and AR1". We could easily include counts of the remote galley redirects and still mean "these metrics were generated to be compatible with COUNTER reports"; we would just need to be sure in generating the current reports that we only include the fulltext downloads for JR1 and AR1. A future DB1 report could use the new entries.

Alternately, we could leave "ojs::counter" defined as-is, but create the new metrics "ojs::counter::results" and "ojs::counter::views", representing (respectively) the click through from searching and browsing to a record result, and the viewing of the abstract of a record. In this paradigm, "ojs::counter" would better be named "ojs::counter::downloads", but I suspect changing that would be more effort than it is worth.

I think breaking out the new metric type(s) is the better solution.
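A rough Python sketch of what separate metric types could look like when tallying events. The three metric type strings come from the discussion above; the event kind names, the `EVENT_METRIC` mapping, and the `tally` and `jr1_total` helpers are hypothetical. In particular, recording a remote galley click under "ojs::counter::results" is an assumption made for illustration, not something settled in this thread.

```python
from collections import Counter

METRIC_DOWNLOADS = "ojs::counter"         # existing metric: full-text downloads
METRIC_RESULTS = "ojs::counter::results"  # proposed: click-throughs to a result
METRIC_VIEWS = "ojs::counter::views"      # proposed: abstract/record views

# Hypothetical event kinds mapped to metric types.
# Assumption: a remote galley click is treated as a result click.
EVENT_METRIC = {
    "file_download": METRIC_DOWNLOADS,
    "remote_galley_click": METRIC_RESULTS,
    "abstract_view": METRIC_VIEWS,
}


def tally(event_kinds):
    """Count events per metric type."""
    return Counter(EVENT_METRIC[kind] for kind in event_kinds)


def jr1_total(totals):
    """JR1/AR1 use only full-text downloads; click-throughs are excluded."""
    return totals[METRIC_DOWNLOADS]


# Usage: remote clicks are recorded but never reach the JR1 figure.
totals = tally([
    "file_download", "remote_galley_click", "abstract_view",
    "file_download", "remote_galley_click",
])
downloads_for_jr1 = jr1_total(totals)
```

Keeping the types separate means the existing download reports need no filtering logic, while the new entries remain available for a possible future DB1 report.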

NateWr commented 2 years ago

Closing this as outdated. If you feel this is still important, please consider making a proposal in the feature request category of our community forum.