pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

COUNTER Release 5 #6781

Closed bozana closed 2 years ago

bozana commented 3 years ago

Implement COUNTER Release 5 for OJS/OMP/OPS usage statistics. Here we can collect everything we decide is necessary. We can have the discussion below, and every time we decide something we can summarize it here.

Release 5 is out and brings many changes. Here is a guide for journals: https://www.projectcounter.org/wp-content/uploads/2020/08/Module_2_Journal_Usage_20200811.pdf

  1. Processing rules for COUNTER 5 reports, see https://www.projectcounter.org/code-of-practice-five-sections/7-processing-rules-underlying-counter-reporting-data/:

     a) Double-click filtering (see section 7.2): This is implemented here: https://github.com/pkp/usageStats/blob/master/UsageStatsLoader.inc.php#L194-L224. Until now we differentiated between accesses to HTML, PDF and other file types. This no longer seems to be needed: we can change it to use 30 seconds for any link, i.e. file. We should also change our implementation so that only identical URLs are considered (and not assocType + assocID, as until now). Uniqueness is treated differently, see b) and c).

     b) Unique Items (see section 7.3): In our case an Item is an article. The matching report is AR1. The rule is: "If multiple transactions qualifying for the Metric_Type in question represent the same item and occur in the same user-sessions, only one unique activity MUST be counted for that item." A user-session seems to be defined as one hour, as far as I understand it. Whether article versions belong to the same Item is still an open question. Given the way we represent them internally, I would say they do.

     c) Unique Titles (see section 7.4): For a journal, Title = a journal and the report = Title Master Report. Similar to the rule for unique items above, the rule here is: "If multiple transactions qualifying for the Metric_Type in question represent the same title and occur in the same user-session only one unique activity MUST be counted for that title." Here too the user-session seems to be defined as one hour. That is, if a user accesses one article and then another in the same session, it would count only once. This rule, i.e. report, seems not to be used for single journals; it was introduced mostly for books. Do we need it (e.g. for libraries and multi-journal installations)?

     d) Internet Robots and Crawlers (see section 7.8): Same as for Release 4. COUNTER maintains the current list of internet robots and crawlers at https://github.com/atmire/COUNTER-Robots. We use it as a module in lib/pkp/lib/counterBots, assign the file to the variable COUNTER_USER_AGENTS_FILE (https://github.com/pkp/pkp-lib/blob/master/classes/core/Core.inc.php#L23) and implement the function isUserAgentBot in https://github.com/pkp/pkp-lib/blob/master/classes/core/Core.inc.php#L100. The function is then used when the log files are processed (https://github.com/pkp/usageStats/blob/master/UsageStatsLoader.inc.php#L170). We should define a strategy for when we fetch the most recent version of the list.

  2. Because R5 now counts abstract views (in the total views count), shall we consider the galley view pages too?

  3. SUSHI support is mandatory for compliance with COUNTER Release 5 (see https://www.projectcounter.org/wp-content/uploads/2019/05/Release_5_TechNotes_PDFX_20190509-Revised.pdf).

  4. Which reports do we need or want to provide: AR1, Journal Master Report, X?
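
To make the rules in point 1 a) and b) easier to discuss, here is a minimal sketch in Python. It is not the usageStats plugin code (which is PHP), only an illustration under stated assumptions: the `Hit` record, its field names, the example URLs and the "user + date + hour" session key are all hypothetical. It assumes that, for repeated requests of the same URL by the same user within 30 seconds of each other, the later request is the one retained, and that a user-session is one clock hour, which is how the rule is read above.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

DOUBLE_CLICK_WINDOW = timedelta(seconds=30)


class Hit(NamedTuple):
    """One log entry. All field names are hypothetical."""
    user: str       # e.g. a hash of IP + user agent
    url: str        # the exact URL requested
    item_id: int    # the article the URL belongs to
    time: datetime


def filter_double_clicks(hits):
    """Collapse same-user, same-URL requests spaced <= 30 s apart.

    Each request is compared with the previously retained request for
    the same (user, url); if it falls inside the window it replaces it,
    so a chain of quick clicks ends up as a single hit.
    """
    kept = []
    last_index = {}  # (user, url) -> position in `kept`
    for hit in sorted(hits, key=lambda h: h.time):
        key = (hit.user, hit.url)
        i = last_index.get(key)
        if i is not None and hit.time - kept[i].time <= DOUBLE_CLICK_WINDOW:
            kept[i] = hit  # drop the earlier request, keep the later one
        else:
            last_index[key] = len(kept)
            kept.append(hit)
    return sorted(kept, key=lambda h: h.time)


def count_unique_items(hits):
    """Count at most one unique activity per item and user-session.

    Assumption: a session is (user, date, hour of day).
    """
    seen = set()
    counts = {}
    for hit in hits:
        session = (hit.user, hit.time.strftime("%Y-%m-%d %H"))
        if (session, hit.item_id) not in seen:
            seen.add((session, hit.item_id))
            counts[hit.item_id] = counts.get(hit.item_id, 0) + 1
    return counts


# Example: article 1 with an HTML galley (/10) and a PDF galley (/11).
html, pdf = "/article/view/1/10", "/article/view/1/11"
hits = [
    Hit("u1", html, 1, datetime(2021, 3, 1, 10, 0, 0)),
    Hit("u1", html, 1, datetime(2021, 3, 1, 10, 0, 20)),  # double click
    Hit("u1", html, 1, datetime(2021, 3, 1, 10, 0, 45)),  # 25 s after previous
    Hit("u1", pdf, 1, datetime(2021, 3, 1, 10, 5, 0)),    # different URL
    Hit("u2", html, 1, datetime(2021, 3, 1, 10, 0, 10)),  # different user
    Hit("u1", html, 1, datetime(2021, 3, 1, 11, 10, 0)),  # next hour
]
filtered = filter_double_clicks(hits)
unique = count_unique_items(filtered)
```

In this example the three quick clicks by `u1` on the HTML galley collapse into one hit, so `filtered` has four entries. The unique-item count for article 1 is three: `u1` in the 10 o'clock hour (HTML and PDF together count once), `u2` in the same hour, and `u1` again in the 11 o'clock hour. Comparing URLs rather than assocType + assocID is what keeps the HTML and PDF requests apart in the double-click step.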

bozana commented 1 year ago

PR that considers the first date published of a context when calculating the SUSHI start date:
pkp-lib: https://github.com/pkp/pkp-lib/pull/8390
ojs: https://github.com/pkp/ojs/pull/3605 (only submodule update)
omp: https://github.com/pkp/omp/pull/1240 (only submodule update)
ops: https://github.com/pkp/ops/pull/386 (only submodule update)