pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

Integrate statistics Custom Report Generator with article statistics UI #7318

Closed NateWr closed 1 year ago

NateWr commented 3 years ago

Describe the problem you would like to solve

The custom report generator (Statistics > Reports > Generate Custom Report) duplicates some of the filtering and sorting options in the article stats UI (Statistics > Articles). The custom report UI is confusing, and depending on which report template is used, offers advanced options that don't apply to the report.

Describe the solution you'd like

Generating a custom report should be integrated with the article stats UI, so that when a journal manager is viewing the article statistics they can generate a report from the date and section filters they have already selected.

The journal manager can click to generate a particular report, and then get a few additional options to configure the output they receive. Because the UI already includes tools to filter by date and daily/monthly, the options to configure a custom report would be much simpler. The JM would only need to select the columns they want to include.

[Screenshot: reports]


Additional information

An inventory exercise was conducted to understand all of the requirements for statistics reports. That can be found at: https://pkp.notion.site/d3078b32275d4b8a98fe65d5b77d125e?v=188ebb82537c4b8997ddb82f86193477

TO-DOs:

bozana commented 2 years ago

Hi all,

To start a discussion or to provide some first input, what we could do in the custom submission stats report generator (the input for issue and context stats is then coming later):

I will use the old custom report generator as a guide to what would be possible to provide in the new custom report for submissions:

The user can choose if she/he would like to have the stats aggregated by month or day (only one of them, I would suggest), where month is the default option.

The user can select the start and end date for the report.

We could provide all or only some of the report templates:

a) article total views
Would report the total counts (sum of all views for that submission_id) for each article.
Columns: ID, Article, Section?, Issue?, Journal?, Month/Day, Count

b) article abstract views
Would report abstract count for each article.
Columns: ID, Article, Section?, Issue?, Journal?, Month/Day, Count

c) article total file downloads
Would report all file downloads count (sum of pdf, html, other) for each article.
Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count

d) article PDF downloads
Would report all PDF file downloads for each article.
Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count

e) article HTML downloads
Would report all HTML file downloads for each article.
Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count

f) article Other downloads
Would report all Other file downloads for each article.
Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count

g) article detailed file downloads
Would report file downloads in detail, i.e. PDF, HTML, and Other views for each article.
Columns: ID, Type (= Submission Files), File Type, File, Article, Section?, Issue?, Journal?, Month/Day, Count

h) article supp file views
Would report all supp file views for each article.
Columns: ID, Type (= Supp Files), Article, Section?, Issue?, Journal?, Month/Day, Count

i) article detailed views
Would report everything in detail, i.e. abstract, PDF, HTML, Other, and supp file views for each article.
Columns: ID, Type, File Type, File, Article, Section?, Issue?, Journal?, Month/Day, Count

Section?, Issue? and Journal? are the columns that we could eventually display in the report, for the better orientation.

Advanced options could be:

1) Select and GroupBy columns to choose (which are different than columns displayed in the report): submission_id, assoc_type, file_type, file_id, representation_id (month/day is always considered)

For the templates above, for example, those would be:
a) submission_id
b) submission_id, assoc_type (where assoc_type = 1048585)
c) submission_id, assoc_type (where assoc_type = 515)
d) submission_id, assoc_type, file_type (where assoc_type = 515 and file_type = 2)
e) submission_id, assoc_type, file_type (where assoc_type = 515 and file_type = 1)
f) submission_id, assoc_type, file_type (where assoc_type = 515 and file_type = 3)
g) submission_id, assoc_type, file_type (where assoc_type = 515)
h) submission_id, assoc_type (where assoc_type = 531)
i) submission_id, assoc_type, file_type
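
A rough sketch of how the templates a) to i) could be written down as data, so that the group-by columns and filters are looked up rather than chosen by the user. The object names, template keys and the `buildQueryOptions` helper are hypothetical and not part of pkp-lib; only the column names and numeric constants come from the list above.

```javascript
// Hypothetical lookup table; only the numeric values and column names
// are taken from the list above.
const ASSOC_TYPES = { abstract: 1048585, submissionFile: 515, suppFile: 531 };
const FILE_TYPES = { html: 1, pdf: 2, other: 3 };

const BY_SUBMISSION = ['submission_id'];
const BY_ASSOC = ['submission_id', 'assoc_type'];
const BY_FILE_TYPE = ['submission_id', 'assoc_type', 'file_type'];

const REPORT_TEMPLATES = {
  totalViews: { groupBy: BY_SUBMISSION, where: {} },
  abstractViews: { groupBy: BY_ASSOC, where: { assoc_type: ASSOC_TYPES.abstract } },
  fileDownloads: { groupBy: BY_ASSOC, where: { assoc_type: ASSOC_TYPES.submissionFile } },
  pdfDownloads: { groupBy: BY_FILE_TYPE, where: { assoc_type: ASSOC_TYPES.submissionFile, file_type: FILE_TYPES.pdf } },
  htmlDownloads: { groupBy: BY_FILE_TYPE, where: { assoc_type: ASSOC_TYPES.submissionFile, file_type: FILE_TYPES.html } },
  otherDownloads: { groupBy: BY_FILE_TYPE, where: { assoc_type: ASSOC_TYPES.submissionFile, file_type: FILE_TYPES.other } },
  detailedFileDownloads: { groupBy: BY_FILE_TYPE, where: { assoc_type: ASSOC_TYPES.submissionFile } },
  suppFileViews: { groupBy: BY_ASSOC, where: { assoc_type: ASSOC_TYPES.suppFile } },
  detailedViews: { groupBy: BY_FILE_TYPE, where: {} },
};

// Returns the group-by columns and filters for a template. The time unit
// (month or day) is always appended, as described above.
function buildQueryOptions(templateName, timeUnit = 'month') {
  const template = REPORT_TEMPLATES[templateName];
  if (!template) {
    throw new Error(`Unknown report template: ${templateName}`);
  }
  if (timeUnit !== 'month' && timeUnit !== 'day') {
    throw new Error(`Unknown time unit: ${timeUnit}`);
  }
  return {
    groupBy: [...template.groupBy, timeUnit],
    where: { ...template.where },
  };
}
```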

Additional options (not used in the templates above): file_id and representation_id would allow statistics aggregation for specific files or galleys. For example, if the user would like a report of the total counts for each file (no matter what file type it has), they could select: submission_id, file_id

These Select and GroupBy columns are different than those that we will display in the report. For display, we could always display:
ID: object ID -- if it is a file, it would be the file_id; if it is an abstract, it would be the submission_id
Article: article title this object belongs to
Section (possibly): section title this object belongs to
Issue (possibly): issue title this object belongs to
Journal (possibly): journal title this object belongs to
Month/Day: the period the metrics are aggregated by
Count: the number of views

2) Filter options to choose/define:

assoc_type (abstract, submission file, supp file)
file_type (PDF, HTML, DOC?, Other)
submission_id
section_id
issue_id
eventually also representation_id and file_id

For templates, see the filters in (where ...) in 1) Select and GroupBy a) - i) above.

3) OrderBy options -- Per default we would order by month/day:

I am not sure if we should provide these. Different ordering could be later done in the spreadsheet by the user? So I will leave this empty for now, and if necessary I can think later about it...

NateWr commented 2 years ago

Thanks @bozana, this looks great! I'd like to see us off-load as much of the data filtering as possible to a spreadsheet tool. So we only need to provide the raw spreadsheets necessary for the user to get what they need with their own spreadsheet management.

In my view, all of the report templates can be compressed into one report, with one line per article:

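| ID | Title | Total Views | Abstract Views | File Downloads | PDF Downloads | HTML Downloads | Other File Downloads | ?Section | ?Issue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Lorem ipsum... | 100 | 50 | 50 | 25 | 15 | 10 | Articles | Vol. 1 No. 1 |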

> stats aggregated by month or day ... select the start and end date

Instead of aggregating stats we should rely on the start and end date. What I mean is that the report will always give the totals between the start and end date. If someone wants to get the article stats for each month, they can download a report for each month.
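
To make the single combined report concrete, here is a minimal sketch that collapses per-type counts for a date range into one row per article. The record shape (`submissionId`, `assocType`, `fileType`, `count`) and the function name are assumptions for illustration, not the actual stats service output; the assoc_type and file_type values are the ones quoted earlier in this thread.

```javascript
// Values quoted earlier in the thread.
const ASSOC_TYPE_SUBMISSION = 1048585; // abstract views
const ASSOC_TYPE_SUBMISSION_FILE = 515; // file downloads
const FILE_TYPE_COLUMNS = { 1: 'htmlDownloads', 2: 'pdfDownloads', 3: 'otherDownloads' };

// Collapses metric records into one row per article.
// Assumption: supp file records are left out of this sketch because the
// example table has no column for them.
function summarizeByArticle(records) {
  const rows = new Map();
  for (const record of records) {
    const isAbstract = record.assocType === ASSOC_TYPE_SUBMISSION;
    const isFile = record.assocType === ASSOC_TYPE_SUBMISSION_FILE;
    if (!isAbstract && !isFile) continue;

    if (!rows.has(record.submissionId)) {
      rows.set(record.submissionId, {
        id: record.submissionId,
        totalViews: 0,
        abstractViews: 0,
        fileDownloads: 0,
        pdfDownloads: 0,
        htmlDownloads: 0,
        otherDownloads: 0,
      });
    }
    const row = rows.get(record.submissionId);
    row.totalViews += record.count;
    if (isAbstract) {
      row.abstractViews += record.count;
    } else {
      row.fileDownloads += record.count;
      // Unknown file types are counted as "other".
      const column = FILE_TYPE_COLUMNS[record.fileType] || 'otherDownloads';
      row[column] += record.count;
    }
  }
  return [...rows.values()];
}
```

Fed with the numbers from the example row (50 abstract views, 25 PDF, 15 HTML and 10 other downloads), this yields 100 total views and 50 file downloads for article 1.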

> Select and GroupBy columns

Do we need these? I'm hopeful that we can get rid of these options and just let someone manipulate this in their spreadsheet tool.

> Additional options (not used in the templates above) file_id and representation_id would allow statistics aggregation for specific files or galleys.

I think this should be separated from the article report. So someone can ask for a different report for statistics on files that would look like this:

| File ID | Name (File or Galley Name?) | Downloads | Submission |
| --- | --- | --- | --- |
| 1 | somefile.pdf | 21 | Lorem ipsum... |

My thinking is that it would be a different report category. In the screenshot above, I had categories for Views, By Region, and COUNTER v5. So this would be like a Files report.

> Filter options

Let's keep these simple and based on what's available in the UI: date, section, search phrase. In other words, we can take the filters that are already applied on the screen and generate a report from the submissions selected.

> OrderBy options ... I am not sure if we should provide these. Different ordering could be later done in the spreadsheet by the user?

I agree. :+1:

bozana commented 2 years ago

Hi @NateWr, that all sounds good to me. I think it would allow me to use the suggestion from @asmecher for SQL queries without 'group by', because everything is known/predictable and not so generic... In that case we would not need those extras (e.g. Select/GroupBy columns), just the elements in the UI from the screenshot above. It is slightly different from the current custom report generator... but if somebody needs anything different (e.g. the combination of article and file report per month/day), they could change the URL parameters of the PKP Usage Statistics Plugin to generate another kind of report...

bozana commented 2 years ago

Maybe to ask right away about the By Region report as well: should the columns be ID, Title, Country, Region, Total Investigations, Total Requests, Unique Investigations, Unique Requests? Shall the report be on the region level? What about only on the country level? Or what about city -- is this too detailed? Or can the user decide it?

NateWr commented 2 years ago

I think the user should be able to decide whether they want it at ONE of these levels: Country, Region or City. The way I see it working is that if the user chooses country, they get totals for the whole country:

| ID | Title | Country | Views | Downloads | Unique Views | Unique Downloads |
| --- | --- | --- | --- | --- | --- | --- |
| ... | ... | Germany | 10 | 10 | 10 | 10 |

If they ask for Region, they get totals for each region, but the country column still appears:

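| ID | Title | Country | Region | Views | Downloads | Unique Views | Unique Downloads |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ... | ... | Germany | Bavaria | 5 | 5 | 5 | 5 |
| ... | ... | Germany | Berlin | 5 | 5 | 5 | 5 |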

And if they ask for City, they get totals for each city, but the country and region columns still appear:

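| ID | Title | Country | Region | City | Views | Downloads | Unique Views | Unique Downloads |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ... | ... | Germany | Bavaria | Munich | 3 | 3 | 3 | 3 |
| ... | ... | Germany | Bavaria | Nuremberg | 2 | 2 | 2 | 2 |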
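
The rule in the three tables above (each finer level keeps the columns of the coarser ones) can be sketched as a small helper. The function name and the scale keywords are illustrative assumptions; the column labels are the ones used in the tables.

```javascript
// Scales ordered from coarse to fine; each level keeps the coarser columns.
const GEO_SCALES = ['country', 'region', 'city'];
const GEO_COLUMN_LABELS = ['Country', 'Region', 'City'];

// Returns the CSV header columns for a geographical report at the given scale.
function geoReportColumns(scale) {
  const depth = GEO_SCALES.indexOf(scale);
  if (depth === -1) {
    throw new Error(`Unknown scale: ${scale}`);
  }
  return [
    'ID',
    'Title',
    ...GEO_COLUMN_LABELS.slice(0, depth + 1),
    'Views',
    'Downloads',
    'Unique Views',
    'Unique Downloads',
  ];
}
```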

Also, I think the regional stats are not related to COUNTER, right? If so, we don't need to use the terms "investigations" and "requests". Does Views/Downloads fit?

bozana commented 2 years ago

Hi @NateWr, I am not 100% sure about "Views" and "Downloads" -- theoretically yes, but: "Views" (investigations) would mean all possible views (abstract, files, supp files) and "Downloads" (requests) would mean only file views/downloads. Is this then clear enough?

NateWr commented 2 years ago

Ah, I see what you're saying. :thinking: It does make sense that a file view = a "download". Maybe Views/Downloads is the correct distinction. I guess somewhere we will need to explain all of these columns...

bozana commented 2 years ago

Hi @NateWr, may I ask here, in this issue: what would be the Geo stats endpoints? -- it is slightly different than other APIs, it contains submissions but also country, region and/or city. Now, for the article and file reports above I implemented stats/publications/articleReport and stats/publications/fileReport (with usual parameters for stats/publications/) to get the CSV reports. OK? But what/how to do it for the Geo reports? Should there be something like stats/geo/countryReport, stats/geo/regionReport and stats/geo/cityReport? Would such stats/geo/ also need some other methods for now (or we can leave it for later)? Something like getMany + parameter levelOfDetail = country (default), region, city, that returns list of submissions containing the total data (views, downloads, unique views and unique downloads) by that levelOfDetail? And the same for just one specific submission, e.g. stats/geo/1? Hmmm... :thinking:

NateWr commented 2 years ago

If possible, we should try to use the Accept header alongside existing API endpoints (see MDN).

The following request:

$.ajax({
  type: 'GET',
  url: 'http://example.org/api/v1/stats/publications',
  data: {
    dateStart: '...',
    dateEnd: '...'
  }
});

Would return the following response in JSON:

{
  "items": [...],
  "itemsMax": 30
}

Add the Accept header to the request:

$.ajax({
  type: 'GET',
  url: 'http://example.org/api/v1/stats/publications',
  headers: {
    'Accept': 'text/csv',
    'Content-Type': 'text/csv'
  },
  data: {
    dateStart: '...',
    dateEnd: '...'
  }
});

And the API will return the response in CSV:

ID,Title,Views,Etc
1,My Submission,123,...

With this approach, I think that we can use the following API endpoints:

/stats/publications
/stats/publications/files

The $slimRequest allows us to get the headers with $slimRequest->getHeaders(). See https://www.slimframework.com/docs/v3/objects/request.html
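
The decision the handler would make from that header could look roughly like the sketch below. It is written in JavaScript only to stay in the same language as the request examples; the real handler would be PHP, and the function name is hypothetical. It ignores quality values (`q=`) for simplicity, which a full content negotiation would take into account.

```javascript
// Picks the response format from the value of an Accept header.
// Simplification: any mention of text/csv wins; q-values are ignored.
function pickResponseFormat(acceptHeader) {
  const mediaTypes = String(acceptHeader || '')
    .split(',')
    .map((part) => part.split(';')[0].trim().toLowerCase());
  return mediaTypes.includes('text/csv') ? 'csv' : 'json';
}
```

JSON stays the default, so existing clients that send no Accept header, or the usual `application/json`, are not affected.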

Do the geographical stats only apply to visits to publications? If just publications, we can use a query param to determine the appropriate scale for the report:

/stats/publications/locations?scale=country|region|city

NateWr commented 2 years ago

Now that I posted that, I realize that the API endpoint when delivering a report probably shouldn't include pagination. With the report they want the whole thing all at once. It may not make much sense for us to return CSV directly in our API.

I think this goes back to the thing we were discussing about how the report should be compiled through a task on the queue. We may need to rethink this part...

NateWr commented 2 years ago

So, I think either way we'll need a way to try not to hammer the server for very large exports. We have two options:

a) break the export into jobs on the task queue
b) use the API to chunk the export and assemble it in the browser

I think (a) is the best approach, but it would require us to build a whole system for generating reports and downloading them later. I'm not sure if we want to do that just yet.

I think (b) is more workable than I expected. I found this answer on StackOverflow which suggests using Blob for large strings. I think you may already be using this approach.

So what we would do is we would use the API along with the Accept header as I described. The API would return CSV values with up to 100 rows at a time. So the JS code in the browser would check to see if there are more items and if so ask for page two of the results, and concatenate the CSV file itself, building the complete report in the browser.

With this approach we prevent a large export from killing the server in one go, and we can redirect the user directly to the file download. Does that sound like an ok approach? It may seem unusual to do this much work in the browser but I think it will be easier than it seems.
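
A minimal sketch of the browser side of option (b). Everything named here is an assumption: `fetchPage(offset, count)` stands in for whatever call ends up requesting one CSV chunk and is expected to resolve to `{ csv, total }`, and each chunk is assumed to repeat the header row.

```javascript
// Joins CSV chunks, keeping the header row from the first chunk only.
// Caveat: splitting on newlines assumes no field contains a quoted line break.
function joinCsvPages(pages) {
  const lines = [];
  pages.forEach((page, index) => {
    const pageLines = page.split(/\r?\n/).filter((line) => line !== '');
    lines.push(...(index === 0 ? pageLines : pageLines.slice(1)));
  });
  return lines.join('\n') + '\n';
}

// Requests chunks one after another until all rows have been fetched.
// fetchPage is a hypothetical callback: (offset, count) => Promise<{ csv, total }>.
async function assembleReport(fetchPage, count = 100) {
  const pages = [];
  let offset = 0;
  let total = Infinity;
  while (offset < total) {
    const page = await fetchPage(offset, count);
    pages.push(page.csv);
    total = page.total;
    offset += count;
  }
  return joinCsvPages(pages);
}
```

In the browser, the assembled string could then be wrapped in `new Blob([csv], { type: 'text/csv' })` and offered for download through `URL.createObjectURL`, which is the Blob approach mentioned above.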

bozana commented 2 years ago

Thanks a lot @NateWr! I will definitely see/test the Accept header as you described. The performance is now much better, so we might be a little bit more flexible... -- e.g. for 180 submissions the report generation needs ca. 11 seconds... I trust you that we can then concatenate everything on the client side...

And regarding the Geo stats: Geo stats only apply to visits of publications/submissions. These stats are however different than stats/publications: another DB table is used (i.e. a different query builder) and we have total+unique views and downloads. Shall we not use another API handler? Or can another handler be associated with stats/publications/locations? -- I'll have to see... I am not sure if we would need just the plain numbers (without submissions) for a location -- e.g. just the totals of all submissions for a location -- somehow I don't think so... :thinking:

bozana commented 2 years ago

Ah, one more thing: The JSON result contains more information about a submission than the CSV result should -- the CSV should only contain the title. Depending on the Accept header I can proceed differently in the code, I suppose. Also, for the report we theoretically do not need to sort first by totals and we would not need the itemsMax -- but if we combine the results in the browser we would need them...

NateWr commented 2 years ago

> for 180 submissions the report generation needs ca. 11 seconds

That's great, but things can change depending on server and database size, so I'd be careful not to make the max request too large. It shouldn't matter much if we do smaller chunks with each request. And it will be better for large servers.

> And regarding the Geo stats: Shall we not use another API handler? Or can another handler be associated with stats/publications/locations? ... I am not sure if we would need just the plain numbers (without submissions) for a location -- e.g. just the totals of all submissions for a location -- somehow I don't think so...

Ahh, I see what you're saying. Let's see, the way the current publication API works is like this:

| Endpoint | Result |
| --- | --- |
| /stats/publications | List of all publications with stats within filter range |
| /stats/publications/abstract | Total hits to all abstracts within filter range broken down by month/day |
| /stats/publications/galley | Same as above |
| /stats/publications/<publicationId> | One publication with stats within filter range |
| /stats/publications/<publicationId>/abstract | Hits to that publication's abstracts within filter range broken down by month/day |

If we take this as a guide, we can maybe do the following for geo stats:

| Endpoint | Result |
| --- | --- |
| /stats/publications/locations | List of all publications with stats broken down by geo range within filter range |
| /stats/publications/locations/countries | List of all countries with total stats within filter range |
| /stats/publications/locations/<publicationId> | One publication with stats broken down by geo range within filter range |
| /stats/publications/locations/<publicationId>/countries | List of all countries with total stats for the specific publication |

So in my view, for 3.4, we would only need to implement the CSV view of /stats/publications/locations. But this gives us a template for future improvements (for example, we might show /stats/publications/locations/countries in the UI some day).

> The json result contains more information about a submission than csv result should -- csv should only contain the title. Depending on Accept header I can proceed differently in the code, I suppose.

Yeah, that's fine. Although if it is easy you can expand the CSV with some of that information.

> Also, for report we theoretically do not need to sort first by totals

Ideally, both the JSON and the CSV response would use the same code to fetch the metrics data. So it shouldn't be any more work to support the same query params. The difference should be in how it is then compiled into a response.

> we would not need the itemsMax -- but if we combine the results in browser we would need them...

For the CSV response, you can include this information in a header, X-Total-Count: 135. See https://stackoverflow.com/a/43968710/1723499.
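
On the client, that header is enough to work out which further chunks to request after the first one. A small sketch, assuming the chunks are addressed by an `offset` and a `count` (the helper name is hypothetical, and `offset` is an assumption about the query parameters):

```javascript
// Given the X-Total-Count value and the chunk size, returns the offsets
// of the chunks still to be requested after the first one (offset 0).
function remainingOffsets(totalCountHeader, count) {
  const total = Number.parseInt(totalCountHeader, 10);
  if (!Number.isInteger(total) || total < 0 || count <= 0) {
    return []; // header missing or malformed: nothing more to fetch
  }
  const offsets = [];
  for (let offset = count; offset < total; offset += count) {
    offsets.push(offset);
  }
  return offsets;
}
```

With `X-Total-Count: 135` and chunks of 30 rows, the remaining offsets are 30, 60, 90 and 120.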

bozana commented 2 years ago

Thanks a lot @NateWr! I'll try... :-)

bozana commented 2 years ago

Hi @NateWr, I implemented the stats/publications/files like this: List of all submission files with stats within filter range (per default ordered DESC by total views, and count = 30 (as for publications)). Instead of file summary props I only display the fileId, fileName, downloads and submissionTitle -- the summary props seem to be too much, and the function getProperties currently needs request and submission object in the arguments.... This way the JSON contains the same data as CSV. If however summary props are wished in JSON, I can change or maybe implement it once submission files are implemented with the new EntityDAO... Should this function/endpoint consider only assoc_type = submission file or also supp file? -- We currently consider only submission file assoc_type everywhere i.e. also in other functions of the endpoint stats/publications/...

bozana commented 2 years ago

And maybe one more comment about stats/publications/locations: The JSON would look like this:

{
    "items":[
    {
        "subId":1,
        "publication":
        {
            "_href":"http:\/\/ojs-master.bb\/index.php\/publicknowledge\/api\/v1\/submissions\/1",
            "id":1,
            "urlPublished":"http:\/\/ojs-master.bb\/index.php\/publicknowledge\/article\/view\/mwandenga-signalling-theory",
            "urlWorkflow":"http:\/\/ojs-master.bb\/index.php\/publicknowledge\/workflow\/access\/1",
            "authorsStringShort":"Mwandenga et al.",
            "fullTitle":
            {
                "en_US":"The Signalling Theory Dividends: A Review Of The Literature And Empirical Evidence"
            }
        },
        "geoMetrics":
        {
            "Germany":
            {
                "Berlin":
                {
                    "Berlin":
                    {
                        "totalViews":"3",
                        "totalDownloads":"1",
                        "uniqueViews":"2",
                        "uniqueDownloads":"1"
                    }
                },
                "Bavaria":
                {
                    "Munich":
                    {
                        "totalViews":"3",
                        "totalDownloads":"1",
                        "uniqueViews":"2",
                        "uniqueDownloads":"1"
                    }
                }
            }
        }
    }
    ],
    "itemsMax":1
}

That means I display the country, region and city as arrays i.e. a country would contain all existing regions, that would contain all existing cities. OK so? The other possibility would be to display it all in one element, e.g. like: geoMetrics: { country: Germany, region: Berlin, city: Berlin, "totalViews":"3", "totalDownloads":"1", "uniqueViews":"2", "uniqueDownloads":"1" }, { country: Germany, region: Bavaria, city: Munich, "totalViews":"3", "totalDownloads":"1", "uniqueViews":"2", "uniqueDownloads":"1" },
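
For comparison, the nested structure can be turned into the flat form mechanically, so either shape could back the CSV. A minimal sketch, assuming the country > region > city > metrics nesting shown in the JSON above (the function name is hypothetical):

```javascript
// Flattens { country: { region: { city: metrics } } } into one row per city.
function flattenGeoMetrics(geoMetrics) {
  const rows = [];
  for (const [country, regions] of Object.entries(geoMetrics)) {
    for (const [region, cities] of Object.entries(regions)) {
      for (const [city, metrics] of Object.entries(cities)) {
        // Metric values are copied as-is (strings in the example above).
        rows.push({ country, region, city, ...metrics });
      }
    }
  }
  return rows;
}
```

Applied to the `geoMetrics` object above, this gives two rows: Germany/Berlin/Berlin and Germany/Bavaria/Munich, each carrying the four metric values.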

bozana commented 2 years ago

Maybe then to define also the CSV for issues and contexts: stats/issues: List all issues with stats within filter range (per default sorted by total views of TOC and issue galleys, and with count = 30). The CSV could then contain: ID, Issue identification, Total, Issue TOC views, Issue Galley Views.

stats/contexts: List contexts with stats within filter range (per default sorted by total views of context index page, and with count = 30). The CSV could contain: ID, Title, Total

OK so?

bozana commented 2 years ago

And maybe to be 100% sure: we implement the CSV response only for main getMany() function, right?

NateWr commented 2 years ago

> stats/publications/locations

Thanks, @bozana. I can see now that the original table I provided is not ideal for this situation. Generally, a REST API should try to use nouns that represent the object returned. So /locations should return a list of locations, not submissions. That's my mistake. I think the API design should support the following endpoints:

Country stats across all publications:

GET /stats/publications/countries
[
  {"country": "Germany", "total": 100, ...},
  {"country": "Canada", "total": 100, ...}
]

Region stats across all publications:

GET /stats/publications/regions
[
  {"region": "Berlin", "country": "Germany", "total": 100, ...},
  {"region": "Bavaria", "country": "Germany", "total": 100, ...},
  {"region": "Quebec", "country": "Canada", "total": 100, ...}
]

City stats across all publications:

GET /stats/publications/cities
[
  {"city": "Berlin", "region": "Berlin", "country": "Germany", "total": 100, ...},
  {"city": "Munich", "region": "Bavaria", "country": "Germany", "total": 100, ...},
  {"city": "Quebec City", "region": "Quebec", "country": "Canada", "total": 100, ...}
]
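
The three lists are related: region and country totals are sums over the city rows. A sketch of that roll-up, using the row shape from the examples above (the function name is hypothetical). Only `total` is summed here; unique counts would presumably need to come from the database rather than from adding up finer-grained rows.

```javascript
// Rolls city-level rows up to region or country totals.
function rollUpTotals(cityRows, scale) {
  const keysByScale = { country: ['country'], region: ['country', 'region'] };
  const keys = keysByScale[scale];
  if (!keys) {
    throw new Error(`Unsupported scale: ${scale}`);
  }
  const groups = new Map();
  for (const row of cityRows) {
    // Rows with the same country (and region) land in the same group.
    const groupId = JSON.stringify(keys.map((key) => row[key]));
    if (!groups.has(groupId)) {
      const group = { total: 0 };
      for (const key of keys) group[key] = row[key];
      groups.set(groupId, group);
    }
    groups.get(groupId).total += row.total;
  }
  return [...groups.values()];
}
```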

Then all three endpoints can exist for each publication:

GET /stats/publications/<publicationId>/countries
[
  {"country": "Germany", "total": 100, ...},
  {"country": "Canada", "total": 100, ...}
]

NateWr commented 2 years ago

> stats/issues ... stats/contexts

Yeah, these look good. Maybe include URLs if it is easy?

> And maybe to be 100% sure: we implement the CSV response only for main getMany() function, right?

:+1:

bozana commented 2 years ago

Hi @NateWr, I will try to summarize some other requirements, from plugins, on our stats services, that makes it easier for me to use the generic getMetrics function, and that we would eventually like to consider in our "Inventory":

COUNTER R4 plugin (current/existing plugin) requires:

Paperbuzz plugin requires:

And, of course, the PKP Usage Stats Plugin (that was also used by the custom report generator) that allows all the combinations...

NateWr commented 2 years ago

@bozana and I completed an exercise to try to understand all of the requirements for statistics reports. The results of that exercise can be seen here: https://pkp.notion.site/d3078b32275d4b8a98fe65d5b77d125e?v=188ebb82537c4b8997ddb82f86193477

bozana commented 2 years ago

LIST OF OBJECTS:

stats/publications -> getMany(): list of submissions with their stats (abstract, galley, pdf, html, other, suppFile) -- shall we add the total here?

stats/publications/files -> getManyFiles(): list of files (full text + supp files) with their stats (downloads)

stats/publications/countries -> getManyCountries(): list of countries with their stats (total, unique) -- considers all submission views i.e. abstract, galley, supp file

stats/publications/regions -> getManyRegions(): list of regions with their stats (total, unique) -- considers all submission views i.e. abstract, galley, supp file

stats/publications/cities -> getManyCities(): list of cities with their stats (total, unique) -- considers all submission views i.e. abstract, galley, supp file

stats/contexts -> getMany(): list of contexts with their index (+catalog) page stats (total)

stats/issues -> getMany(): list of issues with their stats (total, toc, issueGalley)

ONE OBJECT:

stats/publications/ID -> get(): stats for the given submission (abstract, galley, pdf, html, other, suppFile)

stats/contexts/ID -> get(): index (+catalog) page stats for the given context (total)

stats/issues/ID -> get(): stats for the given issue (total, toc, issueGalley)

MONTHLY:

stats/publications/abstract -> getManyAbstract(): monthly total (context) abstract numbers (date, label, value)

stats/publications/galley -> getManyGalley(): monthly total (context) galley (full text) numbers (date, label, value)

stats/publications/ID/abstract -> getAbstract(): monthly total (given submission) abstract numbers (date, label, value)

stats/publications/ID/galley -> getGalley(): monthly total (given submission) galley (full text) numbers (date, label, value)

stats/contexts/timeline -> getManyTimeline(): monthly total (for all contexts) index (+catalog) page numbers (date, label, value)

stats/contexts/ID/timeline -> getTimeline(): monthly total (given context) index (+catalog) page numbers (date, label, value)

stats/issues/toc -> getManyToc(): monthly total (context) toc numbers (date, label, value)

stats/issues/galley -> getManyGalley(): monthly total (context) issue galley number (date, label, value)

stats/issues/ID/toc -> getToc(): monthly total (given issue) toc numbers (date, label, value)

stats/issues/ID/galley -> getGalley(): monthly total (given issue) galley numbers (date, label, value)

NateWr commented 2 years ago

:pray: thank you @bozana! This is soooo helpful. In 2 minutes I was able to get a complete overview and identify what was confusing me.

I think the problem is with the monthly statistics. We are running into naming clashes with the endpoints. What about putting all of them behind a /timeline endpoint? So it would be like:

stats/publications/timeline/abstract
stats/publications/timeline/files

That would solve our naming clash with galley and files, so we can always use files.

Also, I think we can simplify further by giving each object a default timeline. So the following:

stats/publications/timeline

Would provide the abstract timeline. Then other API endpoints could use query arguments:

stats/publications/timeline
stats/publications/timeline?type=files

So I'd see the following endpoints for monthly stats:

| Endpoint | Function |
| --- | --- |
| stats/publications/timeline | PublicationStats::getManyTimeline |
| stats/publications/timeline?type=files | PublicationStats::getManyFilesTimeline |
| stats/publications/ID/timeline | PublicationStats::getTimeline |
| stats/publications/ID/timeline?type=files | PublicationStats::getFilesTimeline |
| stats/contexts/timeline | ContextStats::getManyTimeline |
| stats/contexts/ID/timeline | ContextStats::getTimeline |
| stats/issues/timeline (toc) | IssueStats::getManyTimeline |
| stats/issues/timeline?type=files | IssueStats::getManyFilesTimeline |
| stats/issues/ID/timeline (toc) | IssueStats::getTimeline |
| stats/issues/ID/timeline?type=files | IssueStats::getFilesTimeline |
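
From the client's point of view, the scheme in the table reduces to one path pattern plus an optional `type` argument. A sketch of a URL builder that follows it (the function name and option names are hypothetical):

```javascript
// Builds a timeline endpoint path following the table above.
function timelineUrl(objectType, { id = null, type = null } = {}) {
  const objectTypes = ['publications', 'contexts', 'issues'];
  if (!objectTypes.includes(objectType)) {
    throw new Error(`Unknown object type: ${objectType}`);
  }
  // The table lists no files timeline for contexts.
  if (type === 'files' && objectType === 'contexts') {
    throw new Error('Contexts have no files timeline');
  }
  let path = `stats/${objectType}`;
  if (id !== null) {
    path += `/${id}`;
  }
  path += '/timeline';
  if (type) {
    path += `?type=${encodeURIComponent(type)}`;
  }
  return path;
}
```

Leaving out `type` gives the default timeline (abstracts for publications, TOC for issues), and `type: 'files'` selects the files timeline.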

Does that sound good?

bozana commented 2 years ago

Hi @NateWr, yes that sounds good. I would have one question: shall we somehow separate full texts and supp files in those timeline calls for publications? -- now we consider the supp files (in list of objects and in one object stats), and I think the user would maybe like to have monthly stats (only) for full texts rather than supp files... Or shall timeline?type=files only mean full text files?

NateWr commented 2 years ago

I don't think we should make the distinction. The timeline should show all files.

If we later want to extend the endpoint to consider only some files, we could for example use /timeline?type=primaryFiles or something like that. But I don't think we need to do that yet.

bozana commented 1 year ago

PRs for the export possibility of articles/monographs/preprints:
pkp-lib: https://github.com/pkp/pkp-lib/pull/8308
ui-library: https://github.com/pkp/ui-library/pull/216
ojs: https://github.com/pkp/ojs/pull/3562
omp: https://github.com/pkp/omp/pull/1215
ops: https://github.com/pkp/ops/pull/365

bozana commented 1 year ago

@NateWr, could you please take a look at the PR above? It works, but eventually sorry for my clumsy solution of this vue/UI work... I am happy to improve it according to your comments! :-)

NateWr commented 1 year ago

Is there a PR for UI Library that should be included? I'm getting JS errors related to missing methods and properties like downloadReport that make me think there are changes to StatsPublicationPage.vue I need to look at.

Also, do you have a (not too large) database dump you can send me with stats in it? I don't have stats in any of my local test instances.

bozana commented 1 year ago

Oh, sorry @NateWr :-( I forgot it :-P Now it is added in the PRs list above... (will update the submodule in OJS in a sec). Thanks a lot!

NateWr commented 1 year ago

Yep, working for me now. :) I left a couple of comments on the PRs. I also made a couple of commits with a rough idea of how to change the stats download modal. There's not quite enough information to know what I'm downloading, so I moved the report type selection into the modal. If you like it, you can use the commits here to work it into your setup (I didn't translate any of the text so there's still work to do):

In addition to that, a few other comments:

  1. The existing custom report generator allows us to get article stats broken down by day or month. Did we decide to leave that out for now?
  2. In the file CSV report, let's put the article details first, so the columns are article id, article title, file id, file name, file views.
  3. We should try to make the report download filenames more useful, because people might download a bunch of these. Ideally it would be something like stats-<current-date-and-time>-<context-acronym>-articles-<date-range>-<section>-<section>.csv.
  4. When I clicked on the Abstracts, Files, Daily or Monthly buttons on the chart I got the error "The requested URL was not recognized."
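
For point 3, a filename following that pattern could be built roughly like this. The function name, the option names, the timestamp layout and the way the date range is joined are all assumptions; only the overall pattern comes from the comment above.

```javascript
// Builds a report filename of the form
// stats-<timestamp>-<context-acronym>-articles-<date-range>-<section>-....csv
function reportFilename({ now, contextAcronym, dateStart, dateEnd, sections = [] }) {
  // Lowercases and replaces anything that is not a-z or 0-9 with a hyphen.
  // Note: titles without ASCII letters or digits end up empty and are skipped.
  const slug = (value) =>
    String(value)
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '-')
      .replace(/^-+|-+$/g, '');

  // For example 2022-05-04T10:20:30.000Z becomes 20220504T102030 (UTC).
  const timestamp = now.toISOString().slice(0, 19).replace(/[-:]/g, '');

  const parts = [
    'stats',
    timestamp,
    slug(contextAcronym),
    'articles',
    `${dateStart}_${dateEnd}`,
    ...sections.map(slug),
  ].filter((part) => part !== '');

  return parts.join('-') + '.csv';
}
```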

bozana commented 1 year ago

Thanks a lot @NateWr! Regarding monthly/daily reports: We decided that for now the user needs to select the month he/she would like to have the report for. I do not know any more what we said for the daily reports... :thinking: And for the other comments: I will consider them now...

bozana commented 1 year ago

Hi @NateWr, I think I have addressed all your comments. Could you please take another look? A few notes: the Issues filter is coming next, which is why I handle the filters in that (generic) way... I use 'submissions' instead of 'articles', 'monographs' and 'preprints' in the file name, so that it does not need to be translated, but I can change this if you wish...

bozana commented 1 year ago

The current stats API does not support daily stats for each article, only for the timeline i.e. for all articles...

NateWr commented 1 year ago

> The current stats API does not support daily stats for each article, only for the timeline i.e. for all articles...

I think the existing API supported this. Also, I think the general stats API /stats/publications/abstract lets you filter by searchPhrase, right? That means someone could more or less achieve this by entering the exact title as the search phrase?
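
To make the idea concrete, here is a small Python sketch that only builds such a request URL and does not send it. The base URL is a placeholder, `stats_query_url` is a hypothetical helper, and `dateStart`/`dateEnd` are assumed parameter names that should be checked against the API reference; only the endpoint and `searchPhrase` come from this thread.

```python
from urllib.parse import urlencode


def stats_query_url(api_base, endpoint, **params):
    """Join base, endpoint and query string, skipping filters that are not set."""
    query = {key: value for key, value in params.items() if value is not None}
    return f"{api_base}/{endpoint}?{urlencode(query)}"


url = stats_query_url(
    "https://example.org/index.php/journal/api/v1",  # placeholder base URL
    "stats/publications/abstract",
    searchPhrase="Exact Article Title",
    dateStart="2022-01-01",  # assumed parameter name
    dateEnd="2022-01-31",  # assumed parameter name
    sectionIds=None,  # unset filters are dropped from the query string
)
print(url)
```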

bozana commented 1 year ago

I haven't changed that part of the API, so yes, this is still possible. This is for one article. Yes, searchPhrase is possible for /stats/publications/timeline. But there is no way to list every article with its stats broken down by day or month. We said the monthly case could be handled by choosing the date range one month at a time.
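
The month-by-month workaround amounts to splitting one long date range into calendar-month ranges and requesting a report for each. A minimal Python sketch of that splitting step, where `month_ranges` is a hypothetical helper and not part of pkp-lib:

```python
import calendar
from datetime import date


def month_ranges(start, end):
    """Yield (first_day, last_day) for each calendar month overlapping [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Clip the first and last month to the requested range.
        first = max(start, date(year, month, 1))
        last = min(end, date(year, month, calendar.monthrange(year, month)[1]))
        yield first, last
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


ranges = list(month_ranges(date(2022, 11, 15), date(2023, 1, 10)))
for first, last in ranges:
    print(first.isoformat(), last.isoformat())
```

Each pair could then be used as the date range for one report download.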

NateWr commented 1 year ago

Ok, that should be good enough, as long as someone can retrieve a history of views/downloads for an article or a group of articles.

NateWr commented 1 year ago

Looks good, @bozana. Just a few comments in the code. In addition, I had these comments:

  1. In PKPStatsPublicationHandler::_getFileReportColumnNames(), two of the column names are reversed:

ID,Title,"File Views","Article ID","Article Title"

Let's call it "Article Title" first and then at the end call it "Filename".

  2. The search phrase doesn't seem to be working. When I search for a word in a submission title or its id, no results are returned.

  3. Is there some way to download a timeline in CSV? Maybe we can split this to a separate issue, but I think we'll want that too, just to complete the replacement of the custom report generator. It could be another option between articles and files that says:

Timeline: The number of [article views|file downloads] for each [day|month] in this date range.

  4. I think the addition of Supplementary File Views is confusing the table. The totals no longer add up correctly, and it's not clear why supplementary file views are appearing when PDF, HTML and Other views are not (see screenshot). Can we remove this column from the table and CSV reports until we can figure out a better way to display this information? We probably need to think a bit about how we add more information regarding the data into the UI. But we can probably do that during the release candidate phase or even after 3.4 is released.

supplementary-file-views
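
For the timeline CSV idea, a rough Python sketch of what such an export could look like: daily counts are summed per month and written out as CSV. This is only an illustration under assumptions. The function names, the "Date" column header and the sample numbers are made up, and the real export would be built on the stats API rather than on an in-memory list.

```python
import csv
import io


def monthly_timeline(daily_counts):
    """Sum (YYYY-MM-DD, count) pairs into (YYYY-MM, total) pairs, in first-seen order."""
    totals = {}
    for day, count in daily_counts:
        month = day[:7]  # "2022-01-30" -> "2022-01"
        totals[month] = totals.get(month, 0) + count
    return list(totals.items())


def timeline_csv(rows, value_label):
    """Render timeline rows as CSV text with a two-column header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Date", value_label])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data for illustration only.
daily = [("2022-01-30", 3), ("2022-01-31", 2), ("2022-02-01", 5)]
monthly = monthly_timeline(daily)
print(timeline_csv(monthly, "Article Views"))
```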

bozana commented 1 year ago

Hi @NateWr, I think I implemented all your comments. Regarding the search: the search is the general submission entity search and it works, it just does not work 'correctly' on my stats data -- in metrics tables I have submission IDs that are not published and search only considers the published submissions. In real data this is not the case -- the stats data exist only for published submissions. For the CSV export of the timeline I created a new issue: https://github.com/pkp/pkp-lib/issues/8328.

NateWr commented 1 year ago

Great! I just checked the stats search and you're right: it worked for the one submission I had that was published. I'm happy for you to merge whenever you're ready. :+1:

bozana commented 1 year ago

PRs for Statistics > Issues page:
ui-library: https://github.com/pkp/ui-library/pull/217
ojs: https://github.com/pkp/ojs/pull/3571
pkp-lib: https://github.com/pkp/pkp-lib/pull/8376
https://github.com/pkp/pkp-lib/pull/8353 (old PR with the accidentally wrong branch name, used for the code review)

bozana commented 1 year ago

@NateWr, could you please take a look at these new PRs for Statistics > Issues page?

bozana commented 1 year ago

@NateWr, I have considered all your comments. Would you like to take another look? Thanks a lot!!! :pray:

EDIT: I have also added a commit in the pkp-lib that adds CC Attribution to DB-IP.com for Geo data. I have asked @asmecher and he confirmed that that would be a good place...

bozana commented 1 year ago

PRs for issues filter on the stats article page + issueIds param for stats API:
pkp-lib: https://github.com/pkp/pkp-lib/pull/8358
ui-library: https://github.com/pkp/ui-library/pull/222
ojs: https://github.com/pkp/ojs/pull/3582
omp: https://github.com/pkp/omp/pull/1222
ops: https://github.com/pkp/ops/pull/374

bozana commented 1 year ago

@NateWr, here are the PRs for issues filtering on the stats article page. They also add the stats API param issueIds. Could you please review them?

bozana commented 1 year ago

Hi @NateWr, I have considered your comments for issues filter PRs (https://github.com/pkp/pkp-lib/issues/7318#issuecomment-1287676815). Would you like to double check?

bozana commented 1 year ago

Also, a reminder here that I have addressed your comments on these PRs as well: https://github.com/pkp/pkp-lib/issues/7318#issuecomment-1275103834 -- in case you would like to double check...

bozana commented 1 year ago

@NateWr, I have addressed the second review for these PRs https://github.com/pkp/pkp-lib/issues/7318#issuecomment-1275103834. Could you please take a look? Thanks a lot!!!

bozana commented 1 year ago

Sorry @NateWr, me again: I added the tooltip for Geolocation, changed the download file name and added the parameters into the CSV file. Could you take a look? :pray: