Closed NateWr closed 1 year ago
Hi all,
To start the discussion and provide some first input, here is what we could do in the custom submission stats report generator (the input for issue and context stats will come later):
I will orient myself on the old custom report generator to outline what the new custom report for submissions could provide:
The user can choose whether they would like the stats aggregated by month or by day (only one of them, I would suggest), with month as the default option.
The user can select the start and end date for the report.
We could provide all or only some of the report templates:
a) article total views Would report the total counts (sum of all views for that submission_id) for each article. Columns: ID, Article, Section?, Issue?, Journal?, Month/Day, Count
b) article abstract views Would report abstract count for each article. Columns: ID, Article, Section?, Issue?, Journal?, Month/Day, Count
c) article total file downloads Would report all file downloads count (sum of pdf, html, other) for each article. Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count
d) article PDF downloads Would report all PDF file downloads for each article. Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count
e) article HTML downloads Would report all HTML file downloads for each article. Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count
f) article Other downloads Would report all Other file downloads for each article. Columns: ID, Type (= Submission Files), Article, Section?, Issue?, Journal?, Month/Day, Count
g) article detailed file downloads Would report file downloads in detail, i.e. PDF, HTML, and Other views for each article. Columns: ID, Type (= Submission Files), File Type, File, Article, Section?, Issue?, Journal?, Month/Day, Count
h) article supp file views Would report all supp file views for each article. Columns: ID, Type (= Supp Files), Article, Section?, Issue?, Journal?, Month/Day, Count
i) article detailed views Would report everything in detail, i.e. abstract, PDF, HTML, Other, and supp file views for each article. Columns: ID, Type, File Type, File, Article, Section?, Issue?, Journal?, Month/Day, Count
Section?, Issue? and Journal? are columns that we could optionally display in the report, for better orientation.
Advanced options could be:
1) Select and GroupBy columns to choose (which are different than columns displayed in the report): submission_id, assoc_type, file_type, file_id, representation_id (month/day is always considered)
For the templates above, for example, those would be:
a) submission_id
b) submission_id, assoc_type (where assoc_type = 1048585)
c) submission_id, assoc_type (where assoc_type = 515)
d) submission_id, assoc_type, file_type (where assoc_type = 515 and file_type = 2)
e) submission_id, assoc_type, file_type (where assoc_type = 515 and file_type = 1)
f) submission_id, assoc_type, file_type (where assoc_type = 515 and file_type = 3)
g) submission_id, assoc_type, file_type (where assoc_type = 515)
h) submission_id, assoc_type (where assoc_type = 531)
i) submission_id, assoc_type, file_type
Additional options (not used in the templates above), file_id and representation_id, would allow statistics aggregation for specific files or galleys. For example, if the user would like a report of the total counts for each file (no matter what file type it has), they could select: submission_id, file_id
These Select and GroupBy columns are different than those that we will display in the report. For display, we could always show:
ID: object ID -- if it is a file, the file_id; if it is an abstract, the submission_id
Article: article title the object belongs to
Section: section title the object belongs to (optional)
Issue: issue title the object belongs to (optional)
Journal: journal title the object belongs to
Month/Day: the period the metrics are aggregated by
Count: the number of views
2) Filter options to choose/define:
assoc_type (abstract, submission file, supp file)
file_type (PDF, HTML, DOC?, Other)
submission_id
section_id
issue_id
possibly also representation_id and file_id
For the templates, see the filters in the (where ...) clauses under 1) Select and GroupBy, a)-i) above.
3) OrderBy options -- Per default we would order by month/day:
I am not sure if we should provide these; different ordering could be done later in the spreadsheet by the user. So I will leave this empty for now, and if necessary I can think about it later...
Thanks @bozana, this looks great! I'd like to see us off-load as much of the data filtering as possible to a spreadsheet tool, so we only need to provide the raw spreadsheets necessary for the user to get what they need with their own spreadsheet software.
In my view, all of the report templates can be compressed into one report, with one line per article:
ID | Title | Total Views | Abstract Views | File Downloads | PDF Downloads | HTML Downloads | Other File Downloads | ?Section | ?Issue |
---|---|---|---|---|---|---|---|---|---|
1 | Lorem ipsum... | 100 | 50 | 50 | 25 | 15 | 10 | Articles | Vol. 1 No. 1 |
stats aggregated by month or day ... select the start and end date
Instead of aggregating stats we should rely on the start and end date. What I mean is that the report will always give the totals between the start and end date. If someone wants to get the article stats for each month, they can download a report for each month.
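If it helps to picture the month-by-month approach, a client could split an overall date range into one start/end pair per month and request one report per pair. A rough sketch in plain JavaScript (the function name and the YYYY-MM-DD date format are my own assumptions, not from OJS):

```javascript
// Sketch only: split an overall date range into per-month [start, end] pairs,
// so one report request can be made per month. Dates are YYYY-MM-DD strings.
function monthRanges(dateStart, dateEnd) {
  const ranges = [];
  let [y, m] = dateStart.split('-').map(Number);
  const [endY, endM] = dateEnd.split('-').map(Number);
  while (y < endY || (y === endY && m <= endM)) {
    const first = `${y}-${String(m).padStart(2, '0')}-01`;
    // Day 0 of the next month is the last day of this month.
    const lastDay = new Date(y, m, 0).getDate();
    const last = `${y}-${String(m).padStart(2, '0')}-${String(lastDay).padStart(2, '0')}`;
    // Clamp the first and last month to the requested overall range.
    ranges.push([
      first < dateStart ? dateStart : first,
      last > dateEnd ? dateEnd : last,
    ]);
    m += 1;
    if (m > 12) { m = 1; y += 1; }
  }
  return ranges;
}
```

Each pair could then be used as the dateStart/dateEnd filter of one report request.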
Select and GroupBy columns
Do we need these? I'm hopeful that we can get rid of these options and just let someone manipulate this in their spreadsheet tool.
Additional options (not used in the templates above) file_id and representation_id would allow statistics aggregation for specific files or galleys.
I think this should be separated from the article report. So someone can ask for a different report for statistics on files that would look like this:
File ID | Name (File or Galley Name?) | Downloads | Submission |
---|---|---|---|
1 | somefile.pdf | 21 | Lorem ipsum... |
My thinking is that it would be a different report category. In the screenshot above, I had categories for Views, By Region, and COUNTER v5. So this would be like a Files report.
Filter options
Let's keep these simple and based on what's available in the UI: date, section, search phrase. In other words, we can take the filters that are already applied on the screen and generate a report from the submissions selected.
OrderBy options ... I am not sure if we should provide these. Different ordering could be later done in the spreadsheet by the user?
I agree. :+1:
Hi @NateWr, that all sounds good to me. It would allow me to use the suggestion from @asmecher for SQL without GROUP BY, I think, because everything is known/predictable, not so generic... In that case we would not need those extras (e.g. Select/GroupBy columns), just the elements in the UI from the screenshot above. It is slightly different from the current custom report generator... but if somebody needs anything different (e.g. the combination of article and file report per month/day), they could change the URL parameters of the PKP Usage Statistics Plugin to generate another kind of report...
Let me also ask right away about the report By Region: Should this be: ID, Title, Country, Region, Total Investigations, Total Requests, Unique Investigations, Unique Requests? Should the report be on the region level? What about only the country level? Or the city level -- is that too detailed? Or can the user decide?
I think the user should be able to decide whether they want it at ONE of these levels: Country, Region or City. The way I see it working is that if the user chooses country, they get totals for the whole country:
ID | Title | Country | Views | Downloads | Unique Views | Unique Downloads |
---|---|---|---|---|---|---|
... | ... | Germany | 10 | 10 | 10 | 10 |
If they ask for Region, they get totals for each region, but the country column still appears:
ID | Title | Country | Region | Views | Downloads | Unique Views | Unique Downloads |
---|---|---|---|---|---|---|---|
... | ... | Germany | Bavaria | 5 | 5 | 5 | 5 |
... | ... | Germany | Berlin | 5 | 5 | 5 | 5 |
And if they ask for City, they get totals for each city, but the country and region columns still appear:
ID | Title | Country | Region | City | Views | Downloads | Unique Views | Unique Downloads |
---|---|---|---|---|---|---|---|---|
... | ... | Germany | Bavaria | Munich | 3 | 3 | 3 | 3 |
... | ... | Germany | Bavaria | Nuremberg | 2 | 2 | 2 | 2 |
Also, I think the regional stats are not related to COUNTER, right? If so, we don't need to use the terms "investigations" and "requests". Does Views/Downloads fit?
Hi @NateWr, I am not 100% sure about "Views" and "Downloads" -- theoretically yes, but: "Views" (investigations) would mean all possible views (abstract, files, supp files) and "Downloads" (requests) would mean only file views/downloads. Is this then clear enough?
Ah, I see what you're saying. :thinking: It does make sense that a file view = a "download". Maybe Views/Downloads is the correct distinction. I guess somewhere we will need to explain all of these columns...
Hi @NateWr, may I ask here, in this issue: what should the Geo stats endpoints be? They are slightly different from the other APIs: they involve submissions but also country, region and/or city.
Now, for the article and file reports above, I implemented stats/publications/articleReport and stats/publications/fileReport (with the usual parameters for stats/publications/) to get the CSV reports. OK?
But what/how to do it for the Geo reports? Should there be something like stats/geo/countryReport, stats/geo/regionReport and stats/geo/cityReport?
Would such stats/geo/ endpoints also need some other methods for now (or can we leave that for later)? Something like getMany + a parameter levelOfDetail = country (default), region, city, that returns a list of submissions containing the total data (views, downloads, unique views and unique downloads) at that levelOfDetail? And the same for just one specific submission, e.g. stats/geo/1? Hmmm... :thinking:
If possible, we should try to use the Accept header alongside existing API endpoints (see MDN).
The following request:
$.ajax({
type: 'GET',
url: 'http://example.org/api/v1/stats/publications',
data: {
dateStart: '...',
dateEnd: '...'
}
});
Would return the following response in JSON:
{
"items": [...],
"itemsMax": 30
}
Add the Accept header to the request:
$.ajax({
type: 'GET',
url: 'http://example.org/api/v1/stats/publications',
headers: {
'Accept': 'text/csv',
'Content-Type': 'text/csv'
},
data: {
dateStart: '...',
dateEnd: '...'
}
});
And the API will return the response in CSV:
ID,Title,Views,Etc
1,My Submission,123,...
With this approach, I think that we can use the following API endpoints:
/stats/publications
/stats/publications/files
The $slimRequest allows us to get the headers with $slimRequest->getHeaders(). See https://www.slimframework.com/docs/v3/objects/request.html
Do the geographical stats only apply to visits to publications? If just publications, we can use a query param to determine the appropriate scale for the report:
/stats/publications/locations?scale=country|region|city
Now that I posted that, I realize that the API endpoint when delivering a report probably shouldn't include pagination. With the report they want the whole thing all at once. It may not make much sense for us to return CSV directly in our API.
I think this goes back to the thing we were discussing about how the report should be compiled through a task on the queue. We may need to rethink this part...
So, I think either way we'll need a way to try not to hammer the server for very large exports. We have two options:
a) break the export into jobs on the task queue
b) use the API to chunk the export and assemble it in the browser
I think (a) is the best approach, but it would require us to build a whole system for generating reports and downloading them later. I'm not sure if we want to do that just yet.
I think (b) is more workable than I expected. I found this answer on StackOverflow which suggests using Blob for large strings. I think you may already be using this approach.
So what we would do is use the API along with the Accept header as I described. The API would return CSV values with up to 100 rows at a time. The JS code in the browser would check whether there are more items and, if so, ask for page two of the results and concatenate the CSV itself, building the complete report in the browser.
With this approach we prevent a large export from killing the server in one go, and we can redirect the user directly to the file download. Does that sound like an ok approach? It may seem unusual to do this much work in the browser but I think it will be easier than it seems.
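As a sketch of that concatenation step (the function and variable names here are made up, not actual plugin code), the browser could merge the CSV pages like this, keeping only the first page's header row:

```javascript
// Sketch only: merge CSV pages fetched from the API into one report,
// keeping the header row of the first page and dropping it from every
// later page, so the final report has exactly one header.
function mergeCsvPages(pages) {
  return pages
    .map((page, i) => {
      const lines = page.trim().split('\n');
      return (i === 0 ? lines : lines.slice(1)).join('\n');
    })
    .join('\n');
}

// In the browser, the merged string could then be offered as a download
// via a Blob, roughly like:
//   const url = URL.createObjectURL(new Blob([csv], { type: 'text/csv' }));
//   ...assign url to a temporary <a download> element and click it...
```

The Blob step is the part the StackOverflow answer above recommends for large strings.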
Thanks a lot @NateWr!
I will definitely look at and test the Accept header as you described.
The performance is now much better, so we might be a little more flexible... e.g. for 180 submissions the report generation needs ca. 11 seconds... I trust you that we can then concatenate everything on the client side...
And regarding the Geo stats: Geo stats only apply to visits of publications/submissions.
These stats are however different than stats/publications: another DB table is used (i.e. a different query builder) and we have total + unique views and downloads. Shall we not use another API handler? Or can another handler be associated with stats/publications/locations? -- I'll have to see...
I am not sure if we would need just the plain numbers (without submissions) for a location -- e.g. just the totals of all submissions for a location -- somehow I don't think so... :thinking:
Ah, one more thing: the JSON result contains more information about a submission than the CSV result should -- the CSV should only contain the title. Depending on the Accept header I can proceed differently in the code, I suppose.
Also, for the report we theoretically do not need to sort first by totals, and we would not need itemsMax -- but if we combine the results in the browser we would need them...
for 180 submissions the report generation needs ca. 11 seconds
That's great, but things can change depending on server and database size, so I'd be careful not to make the max request too large. It shouldn't matter much if we do smaller chunks with each request. And it will be better for large servers.
And regarding the Geo stats: Shall we not use another API handler? Or can another handler be associated with stats/publications/locations? ... I am not sure if we would need just the plain numbers (without submissions) for a location -- e.g. just the totals of all submissions for a location -- somehow I don't think so...
Ahh, I see what you're saying. Let's see, the way the current publication API works is like this:
Endpoint | Result |
---|---|
/stats/publications | List of all publications with stats within filter range |
/stats/publications/abstract | Total hits to all abstracts within filter range broken down by month/day |
/stats/publications/galley | Same as above |
/stats/publications/&lt;publicationId&gt; | One publication with stats within filter range |
/stats/publications/&lt;publicationId&gt;/abstract | Hits to that publication's abstracts within filter range broken down by month/day |
If we take this as a guide, we can maybe do the following for geo stats:
Endpoint | Result |
---|---|
/stats/publications/locations | List of all publications with stats broken down by geo range within filter range |
/stats/publications/locations/countries | List of all countries with total stats within filter range |
/stats/publications/locations/&lt;publicationId&gt; | One publication with stats broken down by geo range within filter range |
/stats/publications/locations/&lt;publicationId&gt;/countries | List of all countries with total stats for the specific publication |
So in my view, for 3.4, we would only need to implement the CSV view of /stats/publications/locations. But this gives us a template for future improvements (for example, we might show /stats/publications/locations/countries in the UI some day).
The json result contains more information about a submission than csv result should -- csv should only contain the title. Depending on Accept header I can proceed differently in the code, I suppose.
Yeah, that's fine. Although if it is easy you can expand the CSV with some of that information.
Also, for report we theoretically do not need to sort first by totals
Ideally, both the JSON and the CSV response would use the same code to fetch the metrics data. So it shouldn't be any more work to support the same query params. The difference should be in how it is then compiled into a response.
we would not need the itemMax -- but if we combine the results in browser we would need them...
For the CSV response, you can include this information in a header, X-Total-Count: 135. See https://stackoverflow.com/a/43968710/1723499.
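Assuming the chunked approach discussed above, the client could read that header to decide how many further requests to make. A trivial sketch (the function name is hypothetical):

```javascript
// Sketch only: given the X-Total-Count header value (a string) and the
// page size used for each chunked request, work out how many requests
// are needed to fetch the full report.
function totalPages(totalCountHeader, perPage) {
  return Math.ceil(parseInt(totalCountHeader, 10) / perPage);
}
```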
Thanks a lot @NateWr! I'll try... :-)
Hi @NateWr,
I implemented stats/publications/files like this:
List of all submission files with stats within filter range (per default ordered DESC by total views, and count = 30, as for publications).
Instead of the file summary props I only display the fileId, fileName, downloads and submissionTitle -- the summary props seem to be too much, and the function getProperties currently needs the request and submission object in its arguments...
This way the JSON contains the same data as CSV.
If, however, the summary props are wanted in the JSON, I can change this, or maybe implement it once submission files are implemented with the new EntityDAO...
Should this function/endpoint consider only assoc_type = submission file, or also supp file? -- We currently consider only the submission file assoc_type everywhere, i.e. also in the other functions of the stats/publications/ endpoint...
And maybe one more comment about stats/publications/locations:
The JSON would look like this:
{
  "items": [
    {
      "subId": 1,
      "publication": {
        "_href": "http://ojs-master.bb/index.php/publicknowledge/api/v1/submissions/1",
        "id": 1,
        "urlPublished": "http://ojs-master.bb/index.php/publicknowledge/article/view/mwandenga-signalling-theory",
        "urlWorkflow": "http://ojs-master.bb/index.php/publicknowledge/workflow/access/1",
        "authorsStringShort": "Mwandenga et al.",
        "fullTitle": {
          "en_US": "The Signalling Theory Dividends: A Review Of The Literature And Empirical Evidence"
        }
      },
      "geoMetrics": {
        "Germany": {
          "Berlin": {
            "Berlin": {
              "totalViews": "3",
              "totalDownloads": "1",
              "uniqueViews": "2",
              "uniqueDownloads": "1"
            }
          },
          "Bavaria": {
            "Munich": {
              "totalViews": "3",
              "totalDownloads": "1",
              "uniqueViews": "2",
              "uniqueDownloads": "1"
            }
          }
        }
      }
    }
  ],
  "itemsMax": 1
}
That means I display country, region and city as nested objects, i.e. a country contains all of its regions, each of which contains all of its cities. OK so? The other possibility would be to display it all flat, with one element per location, e.g.:
{ "country": "Germany", "region": "Berlin", "city": "Berlin", "totalViews": "3", "totalDownloads": "1", "uniqueViews": "2", "uniqueDownloads": "1" },
{ "country": "Germany", "region": "Bavaria", "city": "Munich", "totalViews": "3", "totalDownloads": "1", "uniqueViews": "2", "uniqueDownloads": "1" }
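For comparison, here is a sketch (a hypothetical helper, not part of the PR) that converts the nested geoMetrics shape from the JSON above into the flat one-element-per-location alternative:

```javascript
// Sketch only: flatten the nested geoMetrics object
// (country -> region -> city -> counters) into flat rows,
// one row per (country, region, city) combination.
function flattenGeoMetrics(geoMetrics) {
  const rows = [];
  for (const [country, regions] of Object.entries(geoMetrics)) {
    for (const [region, cities] of Object.entries(regions)) {
      for (const [city, counts] of Object.entries(cities)) {
        rows.push({ country, region, city, ...counts });
      }
    }
  }
  return rows;
}
```

Either shape carries the same information; the flat one maps more directly onto CSV rows.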
Maybe then to define also the CSV for issues and contexts:
stats/issues: List all issues with stats within filter range (per default sorted by total views of TOC and issue galleys, and with count = 30).
The CSV could then contain: ID, Issue identification, Total, Issue TOC views, Issue Galley Views.
stats/contexts: List contexts with stats within filter range (per default sorted by total views of the context index page, and with count = 30).
The CSV could then contain: ID, Title, Total
OK so?
And maybe to be 100% sure: we implement the CSV response only for main getMany() function, right?
stats/publications/locations
Thanks, @bozana. I can see now that the original table I provided is not ideal for this situation. Generally, a REST API should try to use nouns that represent the object returned, so /locations should return a list of locations, not submissions. That's my mistake. I think the API design should support the following endpoints:
Country stats across all publications:
GET /stats/publications/countries
[
{"country": "Germany", "total": 100, ...},
{"country": "Canada", "total": 100, ...}
]
Region stats across all publications:
GET /stats/publications/regions
[
{"region": "Berlin", "country": "Germany", "total": 100, ...},
{"region": "Bavaria", "country": "Germany", "total": 100, ...},
{"region": "Quebec", "country": "Canada", "total": 100, ...}
]
City stats across all publications:
GET /stats/publications/cities
[
{"city": "Berlin", "region": "Berlin", "country": "Germany", "total": 100, ...},
{"city": "Munich", "region": "Bavaria", "country": "Germany", "total": 100, ...},
{"city": "Quebec City", "region": "Quebec", "country": "Canada", "total": 100, ...}
]
Then all three endpoints can exist for each publication:
GET /stats/publications/<publicationId>/countries
[
{"country": "Germany", "total": 100, ...},
{"country": "Canada", "total": 100, ...}
]
stats/issues ... stats/contexts
Yeah, these look good. Maybe include URLs if it is easy?
And maybe to be 100% sure: we implement the CSV response only for main getMany() function, right?
:+1:
Hi @NateWr, I will try to summarize some other requirements on our stats services coming from plugins, which make it easier for me to use the generic getMetrics function, and which we might want to consider in our "Inventory":
COUNTER R4 plugin (current/existing plugin) requires:
Paperbuzz plugin requires:
And, of course, the PKP Usage Stats Plugin (that was also used by the custom report generator) that allows all the combinations...
@bozana and I completed an exercise to try to understand all of the requirements for statistics reports. The results of that exercise can be seen here: https://pkp.notion.site/d3078b32275d4b8a98fe65d5b77d125e?v=188ebb82537c4b8997ddb82f86193477
LIST OF OBJECTS:
stats/publications -> getMany(): list of submissions with their stats (abstract, galley, pdf, html, other, suppFile) -- shall we add the total here?
stats/publications/files -> getManyFiles(): list of files (full text + supp files) with their stats (downloads)
stats/publications/countries -> getManyCountries(): list of countries with their stats (total, unique) -- considers all submission views i.e. abstract, galley, supp file
stats/publications/regions -> getManyRegions(): list of regions with their stats (total, unique) -- considers all submission views i.e. abstract, galley, supp file
stats/publications/cities -> getManyCities(): list of cities with their stats (total, unique) -- considers all submission views i.e. abstract, galley, supp file
stats/contexts -> getMany(): list of contexts with their index (+catalog) page stats (total)
stats/issues -> getMany(): list of issues with their stats (total, toc, issueGalley)
ONE OBJECT:
stats/publications/ID -> get(): stats for the given submission (abstract, galley, pdf, html, other, suppFile)
stats/contexts/ID -> get(): index (+catalog) page stats for the given context (total)
stats/issues/ID -> get(): stats for the given issue (total, toc, issueGalley)
MONTHLY:
stats/publications/abstract -> getManyAbstract(): monthly total (context) abstract numbers (date, label, value)
stats/publications/galley -> getManyGalley(): monthly total (context) galley (full text) numbers (date, label, value)
stats/publications/ID/abstract -> getAbstract(): monthly total (given submission) abstract numbers (date, label, value)
stats/publications/ID/galley -> getGalley(): monthly total (given submission) galley (full text) numbers (date, label, value)
stats/contexts/timeline -> getManyTimeline(): monthly total (for all contexts) index (+catalog) page numbers (date, label, value)
stats/contexts/ID/timeline -> getTimeline(): monthly total (given context) index (+catalog) page numbers (date, label, value)
stats/issues/toc -> getManyToc(): monthly total (context) toc numbers (date, label, value)
stats/issues/galley -> getManyGalley(): monthly total (context) issue galley number (date, label, value)
stats/issues/ID/toc -> getToc(): monthly total (given issue) toc numbers (date, label, value)
stats/issues/ID/galley -> getGalley(): monthly total (given issue) galley numbers (date, label, value)
:pray: thank you @bozana! This is soooo helpful. In 2 minutes I was able to get a complete overview and identify what was confusing me.
I think the problem is with the monthly statistics: we are running into naming clashes with the endpoints. What about putting all of them behind a /timeline endpoint? So it would be like:
stats/publications/timeline/abstract
stats/publications/timeline/files
That would solve our naming clash with galley and files, so we can always use files.
Also, I think we can simplify further by giving each object a default timeline. So the following:
stats/publications/timeline
Would provide the abstract timeline. Then other API endpoints could use query arguments:
stats/publications/timeline
stats/publications/timeline?type=files
So I'd see the following endpoints for monthly stats:
Endpoint | Function |
---|---|
stats/publications/timeline | PublicationStats::getManyTimeline |
stats/publications/timeline?type=files | PublicationStats::getManyFilesTimeline |
stats/publications/ID/timeline | PublicationStats::getTimeline |
stats/publications/ID/timeline?type=files | PublicationStats::getFilesTimeline |
stats/contexts/timeline | ContextStats::getManyTimeline |
stats/contexts/ID/timeline | ContextStats::getTimeline |
stats/issues/timeline (toc) | IssueStats::getManyTimeline |
stats/issues/timeline?type=files | IssueStats::getManyFilesTimeline |
stats/issues/ID/timeline (toc) | IssueStats::getTimeline |
stats/issues/ID/timeline?type=files | IssueStats::getFilesTimeline |
Does that sound good?
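To make the scheme concrete, the endpoint table can be read as a simple mapping from URL shape to handler name. A hypothetical sketch (illustration only, not real routing code; the class and method names follow the table above):

```javascript
// Sketch only: derive the handler method name from the object type
// (publications | contexts | issues), an optional object ID, and an
// optional ?type=files query parameter, mirroring the table above.
function timelineHandler(object, id, type) {
  const stats = {
    publications: 'PublicationStats',
    contexts: 'ContextStats',
    issues: 'IssueStats',
  };
  const many = id === null ? 'Many' : ''; // no ID means the getMany variant
  const files = type === 'files' ? 'Files' : '';
  return `${stats[object]}::get${many}${files}Timeline`;
}
```

The point is that the naming stays fully regular: every object gets a default timeline, and ?type=files selects the files variant.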
Hi @NateWr, yes, that sounds good. I have one question: shall we somehow separate full texts and supp files in those timeline calls for publications? -- currently we consider the supp files (in the list of objects and in the single-object stats), and I think the user would maybe like to have monthly stats only for full texts rather than supp files... Or shall timeline?type=files only mean full text files?
I don't think we should make the distinction. The timeline should show all files.
If we later want to extend the endpoint to consider only some files, we could for example use /timeline?type=primaryFiles or something like that. But I don't think we need to do that yet.
PRs for the export possibility of articles/monographs/preprints:
pkp-lib: https://github.com/pkp/pkp-lib/pull/8308
ui-library: https://github.com/pkp/ui-library/pull/216
ojs: https://github.com/pkp/ojs/pull/3562
omp: https://github.com/pkp/omp/pull/1215
ops: https://github.com/pkp/ops/pull/365
@NateWr, could you please take a look at the PR above? It works, but apologies in advance if my solution to this Vue/UI work is clumsy... I am happy to improve it according to your comments! :-)
Is there a PR for the UI Library that should be included? I'm getting JS errors related to missing methods and properties like downloadReport that make me think there are changes to StatsPublicationPage.vue I need to look at.
Also, do you have a (not too large) database dump you can send me with stats in it? I don't have stats in any of my local test instances.
Oh, sorry @NateWr :-( I forgot it :-P Now it is added in the PRs list above... (will update the submodule in OJS in a sec). Thanks a lot!
Yep, working for me now. :) I left a couple of comments on the PRs. I also made a couple of commits with a rough idea of how to change the stats download modal. There's not quite enough information to know what I'm downloading, so I moved the report type selection into the modal. If you like it, you can use the commits here to work it into your setup (I didn't translate any of the text so there's still work to do):
In addition to that, a few other comments:
stats-<current-date-and-time>-<context-acronym>-articles-<date-range>-<section>-<section>.csv
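As an illustration of that naming pattern, a hypothetical helper could assemble the pieces (all parts are passed in by the caller; none of these names come from the actual code):

```javascript
// Sketch only: assemble the suggested report file name from its parts:
// stats-<current-date-and-time>-<context-acronym>-articles-<date-range>-<section>-...csv
function reportFileName(dateTime, acronym, dateRange, sections) {
  return ['stats', dateTime, acronym, 'articles', dateRange, ...sections].join('-') + '.csv';
}
```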
Thanks a lot @NateWr! Regarding monthly/daily reports: We decided that for now the user needs to select the month they would like to have the report for. I do not remember any more what we said about the daily reports... :thinking: And for the other comments: I will consider them now...
Hi @NateWr, I think I considered all your comments. Could you please take another look? A few comments: The Issues filter will come next; that's why I treat the filters in that (generic) way... I use 'submissions' instead of 'articles', 'monographs' and 'preprints' for the file name -- in order not to translate it -- but, if wished, I can change this...
The current stats API does not support daily stats for each article, only for the timeline i.e. for all articles...
The current stats API does not support daily stats for each article, only for the timeline i.e. for all articles...
I think the existing API supported this. But also, I think the general stats API /stats/publications/abstract lets you filter by searchPhrase, right? This means that someone could sort of achieve this by entering the exact title as the search?
I haven't changed that part of the API, so yes, this is still possible. This is for one article.
Yes, searchPhrase is possible for /stats/publications/timeline.
But there is no possibility to have every article and its stats listed by day or month. Month we said could be solved by choosing month by month date range.
Ok, that should be good enough, as long as someone can retrieve a history of views/downloads for an article or a group of articles.
Looks good, @bozana. Just a few comments in the code. In addition, I had these comments:
In PKPStatsPublicationHandler::_getFileReportColumnNames(), two of the column names are reversed: ID,Title,"File Views","Article ID","Article Title". Let's call it "Article Title" first and then at the end call it "Filename".
The search phrase doesn't seem to be working. When I search for a word in a submission title or its id, no results are returned.
Is there some way to download a timeline in CSV? Maybe we can split this into a separate issue, but I think we'll want that too, just to complete the replacement of the custom report generator. It could be another option between articles and files that says:
Timeline: The number of [article views|file downloads] for each [day|month] in this date range.
Hi @NateWr, I think I implemented all your comments. Regarding the search: the search is the general submission entity search and it works, it just does not work 'correctly' on my stats data -- in metrics tables I have submission IDs that are not published and search only considers the published submissions. In real data this is not the case -- the stats data exist only for published submissions. For the CSV export of the timeline I created a new issue: https://github.com/pkp/pkp-lib/issues/8328.
Great! I just checked the stats search and you're right: it worked for the one submission I had that was published. I'm happy for you to merge whenever you're ready. :+1:
PRs for Statistics > Issues page:
ui-library: https://github.com/pkp/ui-library/pull/217
ojs: https://github.com/pkp/ojs/pull/3571
pkp-lib: https://github.com/pkp/pkp-lib/pull/8376
https://github.com/pkp/pkp-lib/pull/8353 (old PR with the accidentally wrong branch name, used for the code review)
@NateWr, could you please take a look at these new PRs for Statistics > Issues page?
@NateWr, I have considered all your comments. Would you like to take another look? Thanks a lot!!! :pray:
EDIT: I have also added a commit in the pkp-lib that adds CC Attribution to DB-IP.com for Geo data. I have asked @asmecher and he confirmed that that would be a good place...
PRs for issues filter on the stats article page + issueIds param for stats API: pkp-lib: https://github.com/pkp/pkp-lib/pull/8358 ui-library: https://github.com/pkp/ui-library/pull/222 ojs: https://github.com/pkp/ojs/pull/3582 omp: https://github.com/pkp/omp/pull/1222 ops: https://github.com/pkp/ops/pull/374
@NateWr, here are the PRs for issues filtering on the stat article page. Also the additional stats API param issueIds. Could you please review it?
Hi @NateWr, I have considered your comments for issues filter PRs (https://github.com/pkp/pkp-lib/issues/7318#issuecomment-1287676815). Would you like to double check?
Maybe also here a reminder that I have also considered your comments on these PRs https://github.com/pkp/pkp-lib/issues/7318#issuecomment-1275103834 -- if you would like to double check...
@NateWr, I considered the second review for these PRs https://github.com/pkp/pkp-lib/issues/7318#issuecomment-1275103834. Could you please take a look? Thanks a lot!!!
Sorry @NateWr, me again: I added the tooltip for Geolocation, changed the download file name and added the parameters into the CSV file. Could you take a look? :pray:
Describe the problem you would like to solve The custom report generator (Statistics > Reports > Generate Custom Report) duplicates some of the filtering and sorting options in the article stats UI (Statistics > Articles). The custom report UI is confusing, and depending on which report template is used, offers advanced options that don't apply to the report.
Describe the solution you'd like Generating a custom report should be integrated with the article stats UI, so that when a journal manager is viewing the article statistics they can generate a report from the date and section filters they have already selected.
The journal manager can click to generate a particular report, and then get a few additional options to configure the output they receive. Because the UI already includes tools to filter by date and daily/monthly, the options to configure a custom report would be much simpler. The JM would only need to select the columns they want to include.
Who is asking for this feature? Tell us what kind of users are requesting this feature. Example: Journal Editors, Journal Administrators, Technical Support, Authors, Reviewers, etc.
Additional information An inventory exercise was conducted to understand all of the requirements for statistics reports. That can be found at: https://pkp.notion.site/d3078b32275d4b8a98fe65d5b77d125e?v=188ebb82537c4b8997ddb82f86193477
TO-DOs: