ucsdlib / damsmanager

DAMS Manager

Curator vs non-curator views in RDCP statistics reporting #110

Closed mcritchlow closed 7 years ago

mcritchlow commented 7 years ago

QA/QC Testing: @hjsyoo

lsitu commented 7 years ago

@mcritchlow @hjsyoo For # 2 "In curator view, I backspaced once, then clicked back onto the page from search results. Then, in non-curator view, I repeated this action. Expected to see curator views = 1 and non-curator views = 1": we count page requests to the server, so if the browser's cache has timed out or is disabled, going back triggers another request for the page, which is counted as another view. Since we don't know when the curator will hit the back button which triggers another page load, should we have some rules to eliminate those kinds of counts? Something like one view for one object view per day from an IP?
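
The rule proposed above (at most one view per object, per IP, per day) could be sketched roughly as follows. This is only an illustration, not the damsmanager implementation: the function name `count_daily_unique_views`, the `(ark, client_ip, timestamp)` record shape, and the sample data are all assumptions made for the example.

```python
from datetime import datetime


def count_daily_unique_views(requests):
    """Count at most one view per (object, client IP, calendar day).

    `requests` is an iterable of (ark, client_ip, timestamp) tuples.
    This record shape is hypothetical, chosen for illustration only.
    """
    seen = set()
    counts = {}
    for ark, client_ip, timestamp in requests:
        key = (ark, client_ip, timestamp.date())
        if key in seen:
            # Repeat load on the same day, e.g. back button after cache expiry.
            continue
        seen.add(key)
        counts[ark] = counts.get(ark, 0) + 1
    return counts


# Made-up sample requests (documentation-range IP addresses).
requests = [
    ("obj1", "203.0.113.5", datetime(2016, 11, 14, 9, 0)),
    ("obj1", "203.0.113.5", datetime(2016, 11, 14, 9, 1)),   # same IP, same day
    ("obj1", "203.0.113.5", datetime(2016, 11, 15, 9, 0)),   # same IP, next day
    ("obj1", "198.51.100.7", datetime(2016, 11, 14, 10, 0)),  # different IP
]
print(count_daily_unique_views(requests))
```

With this sample, the second request is dropped and the other three are counted. Several users behind one shared address would collapse into a single view under this rule.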

lsitu commented 7 years ago

@hjsyoo I think I've found the issue with curator views not being counted, but I am curious why it once worked in the past. Could you confirm that, from the search results page in curator view, clicking the thumbnail link on the left is counted as a curator view, while clicking the link with the object title and description metadata below it is not counted as a curator view at all, but as a non-curator view? Thanks.

lsitu commented 7 years ago

@mcritchlow I've added two PRs (https://github.com/ucsdlib/damspas/issues/256, https://github.com/ucsdlib/damsmanager/issues/111) to address the first and third issues above. I think we may need to discuss the second issue, regarding how to count the number of views. See https://github.com/ucsdlib/damsmanager/issues/110#issuecomment-260457180 . What do you think?

mcritchlow commented 7 years ago

@lsitu great thanks 👍 . #111 looks good to me, and we'll let the team review/merge.

As to the second issue, I agree I think a discussion (either here or in a separate mtg) might need to happen regarding what's realistic to implement on our end. @gamontoya and @hjsyoo let us know what you think.

hjsyoo commented 7 years ago

Hi @lsitu. Below are the steps I took to test the second issue, as you requested. Last night, I identified 2 objects that haven't been viewed this month, Area T (bb0684080d) and Locus 44 (bb4131211t). I opened Firefox in Private mode. In search results, Curator view, I clicked on thumbnail link to open the Area T page, counted to 10, then clicked back to search results. Clicked thumbnail again, counted to 10, then clicked back. Repeated once more, so I had viewed Area T three times in one session. Closed browser. Next I repeated the above steps for Locus 44, but clicked the title each time rather than the thumbnail. This morning, Area T is reported to have been viewed 3 times by non-curator and 3 times by curator. Locus 44 was viewed 6 times by curator.

hjsyoo commented 7 years ago

@lsitu , regarding your other question, "Since we don't know when the curator will hit the back button which triggers another page load, should we have some rules to eliminate those kinds of counts? Something like one view for one object view per day from an IP?" This seems reasonable. One object view per day is all I feel we need to track. Is this how google analytics defines unique views? But I can't say whether IP is enough to track individual users if they're on public wifi. Let me know if I missed any other issues.

lsitu commented 7 years ago

@hjsyoo Thanks for testing it. It looks like the Locus 44 views appear to have all been counted as public views rather than curator views. As for your questions, I am not sure how google analytics defines unique views. But I think we can count on a per-curator basis, one count for one user per day. Does that sound good?

hjsyoo commented 7 years ago

@lsitu , The stats report was confusing to me. I would have expected both Area T and Locus 44 to report just 3 views by curator (I only tested in curator mode). Instead, Area T is reporting 3 extra, unexpected non-curator views and Locus 44 reports 3 extra curator views. Assuming no one else viewed these objects yesterday (perhaps an incorrect assumption), it looks like views are getting duplicated? Yes, one count per user per day sounds good!

lsitu commented 7 years ago

@hjsyoo It looks like something odd related to the back button tests you are performing. I think we can examine it to see how that happened. Will it be counted twice if you just click the thumbnail link once without clicking back? For Locus 44, I see a total of 6 public views reported but no curator views (curator access is displayed in the last column, which is 0):

Khirbat en-Nahas Project (Jordan) | Locus 44, Area A, Area Stratum Ib, FILL | bb4131211t | 0 0 0 0 8 0 2 0 0 0 6 0

For unique access, how should we count non-curator/public views, which don't have a user ID?

hjsyoo commented 7 years ago

@lsitu - Oh, you're right, I misread the stats columns. So for Locus 44 (i.e., clicking on the title), curator views get incorrectly logged as non-curator (public) views. That happens to be my typical behavior - I click on titles, not thumbnails. I will do the single click test today on these same objects, and we can see how DAMS manager interprets them tomorrow.

hjsyoo commented 7 years ago

@lsitu To log my behavior for today, I opened a tab on Chrome (not incognito) and browsed to the Khirbat en Nahas collection. From search results, I clicked on the thumbnail for Area T. After 10 secs on the landing page, I closed the tab. I opened a new tab and repeated this sequence, except that I clicked on the title for Locus 44.

Regarding unique access, it might be that we can't count non-curator views, for which there is no user ID on wifi. I believe this is what @mcritchlow has told me, but he can verify. If this is the case, the only solution may be to remove any labels on DAMS manager that imply "unique" views?

lsitu commented 7 years ago

@hjsyoo Yes, I can remove the "unique" keyword on the status report page. As for the double-counting issue, I think it is caused by the link we clicked being redirected to the page link. I'll see how to keep that from being counted. One simple way is to count only the accesses from the search results page. Does that sound correct to you?

hjsyoo commented 7 years ago

@lsitu Not sure what you mean by the last two sentences - do you mean that we would ignore user views if they came in directly from a bookmark or from google search?

lsitu commented 7 years ago

@hjsyoo Yes, I think we should ignore those accesses (views) that are redirected. But I need to count those coming from a bookmark for curator access. So it looks like counting all the accesses (views) and ignoring those being redirected (access from bookmark) sounds correct, doesn't it?

hjsyoo commented 7 years ago

@lsitu If I'm understanding the problem correctly, I'd like to see the redirect counts be interpreted correctly as single hits and not double. But if a public user bookmarks a page and clicks on the bookmark or encounters the object in a google search, I would certainly want to see that visit being counted. My hope here is to be able to assess how often users are viewing the page, regardless of how they got there. Is this possible?

lsitu commented 7 years ago

@hjsyoo Yes, I think we just need to eliminate the redirect counts. If this is not the exact behavior you want, then we can adjust it later.
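
One way to picture "eliminate the redirect counts" is to tally views from access-log lines and skip any request that was answered with a redirect, so a click that bounces to the final page URL is counted once. This is a rough sketch under stated assumptions, not the change made in the PR: it assumes the redirect shows up as an HTTP 3xx status, and the `/dc/object/<ark>` path pattern, the `from=search` parameter, the placeholder ARK, and the sample lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: an object page request followed by its HTTP status.
LOG_LINE = re.compile(r'"GET /dc/object/(?P<ark>\w+)[^"]*" (?P<status>\d{3})')


def count_views(log_lines):
    """Count object page requests, ignoring those answered with a redirect."""
    views = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not an object page request
        if match.group("status").startswith("3"):
            continue  # redirected request; the follow-up request is counted instead
        views[match.group("ark")] += 1
    return views


# Made-up log lines: one click that is redirected, then served.
sample = [
    '203.0.113.5 - - [28/Nov/2016:10:00:00 -0800] '
    '"GET /dc/object/bb0000000x?from=search HTTP/1.1" 302 0',
    '203.0.113.5 - - [28/Nov/2016:10:00:01 -0800] '
    '"GET /dc/object/bb0000000x HTTP/1.1" 200 5120',
]
print(count_views(sample))
```

Because only the redirect hop is dropped, visits arriving from a bookmark or a search engine still count, which matches the behavior requested above.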

lsitu commented 7 years ago

@mcritchlow I've added another commit to PR https://github.com/ucsdlib/damsmanager/pull/111 to eliminate the double counted object views from the search result page that are redirected. It's ready for review now. Thanks.

hjsyoo commented 7 years ago

@lsitu To follow up on my UI test yesterday, DAMS manager reports that Area T (thumbnail clicked) was viewed non-curator=4 and curator=4. Locus 44 (title clicked) was viewed non-curator=8. So, same duplication issue has occurred, even without use of the backspace.

lsitu commented 7 years ago

@hjsyoo Yes. This is caused by the view page redirect, and it's fixed with PR https://github.com/ucsdlib/damsmanager/pull/111 . We have another PR https://github.com/ucsdlib/damspas/pull/256 that still needs to be merged. I'll let you know once it's ready for you to test. Thanks.

lsitu commented 7 years ago

@hjsyoo Would you like to test it on QA? I've set it up on QA (https://libraryqa.ucsd.edu/dc) at this time, and you can review the result at QA damsmanager the next day: https://libraryqa.ucsd.edu/damsmanager/statsRdcpUsage.do Thanks.

hjsyoo commented 7 years ago

@lsitu It worked on library QA! One click on thumbnail or title resulted in a report of 1 curator view, and 3 clicks on thumbnail or title resulted in a report of 3 curator views. 0 non-curator views were reported. So this is all expected behavior. Should I test further as non-curator? And should I test stats reports for when I use backspace or enter from a google search? For reference, these are my test behaviors:

  1. In VPN, I opened a regular tab in Chrome and browsed to the Khirbat en Nahas collection as Curator. From search results, I clicked on the thumbnail for Locus 647, Area M, Area Stratum M3, SLAG LAYER (bb0376912h). After 10 secs on the landing page, I closed the browser.
    • On 2016-11-28, DM reports non-curator=0, curator=1
  2. I repeated these steps for Locus 629, Area M, Area Stratum M2a/2b, SLAG LAYER (bb1912764h), except that I clicked back arrow then thumbnail (waiting 10 secs) twice, for a total of 3 visits to that page.
    • On 2016-11-28, DM reports non-curator=0, curator=3
  3. I repeated step 1 for Locus 509, Area M, Area Stratum 1b (bb0684084g), except that I clicked the title link.
    • On 2016-11-28, DM reports non-curator=0, curator=1
  4. I repeated step 2 for Locus 270, Area S, Area Stratum I, WALL (bb63496611), except that I clicked the title link.
    • On 2016-11-28, DM reports non-curator=0, curator=3

lsitu commented 7 years ago

@hjsyoo That sounds good! Thank you very much for testing it out on QA. We are going to deploy damsmanager to staging now so I think you can test it on staging later. Thanks.

lsitu commented 7 years ago

@hjsyoo The required code changes for damsmanager and damspas are deployed to staging and ready for review now. You can test on https://librarytest.ucsd.edu/dc and review the RDCP stats result the next day on https://librarytest.ucsd.edu/damsmanager/statsRdcpUsage.do . Thanks.

hjsyoo commented 7 years ago

@lsitu Yesterday, I conducted the same test on staging, as I did on QA (i.e., Loci 647, 629, 509, 270). But DM doesn't seem to have detected any activity, as of today. I can't even find the records for these objects, in https://librarytest.ucsd.edu/damsmanager/statsRdcpUsage.do. Is there a chance something didn't work on the code end, before I test again today? I suppose there's a chance I tested in the wrong environment, but I'm not seeing the proper hits on https://library.ucsd.edu/damsmanager/statsRdcpUsage.do or https://libraryqa.ucsd.edu/damsmanager/statsRdcpUsage.do, either.

lsitu commented 7 years ago

@hjsyoo Do you have the arks that you tested with yesterday? Thanks.

lsitu commented 7 years ago

@hjsyoo Did you also test with public access? Since curator access only serves for comparison, an object won't show up until a public access is triggered. Could you try a test with both public access and curator access if that object hasn't shown up in the stats report (https://librarytest.ucsd.edu/damsmanager/statsRdcpUsage.do) yet?

hjsyoo commented 7 years ago

@lsitu I see. I'll test with both levels of access today, as you suggest. Here are the ARKs: bb0376912h bb1912764h bb0684084g bb63496611

lsitu commented 7 years ago

Hi Ho Jung, I checked yesterday's Apache log and see lots of accesses to the objects above on QA but none on staging. Please double check that you are using damspas on staging ( http://librarytest.ucsd.edu/dc/ ) when performing the test today. Thanks.

hjsyoo commented 7 years ago

@lsitu Just finished testing on librarytest! We'll see what happens tomorrow. Yes, it's entirely possible I did all the testing on libraryqa yesterday. Today, I visited all 4 objects as non-curator first, to trigger DM.

hjsyoo commented 7 years ago

@lsitu Actually, I take it back. I can't have tested as curator on qa yesterday, because the curator stats on qa didn't increment as they should have, if that were the case. I did, however, test the same 4 objects on qa as non-curator.

lsitu commented 7 years ago

@hjsyoo I see some accesses in today's log for the first two objects above on staging now. So it looks good. Thanks.

hjsyoo commented 7 years ago

@lsitu Good news - my tests on staging came out as expected for all but one object. Looking at the stats for that object, however, it's clear that human error was involved - I must've failed to sign in as curator on one of the days of testing. I'll do some more testing later in the month, on production, and let you know if I encounter anything else. Thanks, Longshou!

lsitu commented 7 years ago

That sounds good. Thanks @hjsyoo .

gamontoya commented 7 years ago

@lsitu :+1:

hjsyoo commented 7 years ago

Hi @lsitu - I'm curious - for our tests on staging, we found that reporting was triggered by non-curator views, not by curator views. So, no views were reported on an object that was only viewed by curator during that month. But I'm looking at stats for prod (https://library.ucsd.edu/damsmanager/statsRdcpUsage.do), and finding some reports for Dec that are non-curator=0, curator=1. Does prod behave differently from staging? Is it the case that we don't need non-curator views in order to trigger reports?

lsitu commented 7 years ago

@hjsyoo It's actually triggered by any public access during the RDCP Stats history. So if you look at the earlier months, you'll see that the object has at least one count for public access.
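
The inclusion rule described here (an object is listed once it has any public access anywhere in its stats history, even if a given month shows only curator views) might look something like the sketch below. The function name `objects_in_report`, the nested-dict layout, and the sample numbers are assumptions for illustration, not the actual damsmanager data model.

```python
def objects_in_report(monthly_stats):
    """Return the objects that would appear in the stats report.

    `monthly_stats` maps object ID -> month -> {"public": n, "curator": n}.
    An object qualifies if any month in its history has a public view.
    This layout is hypothetical.
    """
    return sorted(
        ark
        for ark, months in monthly_stats.items()
        if any(counts["public"] > 0 for counts in months.values())
    )


# Made-up sample data.
stats = {
    # Public views in Nov, so the Dec row (public=0, curator=1) is still shown.
    "obj-a": {
        "2016-11": {"public": 2, "curator": 0},
        "2016-12": {"public": 0, "curator": 1},
    },
    # Curator views only, never viewed publicly: not reported.
    "obj-b": {
        "2016-12": {"public": 0, "curator": 3},
    },
}
print(objects_in_report(stats))
```

In the sample, `obj-a` is reported for December with non-curator=0, curator=1 because of its earlier public views, while `obj-b` does not appear at all.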

hjsyoo commented 7 years ago

@lsitu Oh, right - that makes sense. Thanks!

mcritchlow commented 7 years ago

@lsitu @hjsyoo - Is this work still in progress?

lsitu commented 7 years ago

I think we're done with the ticket and we can close it now. What do you think, @hjsyoo ?

hjsyoo commented 7 years ago

@mcritchlow @lsitu Yes, I think all is good. Is the protocol in github the same as in Jira - should I normally close the ticket?

lsitu commented 7 years ago

@hjsyoo I think normally the ticket will be marked as closed once the PR is merged, and we just need to drag it to the "Closed" column at https://github.com/orgs/ucsdlib/projects/1. Either you or we can do it once you confirm. Thanks.

hjsyoo commented 7 years ago

@lsitu Ok, I'll close this ticket tomorrow. I don't recall whether I did a final test on production before the break, so I'm doing so today and tomorrow. Thanks.

hjsyoo commented 7 years ago

All went as expected! Thanks @lsitu.

lsitu commented 7 years ago

Thanks for testing it out @hjsyoo .