scientist-softserv / britishlibrary


10c IRUS counter Ruby gem to be integrated (req. 1.10c) #136

Closed: crisr15 closed this issue 1 year ago

crisr15 commented 1 year ago

Please integrate IRUS counting for each of the live repositories, including the NHS (not any of the demo tenants). This can be included as work on analytics which is the last milestone.

Tender context: 'We also need the repositories to work with IRUS-UK, the UK’s central provider of COUNTER-conformant usage data for repositories.'

CS/N8 response: 'The parameters and function of this requirement are well known to us from other platforms. This feature can be added using the irus_analytics gem developed specifically for this task. The gem is old, so we will assess its readiness for inclusion with the latest version of Hyrax and its compliance with current IRUS/COUNTER requirements.'

Paul Needham at Jisc (who run IRUS) spoke with Michigan about whether the gem should work: 'most likely it is a yes as it is a ruby gem and works with Hyrax. Hyku's foundation is also Hyrax so it should work. It would be good to experiment and see how it works.'

crisr15 commented 1 year ago

https://github.com/JiscSD/irus_analytics

https://groups.google.com/g/samvera-community/c/wccgjGfsGMk?pli=1

crisr15 commented 1 year ago

this code is on staging. I clicked on a few different works, file sets and downloaded a few files too.

@cziaarm can you check and make sure these are tracking right wherever you all check for this?

crisr15 commented 1 year ago

@alishaevn I have confirmed with Paul Needham (who runs the IRUS service) that the requests are appearing as expected at the test IRUS server:

And . . . 19 entries from BL, including Requests and Investigations:

51.132.153.238  url_ver=Z39.88-2004    url_tim=2021-09-29T02%3A56%3A01Z       req_id=urn%3Aip%3A104.176.76.39 svc_dat=http%3A%2F%2Fmola.bl-staging.notch8.cloud%2Fconcern%2Ffile_sets%2F48c3b6ba-7ca7-47a0-9eb2-576f667b9dd9%3Flocale%3Den   rfr_dat=http%3A%2F%2Fmola.bl-staging.notch8.cloud%2Fconcern%2Farticles%2Fe1e9ccd9-d841-4cda-a4f2-483605245b21%3Flocale%3Den        rft.artnum=oai%3Ahyku%3A48c3b6ba-7ca7-47a0-9eb2-576f667b9dd9  rfr_id=iro.bl.uk        rft_dat=Investigation  req_dat=Mozilla%2F5.0+%28Macintosh%3B+Intel+Mac+OS+X+10.15%3B+rv%3A92.0%29+Gecko%2F20100101+Firefox%2F92.0

51.132.153.238  url_ver=Z39.88-2004    url_tim=2021-09-29T02%3A56%3A04Z       req_id=urn%3Aip%3A104.176.76.39 svc_dat=http%3A%2F%2Fmola.bl-staging.notch8.cloud%2Fdownloads%2F48c3b6ba-7ca7-47a0-9eb2-576f667b9dd9%3Flocale%3Den   rfr_dat=http%3A%2F%2Fmola.bl-staging.notch8.cloud%2Fconcern%2Ffile_sets%2F48c3b6ba-7ca7-47a0-9eb2-576f667b9dd9%3Flocale%3Den    rft.artnum=oai%3Ahyku%3A48c3b6ba-7ca7-47a0-9eb2-576f667b9dd9  rfr_id=iro.bl.uk        rft_dat=Request        req_dat=Mozilla%2F5.0+%28Macintosh%3B+Intel+Mac+OS+X+10.15%3B+rv%3A92.0%29+Gecko%2F20100101+Firefox%2F92.0

Looking good 😊

Cheers

Paul

This is good! I'm double-checking with Paul whether the source_repository item (which arrives in the above as the rfr_id) needs to reflect the tenant sub-domain. My guess is that it will, so that IRUS can provide distinct stats for each organisation, but there is no harm in asking. I'll update as soon as he gets back to me.
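To make it easier to check fields such as rfr_id and req_id in entries like the two above, here is a minimal Ruby sketch that splits one logged tracker entry into decoded key/value pairs. The helper name `parse_tracker_entry` and the key `remote_ip` are illustrative only, and treating the leading IP as the address the message arrived from is an assumption about the IRUS log layout. The sample entry is a shortened copy of the first one above.

```ruby
require "uri"

# Split one whitespace-separated tracker log entry into a Hash of decoded
# OpenURL fields. The first token has no "=" and is stored under the
# illustrative key "remote_ip" (assumed meaning, not confirmed by IRUS).
def parse_tracker_entry(line)
  tokens = line.split
  fields = { "remote_ip" => tokens.shift }
  tokens.each do |token|
    key, value = token.split("=", 2)
    # Values are form-encoded: %3A becomes ":" and "+" becomes a space.
    fields[key] = URI.decode_www_form_component(value.to_s)
  end
  fields
end

entry = "51.132.153.238 url_ver=Z39.88-2004 url_tim=2021-09-29T02%3A56%3A01Z " \
        "req_id=urn%3Aip%3A104.176.76.39 rfr_id=iro.bl.uk rft_dat=Investigation"

fields = parse_tracker_entry(entry)
puts fields["req_id"] # prints urn:ip:104.176.76.39
puts fields["rfr_id"] # prints iro.bl.uk
```

Run against a real log, this would show at a glance whether rfr_id carries the tenant sub-domain or the shared iro.bl.uk value.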

crisr15 commented 1 year ago

Yep Paul confirmed that we'll need to get the tenant domains reflected in the source_repository. I'm at a disadvantage when it comes to how to extract hyku config settings, but I'll be glad to see how it is done. (It is probably worth noting that Sara told us yesterday that the BL do not consider IRUS as their top priority, so...)

crisr15 commented 1 year ago

if you are happy for this to go to production, that's fine with us. Don't think we can test it at all?

crisr15 commented 1 year ago

Rory McNicholl @cziaarm · 1 year ago · Developer

There is a little left to do on this :/ and I've not pushed as it was a little down the priority list. I will change status to ready for development and you can show us where it sits in relation to the other tasks on there...

The current issue: we are currently sending the rfr_id / source_repository as iro.bl.uk; this needs to reflect the sub-domains of the tenant repositories that are sending the request. We need to adjust the gem to EITHER allow source_repository to be a tenant-level setting OR (probably more sensibly) just set the source_repository programmatically to whatever the tenant base_url is.
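The second option could look roughly like the sketch below. It is not the gem's code: `Tenant` is a hypothetical stand-in for whatever object holds the tenant's host or base URL, and `source_repository_for` is an illustrative helper name. It assumes the tenant value may be either a bare host or a full URL, and falls back to the shared iro.bl.uk value when nothing usable is present.

```ruby
require "uri"

# Hypothetical stand-in for a tenant record; only `cname` is used here.
Tenant = Struct.new(:cname)

# Return the value to send as rfr_id / source_repository for a tenant.
# Accepts a bare host ("bl.iro.bl.uk") or a full base URL.
def source_repository_for(tenant, fallback: "iro.bl.uk")
  raw = (tenant ? tenant.cname : nil).to_s.strip
  return fallback if raw.empty?

  # Only parse as a URL when a scheme is present; otherwise treat as a host.
  host = raw.include?("://") ? URI.parse(raw).host : raw
  host.to_s.empty? ? fallback : host.downcase
end

puts source_repository_for(Tenant.new("bl.iro.bl.uk"))
puts source_repository_for(Tenant.new("https://nms.iro.bl.uk/"))
puts source_repository_for(nil)
```

The fallback keeps messages flowing for any request that arrives without a resolvable tenant, at the cost of those hits being attributed to the shared domain.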

It is OK in its current state on production for now. Paul Needham et al. are aware that we will be updating soon to ensure the proper domains are reflected in the data we are sending to IRUS. He's OK ignoring the requests until then.

Sara Gould @SaraGould · 1 year ago · Reporter

Very good thank you.

Alisha Evans @alishaevn · 11 months ago · Maintainer

I think this change needs to happen in config/irus_analytics_config.yml

crisr15 commented 1 year ago

Hello, I'm looking at https://github.com/notch8/irus_analytics/blob/master/lib/irus_analytics/configuration.rb#L35-L40 and wondering whether Settings here, and specifically Settings.hostname, should be switched to Site.account.cname.

Then config/irus_analytics_config.yml could be updated to look more like:

production:
  named_servers: true
  bl.iro.bl.uk:
    enabled: true
    enable_send_logger: false
    enable_send_investigations: true
    enable_send_requests: true
    irus_server_address: https://irus.jisc.ac.uk/counter/test/
    robots_file: irus_analytics_counter_robot_list.txt
    source_repository: bl.iro.bl.uk
    verbose_debug: false
  nms.iro.bl.uk:
    enabled: true
    enable_send_logger: false
    enable_send_investigations: true
    enable_send_requests: true
    irus_server_address: https://irus.jisc.ac.uk/counter/test/
    robots_file: irus_analytics_counter_robot_list.txt
    source_repository: nms.iro.bl.uk
    verbose_debug: false

and so on ?
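Rather than maintaining one hand-written block per tenant, the same layout could be generated from a list of tenant hosts. The sketch below only mirrors the structure proposed above; whether the gem actually reads a per-host layout like this is not confirmed here, and `tenant_config` and `SHARED_SETTINGS` are illustrative names.

```ruby
require "yaml"

# Settings shared by every tenant, copied from the proposed config above.
SHARED_SETTINGS = {
  "enabled" => true,
  "enable_send_logger" => false,
  "enable_send_investigations" => true,
  "enable_send_requests" => true,
  "irus_server_address" => "https://irus.jisc.ac.uk/counter/test/",
  "robots_file" => "irus_analytics_counter_robot_list.txt",
  "verbose_debug" => false
}.freeze

# Build the production section with one entry per tenant host, each
# reporting its own host as source_repository.
def tenant_config(hosts)
  servers = hosts.each_with_object({}) do |host, acc|
    acc[host] = SHARED_SETTINGS.merge("source_repository" => host)
  end
  { "production" => { "named_servers" => true }.merge(servers) }
end

puts tenant_config(%w[bl.iro.bl.uk nms.iro.bl.uk]).to_yaml
```

Adding a tenant then means adding one host to the list, which avoids the blocks drifting apart over time.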

crisr15 commented 1 year ago

labelling this as part of the migration finish-off (it could be considered app parity but we de-prioritised it until the most important work had been done)

crisr15 commented 1 year ago

@Jenny by way of client QA, I've made you a video

https://share.getcloudapp.com/bLu5JBon

crisr15 commented 1 year ago

thanks rory, happy for it to be deployed

crisr15 commented 1 year ago

Emailed IRUS 8/9/22 to check

crisr15 commented 1 year ago

IRUS report 8.9.22: The good news: yes, all of those repositories are sending data to us.

The bad news: all the tracker messages show an internal IP (192.168.252.87) rather than Client_IPs in the req_id element, which needs to be addressed before we can ingest the IRO data.

I think it’s just a simple configuration issue, i.e. IP forwarding needs to be in place?

crisr15 commented 1 year ago

Confirmed that we can make an env variable that is the load balancer IP and then feed the gem that environment variable.

crisr15 commented 1 year ago

Update from Paul at IRUS:

I’ve checked and, while there has been a change, all of the ‘client IPs’ are now showing as 52.210.71.93, which is an Amazon Data Services IP address ☹, so some further config on IP forwarding is needed.

Developer

Not sure what else Paul was expecting from an AWS hosted service.

Author Reporter

is he expecting it to show different for each tenant?

Developer

No, not that. It is meant to be the requester IP, i.e. the IP of the person who sent the request to iro... That will be something of a job. I've been able to get this through to the application pod in the past, but that entailed some wrong-headed Kubernetes config which I wanted to avoid in this case. Let me check and see what our options are.

Author Reporter

ok lovely i will leave it in in dev for you to ponder thank you

crisr15 commented 1 year ago

Turns out I was not paying attention: the value that IRUS requests for req_id is the requester ID (i.e. the IP of the client making the request).

I have made this work before, but that involved setting externalTrafficPolicy to Local at the load balancer (then the original IRUS code should be able to access the correct source IP). However, I know there are a few more hops with the BL setup, so I think I'll need to defer to April on the correct way to achieve this first time.
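If changing the load balancer policy is not an option, one alternative is to recover the client IP from a forwarding header at the application. The sketch below assumes the proxies in front of the app set an X-Forwarded-For style header, which has not been confirmed for the BL setup. `client_ip` and `trusted_proxies` are illustrative names, not part of the gem.

```ruby
require "ipaddr"

# Pick the first address that is not private, loopback, or a known proxy.
# forwarded_for: raw header value such as "client, proxy1, proxy2".
# remote_addr:   the address the app itself sees for the connection.
def client_ip(forwarded_for, remote_addr, trusted_proxies: [])
  candidates = forwarded_for.to_s.split(",").map(&:strip)
  candidates << remote_addr.to_s.strip
  candidates.reject(&:empty?).each do |candidate|
    begin
      ip = IPAddr.new(candidate)
    rescue IPAddr::Error
      next # skip values that are not valid IP addresses
    end
    next if ip.private? || ip.loopback? || trusted_proxies.include?(candidate)
    return candidate
  end
  nil
end

# Addresses taken from this thread: an internal IP, the AWS IP, a real client.
puts client_ip("81.174.241.126, 52.210.71.93", "192.168.252.87",
               trusted_proxies: ["52.210.71.93"])
```

Because the left-most header value can be set by the client, this heuristic is only as trustworthy as the proxies that append to the header. Public proxy addresses such as the AWS one have to be listed explicitly, since they are neither private nor loopback.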

cziaarm commented 1 year ago

My IP is 81.174.241.126


j-basford commented 1 year ago

Emailed Paul Needham at Jisc 14/11/22 for feedback