Open ndushay opened 1 year ago
For what it is worth, sdr client is nothing like our other clients in that it is not simply a thin layer over the API. It includes much more logic to (1) fill in the request with defaults / construct the request from incomplete information and (2) handle the complexities of the ActiveStorage interaction. This is very confusing and it is quite possible that this code is over-generalized / over-abstracted, but I think the underlying task is not simple.
It would be helpful to know more about the goals / intent of this ticket.
@justinlittman the intent is to make it understandable; I didn't understand the CLI stuff at all. Maybe that's a documentation problem. Maybe that's a "what are we doing with the sdr-api and why do we use it in some contexts and DSA in others" question, too. I think of myself as a reasonable "middle of the pack" canary with respect to "I can understand what this code is doing" ... and sdr-client made my head hurt.
from the README:
A primary design goal was for this to have as few dependencies as possible so that it can be easily distributed by gem install sdr-client and then it can be used as a CLI.
... where is it used for this? And should we keep supporting this?
I wonder if this should be an issue or a dev planning topic or a story time or ... ?
@ndushay 💬
@justinlittman the intent is to make it understandable; I didn't understand the CLI stuff at all. Maybe that's a documentation problem.
There is CLI documentation in the README: https://github.com/sul-dlss/sdr-client/#usage Maybe this misses the mark, though.
A primary design goal was for this to have as few dependencies as possible so that it can be easily distributed by gem install sdr-client and then it can be used as a CLI.
... where is it used for this? And should we keep supporting this?
We developed it so that we'd be prepared to meet the needs of campus users wanting direct SDR access (BLN & OpenNeuro were both suggested as potential partners). But the CLI's usage has been limited. I used it for a couple remediations---Catalhoyuk was the big one---and perhaps Andrew or Arcadia have used it as well?
FWIW, I do find sdr-client hard to reason about but the CLI isn't the part that I find most difficult. It's the fact that the client exposes all of its guts and expects the consumer to stitch together an interaction that makes sense. I do believe it was developed this way intentionally. But I also think it might be worth the time and effort to pause and do some analysis on how sdr-client is being used in hopes of perhaps improving the structure and maintainability of the code. Is that worth work cycle time now? I don't know.
Until we sort this out, I'm unassigning myself and backlogging this.
I use it fairly regularly, but mostly for getting Cocina. The client started out with pretty limited options that made it only marginally useful for testing and not useful for anything I did with production data. Even though the intent is for external users to make use of it, the message I got in workcycles and in planning for workcycles was that it wasn't a priority to develop it out that way. It may cover more of my needs now but I haven't had a chance to really put it to use. I have at least one issue to file related to my recent attempts to use it for test data because it still doesn't appear to fully handle access rights when depositing.
I used the Fedora 3 API a lot when we were on Fedora 3 and likely would use the SDR API for similar tasks. But for various reasons, including system architecture that isn't related to the API or client itself, it hasn't worked out that way yet.
SdrClient::BackgroundJobResults.show
(with a job ID)SdrClient::Deposit.model_run
(with a request DRO and files)SdrClient::Login.run
(with a URL and username/password)SdrClient::BackgroundJobResults.show
SdrClient::Credentials.write
SdrClient::Deposit.run
(with apo, collection, type, strategies, files, path)SdrClient::Deposit::ImageFileSetStrategy
SdrClient::Deposit::MatchingFileGroupingStrategy
SdrClient::Deposit::SingleFileGroupingStrategy
SdrClient::BackgroundJobResults.show
SdrClient::Connection.new
(replace with proxy authN method)SdrClient::Credentials.read
(should no longer be needed since new authN method + no passing connections around)SdrClient::Deposit::CreateResource.run
(with accession bool, metadata as request DRO)SdrClient::Deposit::FileMetadataBuilderOperations::MD5.for
(with filepath)SdrClient::Deposit::FileMetadataBuilderOperations::SHA1.for
(with filepath)SdrClient::Deposit::Files::DirectUploadRequest.new
(with checksum, bytesize, content type, filename)SdrClient::Deposit::UpdateDroWithFileIdentifiers.update
(with request DRO and output from UploadFiles.upload)SdrClient::Deposit::UploadFiles.upload
(with file metadata, filepath map)SdrClient::Login.run
SdrClient::Connection.new
SdrClient::Deposit::CreateResource.run
(with same as above + assign_doi bool)SdrClient::Deposit::Files::DirectUploadRequest.new
SdrClient::Deposit::UpdateDroWithFileIdentifiers.update
SdrClient::Deposit::UpdateResource.run
(with request DRO and version description)SdrClient::Deposit::UploadFiles.upload
SdrClient::Find.run
(with druid)SdrClient::Find::Failed
SdrClient::Login.run
TO-DO
SdrClient::Deposit.model_run
(with a request DRO and files)... just wraps UploadFilesMetadataBuilder
, UploadFiles
, UpdateDroWithFileIdentifiers
, and CreateResource
SdrClient::Deposit::CreateResource.run
(with accession bool, metadata as request DRO, + assign_doi bool optionally)SdrClient::Deposit::UpdateDroWithFileIdentifiers.update
(with request DRO and output from UploadFiles.upload
)SdrClient::Deposit::UpdateResource.run
(with request DRO and version description)SdrClient::Deposit.run
(with apo, collection, type, strategies, files, path)SdrClient::Deposit::Files::DirectUploadRequest.new
(with checksum, bytesize, content type, filename)SdrClient::Deposit::UploadFiles.upload
(with file metadata, filepath map)SdrClient::Find.run
(with druid)SdrClient::Login.run
(with a URL and username/password)SdrClient::Find::Failed
SdrClient::Deposit::ImageFileSetStrategy
SdrClient::Deposit::MatchingFileGroupingStrategy
SdrClient::Deposit::SingleFileGroupingStrategy
SdrClient::RedesignedClient
SdrClient::Login.run
with SdrClient::RedesignedClient.configure
that takes a URL and either a username/password pair or a token
Connection.new
jazz with new method for proxy authN: given a sunetid, return a token
SdrClient::Find.run
behind a top-level methodSdrClient::Deposit::UploadFiles.upload
, SdrClient::Deposit::UpdateDroWithFileIdentifiers.update
, and SdrClient::CreateResource.run
(replace with SdrClient::RedesignedClient.deposit_model
)
SdrClient::Deposit::CreateResource.run
with SdrClient::RedesignedClient.deposit_model(model:, accession: true|false, assign_doi: true|false, priority: low|default, files: [])
that wraps the above sequence of interactionsSdrClient::Deposit.model_run
and SdrClient::Deposit::CreateResource
with SdrClient::RedesignedClient.deposit_model(model:, files:, accession:, assign_doi:, priority:)
SdrClient::Deposit.run
with SdrClient::RedesignedClient.deposit_metadata(apo:, collection:, type:, strategies:, files:, path:)
SdrClient::Deposit::UpdateResource.run
with SdrClient::RedesignedClient.update_model(model:, version_description:)
SdrClient::Deposit::Files::DirectUploadRequest
instances, and look into using hashes insteadCan we shorten the new module name to, maybe SdrClient::Singleton
or SdrClient2024
?
Or, consider making a new gem and retiring the old. SdrApiClient
?
I like all your other suggestions.
@ndushay Happy to take suggestions on the new module name. I think if the team is on-board with the changes above, we ideally wouldn't be using the new namespace for very long: I was thinking we'd "promote" the new namespace back up in SdrClient
wiping out all of the old code once all consumers have been moved to use the new interface. So the name only needs to be as good as something we'll ideally be seeing for a short while.
For the cutover process, I was thinking this would be the least labor-intensive (doesn't require Gemfile changes, or gem publishing shenanigans, unlike putting this work in a new gem).
Thanks for taking a look and dropping your thoughts on the issue!
I guess I think of touching all the calls in all the consumers twice as being more work than creating a new gem and touching them all once.
next step: discuss at dev planning?
@ndushay 💬
I guess I think of touching all the calls in all the consumers twice as being more work than creating a new gem and touching them all once.
Some additional context to consider:
More analysis notes:
# ANALYSIS
# --------
# UFMB
# given: array of relative file paths, mime types hash, and basepath
# returns: hash with file paths and DURs
# FMB
# given: array of relative file paths, files_metadata, and basepath
# returns: hash with relative file paths and files metadata (with md5, sha1, and mime types injected)
# MB
# given: request cocina, grouping strat, FS strat, upload responses
# returns: request cocina with FS/files built out
# UDWFI
# given: request cocina, upload responses from UF
# returns: request cocina (with uploaded file IDs)
# Request
# given: type md, desc md, access md, identity md, files metadata from FMB
# returns: request cocina
# Process:
# given: request cocina, array of relative file paths, basepath, strategies
# Uses MetadataBuilder to deal with the strategies
# Process: UFMB, UF, MB, CR
# NewDeposit: UFMB, UF, UDWFI, CR
# OldDeposit/NewMetadata: FMB, Request, Process (UFMB, UF, MB, CR)
Deposit
class allowing clients to inject a custom filepath_map
parameter. Why? We need this to deal with situations where prepending the given base path to the given filename (in Deposit.deposit_model
) is insufficient to retrieve the file. This is the case in H2 where files are managed by ActiveStorage, which uses a content-addressable storage approach to munging filenames.
Oh, the high-level function Deposit.deposit_model doesn't handle the "files in Active Storage" use case since they're not neatly arranged on disk. We might want to think about that, as I suspect we may have multiple applications with a similar use case. https://github.com/sul-dlss-labs/SHROOM/blob/main/app/services/sdr/deposit_service.rb
RedesignedClient
module to the root namespace
In doing research for the Folio Integration workcycle, I was chasing call stacks from the bottom up, and when I hit sdr-client I could't follow a large part of it.
The sdr-client gem, in a perfect world, could be refactored to be more like our other client gems (dor-services-client, preservation-client, folio-client ... whatever).
This ticket is to assess what it would take to do this refactoring and if we could do it in stages, or if it should be done all at once.
On consulting with @mjgiarlo, he said it would take approx 1 hour to do this assessment, and he would be delighted to do so after this week, when he is FR.
If it is just as much work to DO the refactor as to ASSESS the refactor, then ... please do it.