Closed JBorrow closed 9 months ago
I think there is a more general issue here, and this part of the code could use a little refactoring.
There are essentially a few ways that we want to interact with the store:
At the moment this is all essentially bundled up into the BaseStore
class (later subclassed into Store
, but that really just adds in the database ORM). Hence the BaseStore
class makes a couple of tricky of assumptions about the way that stores should be interacted with:
and also means that any transfer protocols must be implemented as methods on the class.
This is tricky because:
Perhaps it makes sense to refactor into something that looks close to the following peseudocode, taking advantage of dependency injection. We want to abstract away the concept of a store and a transfer (which is not really currently the case because of its monolithic nature). At the moment the globus destination endpoint ID is stored in the config, not by the host librarian server, for instance.
# Running on librarian server
class StoreMetadata(db.Model)
""" Core metadata for store management (e.g. hostname, globus endpoint ID, etc.) """
#... Stuff that registers the store in the Librarian database
class CoreStoreManager:
""" Prototype for store management (e.g. moving staged files, finding storage). """
def move(from, to):
#...
class POSIXStoreManager(CoreStoreManager):
""" Example of a store manager that is 'used' by the librarian server """
def move(from, to):
#... ssh self.ssh_host; mv from to
class LocalPOSIXStoreManager(CoreStoreManager):
""" ... another manager """
def move(from, to):
#... os.move(from, to)
# Running on the client
class TransferManager:
""" Prototype for transfer management """
def transfer(local_path, remote_path=None):
""" Remote path can be pre-staged """
#...
class RsyncTransferManager(TransferManager):
""" Example of a transfer manager using rysnc """
def __init__(self, host_name, ...):
self.host_name = host_name
def transfer(local_path, remote_path=None):
#... rsync loac_path self.host_name:remote_path
class GlobusTransferManager(TransferManager):
""" Example of a transfer manager using globus """
def __init__(self, destination_endpoint_id, ...);
self.destination_endpoint_id = destination_endpoint_id
def transfer(local_path, remote_path=None):
#... globus_sdk.initiate_transfer(local_path, remote_path)...
This would allow a flow as follows:
# Server
stores = StoreMetadata(...)
best_store: CoreStoreManager = stores.find_store_with_available(bytes=...)
staging_location: str = best_store.stage(bytes=...)
best_store.post_transfer_information(staging_location, ...)
best_store.process_staged_file(...)
# Client
client = LibrarianClient(...)
transfer_manager: TransferManager = client.get_store_for(bytes=...)
transfer_manager.transfer(local_path)
In this way the transfer (e.g. globus, rsync, cp for local transfers, etc) is completely transparent.
Playing around with some ideas in https://github.com/JBorrow/librarian/tree/local-storage
An additional request that we have is to add timeouts for various librarian services so they do not hang.
@JBorrow thanks for pushing on this! Yes, I think you're right about the assumptions/assertions that the Librarian makes. It is true that the Librarian assumes a POSIX filesystem, and many of the data-movement commands are essentially hard-coded to assume a Linux-like environment. (For example, figuring out how much storage is available is done over SSH and assumes df
behaves as the coreutils
version in terms of options.)
I think that your proposed refactor is a good one, especially given that SO:UK sounds like it is not a POSIX filesystem and will need site-specific handling of file routing and transfer to/from that site.
I think it might be most efficient to chat offline about how best to implement these changes. Thanks again for your help!
@plaplant Excellent, maybe it makes sense to meet some time early in the new year? I have been making lots of changes to the librarian in that linked branch... Maybe too many... Hopefully by the time we meet I will have a working full prototype. When I started changing the underlying ssh-based architechture, it made sense to make a few other changes, too.
Completed and ongoing tasks:
cp
)I'm checking out on this project until the new year, but when I get back I will start by integration testing a bunch of stuff (primarily shares between librarians).
Development is coming along. I noticed that unfortunately the previous librarian server had close to zero test coverage, so that's a lot more work from scratch than I had anticipated. I have quite a nice test framework now where we spin up real test servers using pytest fixtures for testing the client, and thankfully the new FastAPI based endpoints are easy to unit test.
Coverage is coming up:
platform darwin -- Python 3.11.5, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/borrow-adm/Documents/SimonsObs/librarian
configfile: pyproject.toml
testpaths: tests/integration_test, tests/server_unit_test
plugins: xprocess-0.23.0, cov-4.1.0, testmon-2.1.0, anyio-3.7.1
collected 24 items
tests/integration_test/test_configuration.py ... [ 12%]
tests/integration_test/test_upload_file.py .. [ 20%]
tests/server_unit_test/test_clone.py .......... [ 62%]
tests/server_unit_test/test_ping.py . [ 66%]
tests/server_unit_test/test_upload.py ........ [100%]
======================= server unit test temporary files =======================
[1mDatabase: [0m/private/var/folders/k1/fld2m5s54fd9nd8qy_wdtvf00000gp/T/pytest-of-borrow-adm/pytest-188/server_12810/database_1281.sqlite
======================= integration test temporary files =======================
[1mServer log: [0m/Users/borrow-adm/Documents/SimonsObs/librarian/.pytest_cache/d/.xprocess/server/xprocess.log
[1mDatabase: [0m/private/var/folders/k1/fld2m5s54fd9nd8qy_wdtvf00000gp/T/pytest-of-borrow-adm/pytest-188/server_154190/database_15419.sqlite
---------- coverage: platform darwin, python 3.11.5-final-0 ----------
Name Stmts Miss Cover
-------------------------------------------------------------
hera_librarian/__init__.py 165 80 52%
hera_librarian/base_store.py 174 174 0%
hera_librarian/cli.py 469 469 0%
hera_librarian/models/__init__.py 0 0 100%
hera_librarian/models/clone.py 82 0 100%
hera_librarian/models/ping.py 9 0 100%
hera_librarian/models/stores.py 3 3 0%
hera_librarian/models/uploads.py 61 0 100%
hera_librarian/tests/__init__.py 9 9 0%
hera_librarian/tests/test_base_store.py 108 108 0%
hera_librarian/tests/test_cli.py 82 82 0%
hera_librarian/tests/test_utils.py 86 86 0%
hera_librarian/transfers/__init__.py 6 0 100%
hera_librarian/transfers/core.py 7 2 71%
hera_librarian/transfers/local.py 14 0 100%
hera_librarian/utils.py 184 147 20%
librarian_background/__init__.py 15 15 0%
librarian_background/bad.py 5 5 0%
librarian_background/check_integrity.py 43 43 0%
librarian_background/core.py 17 17 0%
librarian_background/create_clone.py 86 86 0%
librarian_background/poll.py 6 6 0%
librarian_background/recieve_clone.py 44 44 0%
librarian_background/send_clone.py 91 91 0%
librarian_background/task.py 13 13 0%
librarian_server/__init__.py 13 0 100%
librarian_server/api/__init__.py 3 0 100%
librarian_server/api/clone.py 108 3 97%
librarian_server/api/ping.py 11 0 100%
librarian_server/api/upload.py 87 7 92%
librarian_server/api/util.py 32 32 0%
librarian_server/database.py 13 0 100%
librarian_server/deletion.py 14 1 93%
librarian_server/logger.py 5 0 100%
librarian_server/orm/__init__.py 5 0 100%
librarian_server/orm/file.py 31 1 97%
librarian_server/orm/instance.py 49 1 98%
librarian_server/orm/librarian.py 34 11 68%
librarian_server/orm/storemetadata.py 100 13 87%
librarian_server/orm/transfer.py 132 23 83%
librarian_server/search.py 647 647 0%
librarian_server/settings.py 35 1 97%
librarian_server/store.py 297 297 0%
librarian_server/stores/__init__.py 6 0 100%
librarian_server/stores/core.py 36 9 75%
librarian_server/stores/local.py 70 12 83%
librarian_server/stores/pathinfo.py 13 0 100%
librarian_server/webutil.py 207 181 13%
-------------------------------------------------------------
TOTAL 3727 2719 27%
============================== 24 passed in 3.77s ==============================
Coverage only appears so low because of a lack of tests for old deprecated code. The new api endpoints are close to 100% coverage, and the new client also has very high coverage. The background tests will be getting implemented soon.
I am closing this issue as we work towards the Deployment milestone.
Starting this tracking issue as we try to remove the
ssh
requirement from wherever possible in Librarian to make things NERSC-etc. friendly.@plaplant: It is my understanding that all Stores must be accessible over SSH. All inter-Librarian communication occurs from Store-Store (not Librarian to Librarian?). At the site, we have the Store url as
localhost
, so we ssh into ourselves and in theory launch globus transfers from there.I would like to sub-class
BaseStore
to createLocalStore
where all file operations are local (i.e. basically remove_ssh_slurp
).A couple of quick questions: are my assertions right, and if so is there a reason that all Stores must be remote? Why is there no current option for local storage at the position that the Librarian server actually lives?