simonsobs / librarian

The HERA Librarian.
BSD 2-Clause "Simplified" License
1 stars 3 forks source link

Tracking issue for local storage #9

Closed JBorrow closed 9 months ago

JBorrow commented 11 months ago

Starting this tracking issue as we try to remove the ssh requirement from wherever possible in Librarian to make things NERSC-etc. friendly.

@plaplant: It is my understanding that all Stores must be accessible over SSH. All inter-Librarian communication occurs from Store-Store (not Librarian to Librarian?). At the site, we have the Store url as localhost, so we ssh into ourselves and in theory launch globus transfers from there.

I would like to sub-class BaseStore to create LocalStore where all file operations are local (i.e. basically remove _ssh_slurp).

A couple of quick questions: are my assertions right, and if so is there a reason that all Stores must be remote? Why is there no current option for local storage at the position that the Librarian server actually lives?

JBorrow commented 11 months ago

I think there is a more general issue here, and this part of the code could use a little refactoring.

There are essentially a few ways that we want to interact with the store:

At the moment this is all essentially bundled up into the BaseStore class (later subclassed into Store, but that really just adds in the database ORM). Hence the BaseStore class makes a couple of tricky of assumptions about the way that stores should be interacted with:

and also means that any transfer protocols must be implemented as methods on the class.

This is tricky because:

Perhaps it makes sense to refactor into something that looks close to the following peseudocode, taking advantage of dependency injection. We want to abstract away the concept of a store and a transfer (which is not really currently the case because of its monolithic nature). At the moment the globus destination endpoint ID is stored in the config, not by the host librarian server, for instance.

# Running on librarian server
class StoreMetadata(db.Model)
  """ Core metadata for store management (e.g. hostname, globus endpoint ID, etc.) """
  #... Stuff that registers the store in the Librarian database

class CoreStoreManager:
  """ Prototype for store management (e.g. moving staged files, finding storage). """
  def move(from, to):
    #...

class POSIXStoreManager(CoreStoreManager):
  """ Example of a store manager that is 'used' by the librarian server """
  def move(from, to):
    #... ssh self.ssh_host; mv from to

class LocalPOSIXStoreManager(CoreStoreManager):
  """ ... another manager """
  def move(from, to):
    #... os.move(from, to)

# Running on the client
class TransferManager:
  """ Prototype for transfer management """
  def transfer(local_path, remote_path=None):
    """ Remote path can be pre-staged """
    #...

class RsyncTransferManager(TransferManager):
  """ Example of a transfer manager using rysnc """
  def __init__(self, host_name, ...):
    self.host_name = host_name

  def transfer(local_path, remote_path=None):
     #... rsync loac_path self.host_name:remote_path

class GlobusTransferManager(TransferManager):
  """ Example of a transfer manager using globus """
  def __init__(self, destination_endpoint_id, ...);
    self.destination_endpoint_id = destination_endpoint_id

  def transfer(local_path, remote_path=None):
    #... globus_sdk.initiate_transfer(local_path, remote_path)...

This would allow a flow as follows:

# Server
stores = StoreMetadata(...)

best_store: CoreStoreManager = stores.find_store_with_available(bytes=...)
staging_location: str = best_store.stage(bytes=...)

best_store.post_transfer_information(staging_location, ...)
best_store.process_staged_file(...)

# Client
client = LibrarianClient(...)

transfer_manager: TransferManager = client.get_store_for(bytes=...)

transfer_manager.transfer(local_path)

In this way the transfer (e.g. globus, rsync, cp for local transfers, etc) is completely transparent.

JBorrow commented 11 months ago

Playing around with some ideas in https://github.com/JBorrow/librarian/tree/local-storage

JBorrow commented 11 months ago

An additional request that we have is to add timeouts for various librarian services so they do not hang.

plaplant commented 10 months ago

@JBorrow thanks for pushing on this! Yes, I think you're right about the assumptions/assertions that the Librarian makes. It is true that the Librarian assumes a POSIX filesystem, and many of the data-movement commands are essentially hard-coded to assume a Linux-like environment. (For example, figuring out how much storage is available is done over SSH and assumes df behaves as the coreutils version in terms of options.)

I think that your proposed refactor is a good one, especially given that SO:UK sounds like it is not a POSIX filesystem and will need site-specific handling of file routing and transfer to/from that site.

I think it might be most efficient to chat offline about how best to implement these changes. Thanks again for your help!

JBorrow commented 10 months ago

@plaplant Excellent, maybe it makes sense to meet some time early in the new year? I have been making lots of changes to the librarian in that linked branch... Maybe too many... Hopefully by the time we meet I will have a working full prototype. When I started changing the underlying ssh-based architechture, it made sense to make a few other changes, too.

JBorrow commented 10 months ago

Completed and ongoing tasks:

JBorrow commented 10 months ago

I'm checking out on this project until the new year, but when I get back I will start by integration testing a bunch of stuff (primarily shares between librarians).

JBorrow commented 9 months ago

Development is coming along. I noticed that unfortunately the previous librarian server had close to zero test coverage, so that's a lot more work from scratch than I had anticipated. I have quite a nice test framework now where we spin up real test servers using pytest fixtures for testing the client, and thankfully the new FastAPI based endpoints are easy to unit test.

JBorrow commented 9 months ago

Coverage is coming up:

platform darwin -- Python 3.11.5, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/borrow-adm/Documents/SimonsObs/librarian
configfile: pyproject.toml
testpaths: tests/integration_test, tests/server_unit_test
plugins: xprocess-0.23.0, cov-4.1.0, testmon-2.1.0, anyio-3.7.1
collected 24 items

tests/integration_test/test_configuration.py ...                         [ 12%]
tests/integration_test/test_upload_file.py ..                            [ 20%]
tests/server_unit_test/test_clone.py ..........                          [ 62%]
tests/server_unit_test/test_ping.py .                                    [ 66%]
tests/server_unit_test/test_upload.py ........                           [100%]
======================= server unit test temporary files =======================

Database: /private/var/folders/k1/fld2m5s54fd9nd8qy_wdtvf00000gp/T/pytest-of-borrow-adm/pytest-188/server_12810/database_1281.sqlite
======================= integration test temporary files =======================
Server log: /Users/borrow-adm/Documents/SimonsObs/librarian/.pytest_cache/d/.xprocess/server/xprocess.log
Database: /private/var/folders/k1/fld2m5s54fd9nd8qy_wdtvf00000gp/T/pytest-of-borrow-adm/pytest-188/server_154190/database_15419.sqlite

---------- coverage: platform darwin, python 3.11.5-final-0 ----------
Name                                      Stmts   Miss  Cover
-------------------------------------------------------------
hera_librarian/__init__.py                  165     80    52%
hera_librarian/base_store.py                174    174     0%
hera_librarian/cli.py                       469    469     0%
hera_librarian/models/__init__.py             0      0   100%
hera_librarian/models/clone.py               82      0   100%
hera_librarian/models/ping.py                 9      0   100%
hera_librarian/models/stores.py               3      3     0%
hera_librarian/models/uploads.py             61      0   100%
hera_librarian/tests/__init__.py              9      9     0%
hera_librarian/tests/test_base_store.py     108    108     0%
hera_librarian/tests/test_cli.py             82     82     0%
hera_librarian/tests/test_utils.py           86     86     0%
hera_librarian/transfers/__init__.py          6      0   100%
hera_librarian/transfers/core.py              7      2    71%
hera_librarian/transfers/local.py            14      0   100%
hera_librarian/utils.py                     184    147    20%
librarian_background/__init__.py             15     15     0%
librarian_background/bad.py                   5      5     0%
librarian_background/check_integrity.py      43     43     0%
librarian_background/core.py                 17     17     0%
librarian_background/create_clone.py         86     86     0%
librarian_background/poll.py                  6      6     0%
librarian_background/recieve_clone.py        44     44     0%
librarian_background/send_clone.py           91     91     0%
librarian_background/task.py                 13     13     0%
librarian_server/__init__.py                 13      0   100%
librarian_server/api/__init__.py              3      0   100%
librarian_server/api/clone.py               108      3    97%
librarian_server/api/ping.py                 11      0   100%
librarian_server/api/upload.py               87      7    92%
librarian_server/api/util.py                 32     32     0%
librarian_server/database.py                 13      0   100%
librarian_server/deletion.py                 14      1    93%
librarian_server/logger.py                    5      0   100%
librarian_server/orm/__init__.py              5      0   100%
librarian_server/orm/file.py                 31      1    97%
librarian_server/orm/instance.py             49      1    98%
librarian_server/orm/librarian.py            34     11    68%
librarian_server/orm/storemetadata.py       100     13    87%
librarian_server/orm/transfer.py            132     23    83%
librarian_server/search.py                  647    647     0%
librarian_server/settings.py                 35      1    97%
librarian_server/store.py                   297    297     0%
librarian_server/stores/__init__.py           6      0   100%
librarian_server/stores/core.py              36      9    75%
librarian_server/stores/local.py             70     12    83%
librarian_server/stores/pathinfo.py          13      0   100%
librarian_server/webutil.py                 207    181    13%
-------------------------------------------------------------
TOTAL                                      3727   2719    27%

============================== 24 passed in 3.77s ==============================
JBorrow commented 9 months ago

Coverage only appears so low because of a lack of tests for old deprecated code. The new api endpoints are close to 100% coverage, and the new client also has very high coverage. The background tests will be getting implemented soon.

JBorrow commented 9 months ago

I am closing this issue as we work towards the Deployment milestone.