stellar / go

Stellar's public monorepo of go code
https://stellar.org/developers
Apache License 2.0

Add utility command to get file name from ledger number in Galexie #5408

Open urvisavla opened 4 months ago

urvisavla commented 4 months ago

What problem does your feature solve?

I've often needed a way to quickly obtain the file/object name for a specific ledger in the datastore. This typically involves understanding the datastore schema and calculating the sorting key prefix (in hex). To streamline this process, a utility command in Galexie that provides the file name based on a ledger number would be very useful.

What would you like to see?

The following commands:

  1. galexie ledger-path <ledger_num> --config-path config.toml which prints the filename, for example:

    Ledger file path: FFFFFFFF--0-63999/FFFFF11C--3811.xdr.zstd
  2. To check if the file exists, we could add another command: galexie find-ledger <ledger_num> --config-file config.toml, which would print:

    Ledger file found: exporter-test/ledgers/testnet/FFFFFFFF--0-63999/FFFFF11C--3811.xdr.zstd

    or

    Ledger file not found for ledger number <ledger_num>

What alternatives are there?

An alternative is for users to manually calculate the filename or use the DataStoreSchema object to programmatically determine it.
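For reference, the manual calculation can be sketched in a few lines of Go. This is a minimal, standalone sketch and not the actual DataStoreSchema code: the `schema` type, its field names, and `objectKey` are hypothetical, and the rule that each hex prefix is MaxUint32 minus the first ledger of the partition or file is an assumption inferred from the example path above (ledger 3811, partitions of 64000 single-ledger files).

```go
package main

import (
	"fmt"
	"math"
)

// schema holds the two layout parameters the path depends on.
// Hypothetical type for illustration; not the real DataStoreSchema.
type schema struct {
	LedgersPerFile    uint32 // assumed to be >= 1
	FilesPerPartition uint32
}

// objectKey derives the object name for ledgerSeq.
// Assumption: each prefix is the 8-digit hex of MaxUint32 minus the first
// ledger in the partition or file, so newer ledgers sort first.
func (s schema) objectKey(ledgerSeq uint32) string {
	key := ""
	if s.FilesPerPartition > 1 {
		size := s.LedgersPerFile * s.FilesPerPartition
		start := (ledgerSeq / size) * size
		key = fmt.Sprintf("%08X--%d-%d/", math.MaxUint32-start, start, start+size-1)
	}
	fileStart := (ledgerSeq / s.LedgersPerFile) * s.LedgersPerFile
	fileEnd := fileStart + s.LedgersPerFile - 1
	key += fmt.Sprintf("%08X--%d", math.MaxUint32-fileStart, fileStart)
	if fileEnd != fileStart {
		// Several ledgers per file: include the last ledger in the name.
		key += fmt.Sprintf("-%d", fileEnd)
	}
	return key + ".xdr.zstd"
}

func main() {
	s := schema{LedgersPerFile: 1, FilesPerPartition: 64000}
	fmt.Println(s.objectKey(3811))
}
```

With the parameters in `main`, the sketch reproduces the example path from this issue, which is why a small command wrapping the same logic would save users from doing the hex arithmetic by hand.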

urvisavla commented 4 months ago

@jacekn, we think this feature would be quite useful but would like to get your input before adding it. If you agree it's valuable, we'd also appreciate your thoughts on the command names to ensure they have a good UX.

jacekn commented 4 months ago

This for sure could be useful! Some ideas I can think of:

  1. It could be useful to allow for ledger range to be specified
  2. When we check for file existence it's good to use standard unix exit codes. Normally you'd exit with 0 to indicate success and nonzero to indicate failure. You can even expose different nonzero exit codes to indicate different errors, but in my opinion this is likely overkill here. A 0/nonzero exit code will allow people to use the command in scripts in a standard "unix" way.
  3. Another nice thing for scripting would be a way to print just filenames, one per line. This would allow people to, for example, list files and pipe results to something like tar to back things up without the need to parse output
  4. This isn't a very strong opinion but maybe we can try to find better names. It's not very obvious what the difference is between ledger-path and find-ledger without checking help. I wonder if we could collapse this into one command:
    
    # Print file path
    $ galexie ledger-path <ledger_num> --config-path config.toml
    Ledger file path: FFFFFFFF--0-63999/FFFFF11C--3811.xdr.zstd

    # Print file path and check if it exists (exit with nonzero exit code if it doesn't)
    $ galexie ledger-path --check --config-path config.toml
    Ledger file path (missing on disk): FFFFFFFF--0-63999/FFFFF11C--3811.xdr.zstd

    # Maybe even print full path including bucket name?
    $ galexie ledger-path --full --config-path config.toml
    Ledger file path: gsc://my-bucket/FFFFFFFF--0-63999/FFFFF11C--3811.xdr.zstd

With regard to point 4, I think it's worth brainstorming this a bit more with others.

urvisavla commented 4 months ago

Great points @jacekn, I completely agree with you on points 2 and 3. I also like the idea of having a single command with extra arguments to tweak the behavior, and I agree the command names could be more creative. I'll bring up points 1 and 4 with the team during planning/sync. Appreciate your input, thanks!
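The scripting behavior discussed in points 1 to 3 (a ledger range, one filename per line, and a 0/nonzero exit code) could look roughly like the Go sketch below. It is only an illustration under stated assumptions: `check`, `pathFor`, and `exists` are hypothetical names, the placeholder file naming is not Galexie's real layout, and the in-memory map stands in for a real datastore lookup.

```go
package main

import (
	"fmt"
	"os"
)

// check walks the inclusive ledger range [from, to] and returns the distinct
// file paths, split into those that exist and those that are missing.
// pathFor and exists are hypothetical stand-ins for the schema lookup and
// the datastore existence check.
func check(from, to uint32, pathFor func(uint32) string, exists func(string) bool) (found, missing []string) {
	if from > to {
		return nil, nil
	}
	last := ""
	for seq := from; ; seq++ {
		p := pathFor(seq)
		if p != last { // consecutive ledgers may share one file
			last = p
			if exists(p) {
				found = append(found, p)
			} else {
				missing = append(missing, p)
			}
		}
		if seq == to { // compare before incrementing to avoid uint32 overflow
			break
		}
	}
	return found, missing
}

func main() {
	// Placeholder naming and a fake in-memory datastore, for illustration only.
	pathFor := func(seq uint32) string { return fmt.Sprintf("ledger-%d.xdr.zstd", seq) }
	store := map[string]bool{"ledger-3811.xdr.zstd": true, "ledger-3812.xdr.zstd": true}
	exists := func(p string) bool { return store[p] }

	found, missing := check(3811, 3813, pathFor, exists)
	for _, p := range found {
		fmt.Println(p) // stdout: bare paths, one per line, safe to pipe
	}
	for _, p := range missing {
		fmt.Fprintln(os.Stderr, "missing:", p) // diagnostics stay off stdout
	}
	if len(missing) > 0 {
		os.Exit(1) // nonzero exit code signals failure to calling scripts
	}
}
```

Keeping stdout limited to bare paths and sending diagnostics to stderr means the output can be piped to other tools without parsing, while the exit code alone tells a script whether every file in the range was found.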