quiltdata / quilt

Quilt is a data mesh for connecting people with actionable data
https://quiltdata.com
Apache License 2.0

docs for URI and CODE #4204

Closed drernie closed 2 weeks ago

drernie commented 1 month ago

Documentation for

Propose new `&catalog` fragment.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 37.68%. Comparing base (aa05544) to head (c1ec361). Report is 3 commits behind head on master.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #4204   +/-   ##
=======================================
  Coverage   37.68%   37.68%
=======================================
  Files         768      768
  Lines       35321    35321
  Branches     5214     5214
=======================================
  Hits        13312    13312
  Misses      20775    20775
  Partials     1234     1234
```

| [Flag](https://app.codecov.io/gh/quiltdata/quilt/pull/4204/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=quiltdata) | Coverage Δ | |
|---|---|---|
| [api-python](https://app.codecov.io/gh/quiltdata/quilt/pull/4204/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=quiltdata) | `91.09% <ø> (ø)` | |
| [catalog](https://app.codecov.io/gh/quiltdata/quilt/pull/4204/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=quiltdata) | `12.07% <ø> (ø)` | |
| [lambda](https://app.codecov.io/gh/quiltdata/quilt/pull/4204/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=quiltdata) | `87.95% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=quiltdata#carryforward-flags-in-the-pull-request-comment) to find out more.


drernie commented 1 month ago

@fiskus @sir-sigurd Any thoughts? I'm sure @nl0 will want to weigh in, but I'd appreciate your feedback before then.

drernie commented 4 weeks ago

Would it help to make the catalog part of the query string instead of the fragment?

quilt+s3://bkt?catalog=dns.name#package=pre/suf
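As a rough illustration of the query-string variant proposed above, the URI can be split with Python's standard `urllib.parse`. This is only a sketch: the variable names are made up here, and it makes no claim about how quilt3 or the catalog actually parse these URIs.

```python
from urllib.parse import urlsplit, parse_qs

# The example URI from the comment above.
uri = "quilt+s3://bkt?catalog=dns.name#package=pre/suf"
parts = urlsplit(uri)

# In this variant the catalog travels in the query string,
# while the package stays in the fragment.
query = parse_qs(parts.query)
fragment = parse_qs(parts.fragment)

print(parts.scheme)         # quilt+s3
print(parts.netloc)         # bkt
print(query["catalog"])     # ['dns.name']
print(fragment["package"])  # ['pre/suf']
```

The split keeps the "where" (bucket and package) apart from the "how" (catalog) at the syntax level, which is the distinction raised later in the review.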

On Mon, Oct 28, 2024 at 10:23 Maksim Chervonnyi @.***> wrote:

@.**** commented on this pull request.

In docs/Catalog/URI.md https://github.com/quiltdata/quilt/pull/4204#discussion_r1819460090:

+- quilt+: The scheme of the URI. This is always quilt+.
+- s3://: The protocol of the URI. This is currently s3://.
+- <bucket>: The name of the bucket containing the package or object, e.g. quilt-example.
+- #package=<package>: A fragment for the name of the package containing the object, e.g. akarve/cord19.
+
+In addition, it may contain the following optional components:
+
+- <package>@<top_hash>: The hash for this specific package, e.g. e21682f00929661879633a5128aaa27cc7bc1e2973d49d4c868a90f9fad9f34b.
+- <package>:tag: The tag for this specific package. Currently, only the latest tag is supported. You may not specify both a top_hash and a tag.
+- &path=<path>: Fragment for the path to the object within the package, if any, e.g. CORD19.ipynb.
+- &catalog=<catalog>: Fragment for the DNS name of catalog where this package
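The components listed in the quoted hunk can be pulled apart with a few lines of standard-library Python. The sketch below is hypothetical: `QuiltUri` and `parse_quilt_uri` are names invented for illustration, not quilt3 APIs, and it assumes the fragment-only form in which `package`, `path`, and `catalog` are all `&`-separated fragment fields.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit, parse_qs


@dataclass
class QuiltUri:
    """Hypothetical container for the URI components (not part of quilt3)."""
    bucket: str
    package: str
    top_hash: Optional[str] = None
    tag: Optional[str] = None
    path: Optional[str] = None
    catalog: Optional[str] = None


def parse_quilt_uri(uri: str) -> QuiltUri:
    """Split a quilt+s3:// URI into the components described above."""
    parts = urlsplit(uri)
    if parts.scheme != "quilt+s3":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")

    # The fragment is a list of key=value pairs separated by '&'.
    fields = {key: values[0] for key, values in parse_qs(parts.fragment).items()}
    if "package" not in fields:
        raise ValueError("missing #package=<package> fragment")

    # A package may carry either @<top_hash> or :<tag>, never both.
    package, top_hash, tag = fields["package"], None, None
    if "@" in package:
        package, top_hash = package.split("@", 1)
    elif ":" in package:
        package, tag = package.split(":", 1)

    return QuiltUri(
        bucket=parts.netloc,
        package=package,
        top_hash=top_hash,
        tag=tag,
        path=fields.get("path"),
        catalog=fields.get("catalog"),
    )


pinned = parse_quilt_uri(
    "quilt+s3://quilt-example#package=akarve/cord19"
    "@e21682f00929661879633a5128aaa27cc7bc1e2973d49d4c868a90f9fad9f34b"
    "&path=CORD19.ipynb"
)
tagged = parse_quilt_uri(
    "quilt+s3://quilt-example#package=akarve/cord19:latest&catalog=dns.name"
)
```

Here `pinned` resolves to a specific hash and object path, while `tagged` uses the `latest` tag together with the proposed `catalog` field; `dns.name` is the placeholder catalog name used elsewhere in this thread.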

Although I am not against adding the catalog parameter, I want to describe my doubts. It looks like package, top_hash and path describe WHERE the data is located, and the data is expected to be there no matter what, as long as S3 is online and the user pays for it. On the other hand, catalog describes HOW to get that data, and the lifespan of the catalog is different from that of S3. This raises the question of whether this type of information belongs in the URI.

  • Basic HTTP auth, for example, is not part of the URI; it is transmitted in an HTTP header
  • But the user/password for FTP is part of the URI @.***)
  • And the user/password for SSH is too @.***)

So, considering this, I think it's legit. But maybe we can create a separate scheme: for example, @.@. It seems like overengineering, though, and barely readable.


fiskus commented 4 weeks ago

> quilt+s3://bkt?catalog=dns.name#package=pre/suf

I don't have a rational explanation, but that looks more elegant to me. I don't know (or don't remember), what was the rationale behind using # hash instead of ? query string in the first place.

sir-sigurd commented 4 weeks ago

> I don't know (or don't remember), what was the rationale behind using # hash instead of ? query string in the first place.

AFAIR that was done because, unlike the fragment, the query string needs to be encoded, `/` in particular.
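To make the encoding point concrete, here is a small sketch using Python's `urllib.parse`. It shows only one library's default behavior and is an illustration, not the original rationale: RFC 3986 itself permits a literal `/` in both the query and the fragment, but common query-string encoders escape it anyway.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

package = "pre/suf"

# urlencode() percent-encodes '/' in query values by default.
encoded_query = urlencode({"package": package})
print(encoded_query)  # package=pre%2Fsuf

# A fragment is usually written by hand, so the name stays readable.
fragment_uri = f"quilt+s3://bkt#package={package}"
print(fragment_uri)   # quilt+s3://bkt#package=pre/suf

# Both forms decode back to the same package name.
from_query = parse_qs(encoded_query)["package"][0]
from_fragment = parse_qs(urlsplit(fragment_uri).fragment)["package"][0]
```

With package names of the form `prefix/suffix`, the query-string form would typically show up as `pre%2Fsuf` in generated links, which is less readable than the fragment form.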