jkunze / bagitspec


What is the semantics of fetch.txt items that already exist? #7

Open stain opened 9 years ago

stain commented 9 years ago

It is currently undefined what it means for a file listed in fetch.txt to already exist in the bag.

How should a consumer of such a bag interpret this?

a) The existing file came from that URL (but may no longer be available)
b) The existing file should also be available at that URL (and can thus be removed from the payload directory)
c) The existing file should be replaced with a download from that URL
d) None of the above

My suggestion is b).

Ardvaark commented 9 years ago

It seems to me that the consumer is free to implement the fetch.txt usage however they wish, and thus the spec ought to say nothing about this. (For what it's worth, the original purpose was to facilitate a bag being transferred between parties, and thus (b) would seem the least appropriate option.)

stain commented 9 years ago

That would mean for me that fetch.txt can't be relied on for much at all when there are massive semantic gaps like this. It means 4 different consumers of a bag can have 4 different interpretations and varying bag states - which seems to go against the whole purpose of BagIt to provide reliable transfer of content.

stain commented 6 years ago

Well the semantics of fetch.txt need to be further defined if BagIt is to be useful also for archiving - it is named and described like an imperative command action ("fetch!"), which sounds like option c) above.

But currently a consumer doesn't know whether the action has been done, should be done, may be done, or should not be done.

acdha commented 6 years ago

Those questions are out of scope, the way I've been thinking about the spec: BagIt defines the correct end state but doesn't care how you get there, as long as you have a file with the stated name and the hash value(s) specified in the manifest(s).

In many cases, I think the answer to your question would be “The client hashes the file. If it's valid, it does nothing. If it doesn't match, it replaces the existing file with the remote copy”, but it's easy to imagine scenarios where a different action would be appropriate based on local policy: reporting problems for analysis, retrieving the file from a separate local archive or a third-party mirror that is preferable for some reason, etc.
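The "hash, then keep or replace" behaviour quoted above can be sketched roughly as follows. This is only an illustration of one possible local policy, not something the spec mandates. The names `sha256_of`, `reconcile` and the `download` callable are hypothetical, and SHA-256 is assumed as the manifest algorithm.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reconcile(path: Path, expected: str, download: Callable[[], bytes]) -> str:
    """Keep a valid local file; otherwise replace it with the remote copy.

    `expected` is the hex digest from the manifest (assumed SHA-256).
    `download` is a hypothetical callable that returns the remote bytes.
    Returns "kept" or "replaced"; raises ValueError if the remote copy
    does not match the manifest either.
    """
    expected = expected.lower()
    if path.exists() and sha256_of(path) == expected:
        return "kept"
    data = download()
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"remote copy of {path} does not match the manifest")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return "replaced"
```

A different local policy (reporting the mismatch, or trying a mirror first) would change only what happens after the first hash comparison fails.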

stain commented 6 years ago

I guess it's a more general problem of what to do if validation fails or a bag is in an inconsistent state; e.g. should the bag be re-transferred, fetch.txt executed again, manifest checksums updated and the fetch.txt line removed (local edits), or some other alternative. So I agree that handling in general would be out of band for the spec.

But I still think the spec should then say, "It is undefined by this specification how an implementation should handle fetch.txt lines whose files already exist in the payload directory" - hinting that implementers should at least think about it.
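For implementers who do want to think about it, a minimal sketch of detecting the situation might look like this. It assumes each fetch.txt line holds three whitespace-separated fields (URL, length, file path), reads the file as UTF-8, and does not decode any percent-encoding in the path. The function name `existing_fetch_entries` is hypothetical.

```python
from pathlib import Path


def existing_fetch_entries(bag_dir: Path) -> list[tuple[str, str]]:
    """List (url, filepath) pairs from fetch.txt whose file is already in the bag."""
    fetch_file = bag_dir / "fetch.txt"
    if not fetch_file.is_file():
        return []
    found = []
    for line in fetch_file.read_text(encoding="utf-8").splitlines():
        # Assumed layout: URL, length (or "-"), then the path relative to the bag root.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines in this sketch
        url, _length, filepath = parts
        if (bag_dir / filepath).exists():
            found.append((url, filepath))
    return found
```

What a consumer does with the resulting list (ignore, verify, re-fetch, warn) is exactly the part that remains a local decision.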

Ardvaark commented 6 years ago

Wasn't the fetch.txt portion of the spec removed entirely from the most recent drafts?

acdha commented 6 years ago

@Ardvaark I was just writing about that in https://github.com/jkunze/bagitspec/issues/10#issuecomment-377314894. Basically I think fetch.txt adds a fair amount of complexity to the spec and implementations even without the various extensions people reasonably want, and I'd question whether it's worth making the BagIt community support a relatively general task rather than reusing other RFCs, especially since a fair percentage of users never use it.

We removed it in but restored it in https://github.com/jkunze/bagitspec/pull/19/commits/9a2787c5ff955ee8ccff55d279028f09ea2b90d4 to make it easier to upgrade existing bags to 1.0.

My proposal for 2.0 would basically be to deprecate it in favor of suggesting an alternative format with a well-known filename, and to move the existing format documentation and security advice to an appendix, roughly along the lines of “This was in use in the community and is not a requirement but included for anyone who needs to interoperate with or convert those bags”. I should note that this is my personal opinion, not consensus.