syndicate-storage / syndicate

Internet-scale software-defined storage system
Apache License 2.0

File & Block versions on Extended Attributes of file #39

Closed iychoi closed 10 years ago

iychoi commented 11 years ago

Exposing file and block versions via extended attributes would be very helpful to UGs.

A UG could then check whether a file has been modified simply by reading an extended attribute.

iychoi commented 11 years ago

If a file is very large, it will consist of many blocks. In that case, the file would carry too many block versions in its extended attributes.

For me, the file version alone would be enough.

jcnelson commented 11 years ago

I was going to give you a SHA256 hash of the block versions in order, along with the file hash. Then you can check for any modification :)

-Jude
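As a rough illustration of the digest proposed above, here is a minimal Python sketch that hashes a file version followed by the block versions in order. The function name `version_digest`, the 64-bit integer encoding, and the sample version numbers are all assumptions made for the example; the thread does not describe how Syndicate actually serializes versions.

```python
import hashlib
import struct


def version_digest(file_version: int, block_versions: list[int]) -> str:
    """Return a hex SHA-256 digest over the file version and the ordered block versions."""
    h = hashlib.sha256()
    # Assumption: versions fit in signed 64-bit integers. A fixed-width
    # big-endian encoding keeps the hashed byte stream unambiguous.
    h.update(struct.pack(">q", file_version))
    for version in block_versions:
        h.update(struct.pack(">q", version))
    return h.hexdigest()


before = version_digest(1, [10, 11, 12])
after = version_digest(1, [10, 99, 12])  # the second block was rewritten
print(before != after)
```

Because the block versions are fed to the hash in order, changing any single block version changes the digest, while the attribute value itself stays a fixed 64 hex characters no matter how many blocks the file has.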

iychoi commented 11 years ago

That's better :-)

jcnelson commented 11 years ago

Out of curiosity, is there a reason why you don't want to check the file modification timestamp to see if the file was changed?

-Jude

iychoi commented 11 years ago

Can I trust it? I thought it doesn't guarantee that the content is the same. Because the file version is managed by Syndicate, I thought using the file version would be accurate.

jcnelson commented 11 years ago

You can trust it in the same way you trust the modification time on a typical filesystem. The modification time gets updated on write(), truncate(), and utime(), so it's possible for a remote user to change the modtime without changing the data, or to change the modtime such that a write goes undetected.

In Syndicate, the file version is set when the file is created, and doesn't change until the file gets deleted. The block versions change on write() and truncate(). Therefore, the hash of all versioning information will be guaranteed to change on write, regardless of modification time. The only reason why this might not be desirable is that it requires making sure the file manifest is up-to-date (i.e. checking this hash can incur a network RTT if the manifest is stale). Is that okay?

-Jude
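To make the modification-time caveat concrete, here is a small Python sketch, run against an ordinary local filesystem rather than Syndicate, that rewrites a file and then puts the old timestamps back with `os.utime()`. The helper name `rewrite_keeping_mtime` and the file name are made up for the example.

```python
import os
import tempfile


def rewrite_keeping_mtime(path: str, data: bytes) -> None:
    """Overwrite path with data, then restore the previous access and modification times."""
    st = os.stat(path)
    with open(path, "wb") as f:
        f.write(data)
    # utime() lets the writer set the timestamps back to their old values
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dat")
    with open(path, "wb") as f:
        f.write(b"old contents")
    mtime_before = os.stat(path).st_mtime_ns

    rewrite_keeping_mtime(path, b"new contents")

    mtime_after = os.stat(path).st_mtime_ns
    with open(path, "rb") as f:
        contents_after = f.read()

print(mtime_before == mtime_after)
print(contents_after)
```

A reader that only compares modification times would miss this write, whereas a digest that covers block versions would change, since the block versions change on write().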

iychoi commented 11 years ago

Hm... I didn't know that the file version doesn't change on writes. The file manifest also contains the modification time, doesn't it? If so, then even if I use the modification time, I still have to refresh the manifest. If not, using the modification time will be enough.

jcnelson commented 11 years ago

That's why a hash that includes the block versions would be appropriate, since the hash would be guaranteed to change on write. The difficulty I alluded to earlier is that to get the block versions to compute the hash, Syndicate must download the manifest. This isn't a problem if you can tolerate the time cost of a periodic network RTT to download/refresh the manifest. The hash would be computed every time you accessed a particular extended attribute (via the getxattr() syscall).

-Jude
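From the reader's side, checking for changes would amount to calling getxattr() and comparing against the last value seen. The Python sketch below shows that pattern under some loud assumptions: the attribute name `user.syndicate_version_hash` is invented (the thread never names the attribute), `ChangeDetector` is a hypothetical helper, and `os.getxattr()` is only available on Linux.

```python
import os
from typing import Callable

# Hypothetical attribute name; the thread does not say what Syndicate would call it.
VERSION_XATTR = "user.syndicate_version_hash"


def read_version_attr(path: str) -> bytes:
    """Read the (hypothetical) version-hash attribute via getxattr(). Linux only."""
    return os.getxattr(path, VERSION_XATTR)


class ChangeDetector:
    """Remember the last attribute value seen per path and report changes."""

    def __init__(self, reader: Callable[[str], bytes] = read_version_attr) -> None:
        self._reader = reader
        self._last_seen: dict[str, bytes] = {}

    def has_changed(self, path: str) -> bool:
        current = self._reader(path)
        previous = self._last_seen.get(path)
        self._last_seen[path] = current
        # The first observation has nothing to compare against, so report no change
        return previous is not None and previous != current
```

Each `has_changed()` call goes through the reader, so with the default reader every check is one getxattr() call, which per the discussion above may cost a network round trip when the manifest is stale. Passing a different `reader` makes it easy to try the logic without a mounted Syndicate volume.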

iychoi commented 11 years ago

I thought the hash of the block versions would be precomputed. In that case, just leave it as is. Computing the hash over the block versions won't take long, but downloading many block versions will take some time.

I'll simply use the modification time instead. :-)