Closed · made1990 closed this 4 months ago
no such feature yet, but i guess i'll make a plugin soon. to design its features, i'd need you to think if you just need API or also GUI. What do you use these checksums for?
for the api part, I did some research, and I could add this header for PUT, POST, GET, HEAD: `Digest: md5=...`
one could use HEAD to get the md5 without downloading. Does it sound good?
That sounds perfect. I only need it via API (PUT, GET), not for GUI. I would need it to verify that uploaded files are 100% identical to the original file.
would you say you need md5 only for the files you upload?
because i'm realizing that to always provide md5 for files for which it was not calculated before seems needlessly heavy.
I would also offer it in case you append `?get=md5` to any file's url.
Both cases would be great, but md5 only for newly uploaded files would be enough if it is easier.
are you willing to use the value just as the upload finishes, or also later?
As the upload finishes is enough. I guess otherwise there will be some overhead to save the information somewhere, am I right?
some overhead, yes, but not necessarily a problem. i already programmed it but i'm still undecided how to "bundle" it. It would help to have more insight on the need. I'm surprised you are interested in detecting a corruption during an http upload, as for what i know, it's extremely unlikely to happen, and HFS is not subject to interrupted uploads since it sets the final filename only after the end. What do you think about it?
I agree that hfs and the http protocol itself bring some functionality to prevent faulty uploads or file corruption. Still, the network can be interrupted or similar. I am using HFS in a semi-professional environment, and my users ask for a way to ensure integrity of the files that are uploaded, so the idea came up that the md5 could be returned after the upload finishes, to compare with the original md5.
i'm willing to offer the functionality, but as i told you, an interrupted upload in HFS will have the word $upload in the name, so you cannot be mistaken
while i still have to decide how to introduce md5 in HFS, i made it possible for a simple script to do it. The script uses new things that i'm about to publish, to read the incoming stream, so that you don't need to re-read the file from the disk after the upload is finished, especially good if the file is big. I also wrote another script that does the re-reading instead, and published all in the documentation, as an example https://github.com/rejetto/hfs/wiki/Middlewares#calculate-md5-on-uploads
If you are willing to test it, I can give you a preview version, but I need to know if you will run hfs with npx or what operating system.
Sounds good, a test version would be great. I am running the npm version on Windows (it runs as a service on windows)
i decided to publish the version in the meantime. the version you need is 0.53.0-alpha5. with npm or npx you need to specify hfs@beta instead of just hfs, to get it.
so, you used these instructions to set up your service?
Correct
i'd like to find an "npx" way of making a service on windows, similarly to linux, so to make update easier (just by restarting the service). and don't forget to give me a feedback on the md5.
Yeah, the update process for the windows service version of HFS is a bit inconvenient, but still doable.
Sure, will do testing of the md5 thing next week when I am back at the system :)
Just to make sure: the code you documented under https://github.com/rejetto/hfs/wiki/Middlewares#calculate-md5-on-uploads needs to be added to the "server code" part of Options in the Admin GUI, right? And that should do the trick?
Correct
Do I need to add something to my PUT command to get the md5 in return?
nope
Hm. It simply gives me empty brackets as return: {}
My command is:

```sh
curl -X PUT https://my-url.com/myfolder/file1.txt \
  -H "Authorization: Basic XXXXX" -d "Content of file"
```
you are looking at the body, while the md5 is in a header
When I use Method 1 (calculate by reading file after it has been written), the file is written correctly, but an error is returned: curl: (56) Failure when receiving data from the peer
When I use Method 2 (processing incoming stream), the file is not even written correctly. It remains stuck with the hfs$upload prefix.
what hfs version are you using?
0.53.0-alpha5
ok let me check
i just tried with your command, and got this using alpha5 and method 1. i'm not sure what's different on your side.
do you get the same error WITHOUT the server code?
Without the server code, everything is working normally.
Empty reply from server
I take it we are doing all these tests with "method 1". It may be confusing to mix results.
Please tell me about the system you are running on: what Windows version, what about the drive, are you in a virtual environment, anything peculiar you can think of.
I'm going to make a test on a Windows machine now.
My test of method1 on Windows 11 was successful. The file was written, and I got the 200 reply with "{}" in the body and the X-MD5 header. I'm not sure if to be glad or sad.
You can run hfs with the "--dev" parameter. That will add a lot more info in the console. See if there's anything printed with the request. It's worth a shot.
And... I'd rather do it myself but I don't have access to your server. What I'd do is to gradually remove lines from the middleware block until the problem disappears, and then I'd know that the last line I removed is related to the problem. So first I would remove these lines, and test
```js
return new Promise(res => {
    f.once('end', () => {
        ctx.set({ 'X-MD5': hasher.digest('hex') })
        res()
    })
})
```
And then remove this, and test again.
```js
const hasher = createHash('md5')
f = createReadStream(f)
f.on('data', x => hasher.update(x))
```
I expect one of these blocks to be the problem.
Yes - I am trying method 1
It's an NTFS filesystem on Windows Server 2016. It's a physical server, but in fact it's a virtual filesystem: an application is running on Windows which virtualizes an NTFS filesystem (CBFS) that HFS is writing to.
If I remove the last part of the code, it's already solving the issue, but of course then the md5 is not returned.
```js
return new Promise(res => {
    f.once('end', () => {
        ctx.set({ 'X-MD5': hasher.digest('hex') })
        res()
    })
})
```
It is still strange, because the file is successfully written; I can see it on the filesystem and can open it.
The code you removed is not needed for the upload, just for the md5, so it's not strange that once the problem is removed the upload still works. Your feedback was helpful anyway.
your cbfs is not supporting ntfs' "alternate streams" feature. that's preventing it from saving the information about who uploaded the file, no big deal.
it is possible that your cbfs is doing something funny with the md5 code too, as it may explain the difference between my Windows and yours. So when I try to read the file, it fails for some reason. I guess that we are getting an error, but it's not handled by the code above. See what happens with this variation
```js
return new Promise((resolve, reject) => {
    f.once('end', () => {
        ctx.set({ 'X-MD5': hasher.digest('hex') })
        resolve()
    }).on('error', reject)
})
```
here i'm both surfacing the error and ensuring the request keeps being served. In case of error you won't get the md5, but the request will work AND we can try to better understand the error.
HTTP code 200 returned and file written successfully, but without the md5 in return. Console output: error middleware plugin ENOENT: no such file or directory
thanks for your feedback! ok, i think i've got what's going on here. timings are different: while on my system the file already has its final name, it still has the temporary name on yours. i will now see how to solve this.
ok, it's not a problem in the script.
it's a bug in HFS, calling the middleware too early, but only in some occasions.
I just made the fix, and it would be wonderful if you could confirm that it's effective for you, before i publish it.
I made my tests both on mac and windows.
this is the binary 0.53-alpha5.5 hfs-windows.zip
or if you are running with npm/npx, you need to npm -g update hfs@exp
i changed my mind and published. It's alpha6 and you get it as hfs@beta
https://github.com/rejetto/hfs/releases/tag/v0.53.0-alpha6
it's actually the same as 5.5, just renamed.
Still, your feedback is welcome.
I can confirm, with alpha-6 the code for method 1 is working :) File written correctly, md5 returned correctly.
I'll do some further testing tomorrow (different file sizes, etc. )
Thx for the great work so far. much appreciated.
cool! i'm glad we have a better tool now
Is there some file size limitation when uploading via API? I uploaded a file of 500MB, but it is cut after 250MB and then of course returns the wrong md5. If uploading via GUI, the file is uploaded completely. It does not matter whether with or without the middleware code.
Didn't try with the stable HFS version, just tried alpha-6 now.
There's no known limit. All people reporting the same problem eventually solved it by removing the limit on their reverse proxy.
hm, there is no proxy in between, it's the same subnet. Funny... when uploading a 4GB file it's also cut almost at half; the upload finished after around 2GB. HTTP return code 200.
You didn't say much about what client you are using
simple curl
then i'm going to upload a 500+MB file with curl and see what happens
just uploaded 1gb, completely written and with the correct md5, version alpha6. then i made the same test on a remote (not localhost) server over https, with credentials. Completed again.
I don't know what's different on your side. Ensure you use curl like `curl -T file url/`
Consider providing a video of what you are doing, because I may see a clue you are not telling.
also, consider that uploading via API is not really an alternative way: it's the only way. What the frontend does in Chrome is to call the same API that you are calling, and you said it is working fine in that case. You can see the api being called by pressing F12 and then using the "network" tab. Just to clarify things.
oh wow, with the -T option of curl it works. File uploaded completely, md5 returned. It takes slightly longer than without the code, but that makes sense of course.
Before, I used -d @file to upload the file, and it seems that has a different behaviour.
Is it possible to get the md5 checksum of a file? Either directly after uploading (using PUT) as a return value, or as part of an HTTP GET using the API.