Closed: ultimate-tester closed this 7 years ago
Thanks @ultimate-tester. I think that this is a good start, even though it's not streaming the data in the most correct sense. We could improve it later if need be.
EDIT: Removed comment about webdav-client as I was mistaken. I would want to copy it there however so both libraries have it. (Apologies for the millions of edits.) If you'd like to add it to webdav-client yourself that's fine, otherwise I'll do it in a few hours.
Since WebDAV is an extension of HTTP, it makes more sense to use the streams APIs of the Node.js http module. Also, I agree it would make more sense to implement this in the webdav-client module itself...
Ok.. @piranna raises some good points. @ultimate-tester I think this work would be best completed in webdav-client as mentioned earlier. It should then be used here. This means that when we update the method of creating streams, we do it once and both libs benefit (as is the intention).
Very good points, didn't think about that. We can create a stream to a HTTP resource as long as we add the required webdav headers (authentication and such), then there is no need for emulating streams as it will be a native stream already. I think both modules should be edited. The webdav-client (I see it as low level webdav api) should contain a method to create a stream to the resource, the webdav-fs should contain a method to provide the FS createReadStream method. If you want help just let me know and I'll make some changes in both projects.
That was just what I was asking for :-)
I just had another look at the source and I'd rather rewrite this module to use request instead of node-fetch. Mainly because request supports streams out of the box, but it also looks cleaner and provides more options than node-fetch. I already made the switch for a few methods and it seems to work just fine, but it can't use promises anymore and will use callbacks (which I prefer more anyways). This means that the change will break compatibility for depending modules or usages in projects.
At this moment there is only one module depending on webdav-client, which is webdav-fs and I'll make sure that it won't break, I would provide a pull request for that one also.
If you'd give me a green light, I'll make the switch to request and implement callbacks rather than promises and expose a streaming method. Also, perhaps I could have a look at all open issues so far @ webdav-client, I think some of them are an easy fix with request.
Actually node-fetch supports streams out of the box, and I've already had a crack at implementing the readstream functionality using it, which seems to work fine in testing. Also, request is not an option for this library as it's extremely heavy and transpiles (for browser use) very poorly. node-fetch on the other hand transpiles very well (especially seeing as it mirrors the browser API where possible).
What I've got in webdav-client looks something like this:
```javascript
function getFileStream(url, filePath, options) {
    options = options || { headers: {} };
    return fetch(url + filePath, {
            method: "GET",
            headers: options.headers
        })
        .then(responseHandlers.handleResponseCode)
        .then(function(res) {
            return res.body;
        });
}
```
Notice `return res.body;`, which returns a Readable stream.
Let me put together a WIP PR and see how that checks out..
Ah okay, didn't know that (don't know node-fetch at all). I wonder though, will giving the start and end options (those are Readable options) translate into an HTTP Range request so it will actually fetch that specific range? I think that it will fully download the file, then give back the part that the user specified (meaning that when you have a 2GB file for instance this will take some time). I hope to be proven wrong on this because this ranged request is actually what I need :)
Not yet finished sorry, let me make the pr shortly 😊
> I wonder though, will giving the start and end options (those are Readable options) translate into an HTTP Range request so it will actually fetch that specific range?
Probably we'll need to craft the header ourselves, but yes, it should work if the WebDAV server supports ranges; if not, it will return the full file with a 200 status code instead of a 206.
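Crafting that header ourselves could look like this, a sketch assuming `start`/`end` follow the same inclusive-byte-offset convention as `fs.createReadStream`:

```javascript
// Build an RFC 7233 byte-range header value from optional start/end
// offsets (both inclusive, like fs.createReadStream's options).
function buildRangeHeader(start, end) {
    if (typeof end === "number") {
        return "bytes=" + (start || 0) + "-" + end;
    }
    // Open-ended range: everything from `start` to the end of the file
    return "bytes=" + start + "-";
}

// e.g. options.headers.Range = buildRangeHeader(0, 1023); // first 1 KiB
```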
By the way, @ultimate-tester you should take a look at the got package... nobody is using request anymore ;-)
Ok @ultimate-tester @piranna I've had a crack at implementing the stream support in the client. Wish I could have used a more up to date structure but I'm still supporting Node v4.
https://github.com/perry-mitchell/webdav-client/pull/30
I'd be very interested to hear your thoughts before continuing. If this goes through I'll also merge this PR and update the contents to use the method in the client.
~~EDIT: Haven't added the range support yet..~~ Added.. It works at least with the test server.
At first I was a bit upset, until I understood that fetch returns a promised stream; now the usage of the PassThrough instead of directly returning the (Node.js http) stream makes sense. Yes, this is on the path of what I was asking for :-) I don't know if webdav-client should have createReadStream() instead of implementing it only in webdav-fs, but it is useful as a helper function.
By the way, uploading files with the PUT method allows ranges using the Content-Range header.
> At first I was a bit upset, until I understood that fetch returns a promised stream; now the usage of the PassThrough instead of directly returning the (Node.js http) stream makes sense
Yeah.. unfortunately it was easiest to return a `Promise<Stream>` as we have to wait for the connection to be formed before we start receiving.
> I don't know if webdav-client should have createReadStream() instead of implementing it only in webdav-fs, but it is useful as a helper function.
I thought the same, but then again webdav-client was supposed to still be full featured. Meaning that if I wanted to stream directly without createReadStream I'd have to await some promises.. which is not always a great option. Having createReadStream means that I can just pipe directly into a file for instance.
> By the way, uploading files with the PUT method allows ranges using the Content-Range header.
Very good info, thank you! I probably won't consider merging the client part until I have writing also sorted. Just on that part, I did find the following which is somewhat concerning:
> An origin server that allows PUT on a given target resource MUST send a 400 (Bad Request) response to a PUT request that contains a Content-Range header field (Section 4.2 of [RFC7233]), since the payload is likely to be partial content that has been mistakenly PUT as a full representation. Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in [RFC5789]).
It seems like PATCH is the recommended approach, but this is a new area for me in terms of knowing server support. From that SO answer it seems that SabreDAV supports PATCH with partial content, which means that theoretically ownCloud/Nextcloud style services should support it.. But I'm not sure of others. I guess seeing as this is the first implementation we can cater for just those and see how others handle.. or I could add an option to allow for both.
@ultimate-tester Regarding your earlier concern:
> I think that it will fully download the file, then give back the part that the user specified (meaning that when you have a 2GB file for instance this will take some time)
Reading more into this it definitely looks like the range header is the way to go:
> The presence of a Range header in an unconditional GET modifies what is returned if the GET is otherwise successful. In other words, the response carries a status code of 206 (Partial Content) instead of 200 (OK).
> If a proxy that supports ranges receives a Range request, forwards the request to an inbound server, and receives an entire entity in reply, it SHOULD only return the requested range to its client. It SHOULD store the entire received response in its cache if that is consistent with its cache allocation policies.
from this spec
So providing the server responds with 206 Partial Content, I guess most servers should only be sending part of the file (as the body should be the same size as the requested range). From this I made the following issue to update my PR: perry-mitchell/webdav-client#31
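Distinguishing the two outcomes is just a status check; a sketch of how a client might classify the response (names here are illustrative, not from either library):

```javascript
// Classify a GET response when a Range header was sent: 206 means the
// server honoured the range; 200 means it ignored it and sent the whole
// file (still a valid response per the spec quoted above).
function classifyRangedResponse(status, rangeRequested) {
    if (status === 206 && rangeRequested) return "partial";
    if (status === 200) return "full";
    return "error";
}
```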
> Yeah.. unfortunately it was easiest to return a `Promise<Stream>` as we have to wait for the connection to be formed before we start receiving.
Readable objects have the readable event to know when there's (new) data available...
> I thought the same, but then again webdav-client was supposed to still be full featured. Meaning that if I wanted to stream directly without createReadStream I'd have to await some promises.. which is not always a great option. Having createReadStream means that I can just pipe directly into a file for instance.
As I've said, it's a helper function. I would remove it, but it doesn't do any harm; at least it doesn't add new dependencies because it uses the built-in PassThrough :-)
> It seems like PATCH is the recommended approach
There's a bit of debate about this and no clear conclusion, but you definitely pointed to a good reference; I'll need to review it... Maybe check with OPTIONS if the PATCH method is available, and use PUT with Content-Range if not?
> So providing the server responds with 206 Partial Content, I guess most servers should only be sending part of the file (as the body should be the same size as the requested range). From this I made the following issue to update my PR: perry-mitchell/webdav-client#31
When a server doesn't support partial GET, it will return 200 and the full file, so it would need to be sliced (and cached?) client side rather than throwing an error, because it's a valid response.
Ok, I've read the links that you posted and yes, it seems it's now explicitly forbidden to do a PUT with a Content-Range... that's a shame because it was the simplest alternative, and the spec doesn't provide any other one :-/ PATCH is an option, but it's application dependent. Maybe we could implement "something" (like PUT + Content-Range) and later update or fix it when we find other alternatives or something truly standard?
After reading this one: https://blog.sphere.chronosempire.org.uk/2012/11/21/webdav-and-the-http-patch-nightmare
I think we should do a HEAD request before EVERY GET, PUT or PATCH first, check if the server sends back the Accept-Ranges header, and if it accepts ranges we should do that GET, PUT or PATCH request with ranges. By spec, a server is not allowed to disregard any Content-* header and MUST return a 501 (Not Implemented) response, causing our request to fail.
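A sketch of that detection step (the `fetch`-based wrapper is illustrative, not part of either library):

```javascript
// Accept-Ranges: "bytes" means byte ranges are supported; "none" or a
// missing header means they are not.
function supportsByteRanges(acceptRangesHeader) {
    return (acceptRangesHeader || "").trim().toLowerCase() === "bytes";
}

// Hypothetical wrapper (not called here): probe with HEAD before the
// real ranged GET/PUT/PATCH request.
async function serverSupportsRanges(url, headers) {
    const res = await fetch(url, { method: "HEAD", headers: headers });
    return supportsByteRanges(res.headers.get("accept-ranges"));
}
```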
> By spec, a server is not allowed to disregard any Content-*
@ultimate-tester This is contrary to what I've read unfortunately, and that's enough for me not to trust any given group of servers 😔
> I think we should do a HEAD request before EVERY GET, PUT or PATCH first, check if the server sends back the Accept-Ranges header and if it accepts ranges
This feels like massive overkill to me, but I guess it might be the only way. My only concern is testing this in real world environments and with real servers. I think we can start with just implementing ranges and we could add the detection and handling later.
> Ok, I've read the links that you posted and yes, it seems it's now explicitly forbidden to do a PUT with a Content-Range...
@piranna That being said, if PUT is also accepted perhaps it should be provided as an option. The consumer can at least override the METHOD so perhaps that is enough.
All in all I think this will be incremental. I'm steering towards supporting read+write streams as a minimum (almost done), then incrementally adding sending PATCH + ranged content, sending PUT + ranged content (option) and then eventually sending OPTIONS requests to detect features. The last one I'm very skeptical of because many home-grown servers (non-nginx/apache) probably don't support OPTIONS requests (at least not in my experience). WebDAV servers may be different. Again, I would want a good range of test cases for what OPTIONS responses I'd see, but let's see how that goes.
Regarding the read of partial content, I think I'll fail for now if the length is wrong. It's a lot of work to implement the cutting + trimming of a stream in case it's too big.
After this we'll at least have ranged read+write support for streams, which is really cool. Slowly but surely.
> I think we can start with just implementing ranges and we could add the detection and handling later.
+1, GET ranges should be straightforward; it's standard and just a matter of checking for response status 200 or 206.
The conflicting ones are PUT ranges, which for some time were mostly unspecified and are now forbidden without an easy-to-implement alternative in the spec. So I'll go for it using Content-Range because it's the easiest one, and later we'll try to fix it for other servers on a case-by-case basis until a standard arises (if any), because it's not clear how they should behave at all.
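A sketch of that non-standard request shape (hedged: RFC 7231 forbids Content-Range on PUT, so this only works against servers that deliberately accept it; the function name is illustrative):

```javascript
// Build fetch() options for writing bytes [start, end] (inclusive) of a
// resource whose full size is totalSize. Non-standard: many servers
// will (correctly, per RFC 7231) reject this with a 400.
function buildRangedPutOptions(body, start, end, totalSize) {
    return {
        method: "PUT",
        headers: {
            "Content-Range": "bytes " + start + "-" + end + "/" + totalSize,
            "Content-Length": String(end - start + 1)
        },
        body: body
    };
}

// e.g. fetch(url, buildRangedPutOptions(chunkBuffer, 0, 1023, 4096));
```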
Agreed.. I guess the sending of ranges can be quite custom with PUT/PATCH support.. that's not too hard. I would still think failing for GET is the best way if the response is not 206 when requesting a range. This can be handled later.
I'll have an update tomorrow of the PR to include writing most likely..
Or if someone else wanted to have a go at writing the writestream code in WebDAV-client they can do so of course 😁.. I just got a bit carried away.
I may merge the read support first so we can merge this one too.. perhaps tomorrow. Write can come later.. 🙂
> Regarding the read of partial content, I think I'll fail for now if the length is wrong. It's a lot of work to implement the cutting + trimming of a stream in case it's too big.
Just add a counter... I have done it in the past, I can implement it when we finish this.
> After this we'll at least have ranged read+write support for streams, which is really cool. Slowly but surely.
Well, all the features we are talking about here are really cool, we can craft a nice module here! :-D
> Just add a counter... I have done it in the past, I can implement it when we finish this.
What do you mean? If we request, say, a chunk of 256 bytes, and an entire file of, say, 1024 bytes is returned, how would you trim it if that chunk is in the middle of the stream? You'd have to implement reading through the stream, extracting the partial range, and writing that out.
It's a bit cumbersome but I know it needs to be done at some stage. If you're happy checking that out @piranna that'd be really awesome.
Merging this so we can get readstream support out :)
> What do you mean? If we request, say, a chunk of 256 bytes, and an entire file of, say, 1024 bytes is returned, how would you trim it if that chunk is in the middle of the stream? You'd have to implement reading through the stream, extracting the partial range, and writing that out.
First rule when using Node.js: find a module on npm that does what you want :-D Or if you want several concatenated slices... :-P
> First rule when using Node.js: find a module on npm that does what you want
For node I'd agree, but not for JS in general. With both of these projects I've tried to retain a careful balance between not reinventing the wheel and size when transpiling using webpack. I want a small footprint for these packages, and libraries like range-stream (and through2 therein) have several dependencies which have to also be transpiled and bundled for web projects. If I can do it manually in a small enough amount of code I'd prefer to do it that way. But let's see :)
I don't care really, but I like to follow you (perry-mitchell) on the no-dependency train because I always hate dependencies that mess with my goal in some way.
The only real thing I care about (which is why I created the pull request in the first place) is that the stream is actually capable of doing a range request. I'm streaming a large file (~2GB) from the WebDAV server and I only need roughly 4MB of the whole file (thus I want to be able to specify a range which will result in a ranged request). Hence my pull request: I simply performed a ranged request and pushed it to a new stream before returning. If you want, you can push your stuff to make it into a stream and then I'll add the parts required to make a ranged request (as long as either the start OR end option is provided).
> I want a small footprint for these packages, and libraries like range-stream (and through2 therein) have several dependencies which have to also be transpiled and bundled for web projects.
Fair enough :-) On the other side, range-stream doesn't seem to have been updated in two years and doesn't have a repository, so it seems to be a good time to "update" it ;-) I'll take a look and give you some comments.
> I'll take a look and give you some comments.
I really appreciate the help and input. Let me know if you happen to find a better option there - I'm definitely not anti-packages, just a bit picky :smile:
> I'll take a look and give you some comments. I really appreciate the help and input. Let me know if you happen to find a better option there - I'm definitely not anti-packages, just a bit picky
https://www.npmjs.com/package/@piranna/range-stream
Et voila! :-D No dependencies! :-P
The update is now in master, and I'll release it a bit later today.
Wiii!!! ^^
This will resolve #23 partially