perry-mitchell / webdav-fs

Node fs wrapper for WebDAV
MIT License
94 stars 21 forks source link

Implemented createReadStream partially #45

Closed ultimate-tester closed 7 years ago

ultimate-tester commented 7 years ago

This will resolve #23 partially

perry-mitchell commented 7 years ago

Thanks @ultimate-tester. I think that this is a good start, even though it's not streaming the data in the most correct sense. We could improve it later if need be.

EDIT: Removed comment about webdav-client as I was mistaken. I would want to copy it there however so both libraries have it.. (Apologies for the millions of edits). If you'd like to add it to webdav-client yourself that's fine, otherwise I'll do it in a few hours.

piranna commented 7 years ago

Being WebDAV an extension of HTTP, it makes more sense to use the streams APIs of the Node.js http module. Also, I agree it would make more sense to implement this on the webdav-client module itself...

perry-mitchell commented 7 years ago

Ok.. @piranna raises some good points. @ultimate-tester I think this work would be best completed in webdav-client as mentioned earlier. It should then be used here. This means that when we update the method of creating streams, we do it once and both libs benefit (as is the intention).

ultimate-tester commented 7 years ago

Very good points, didn't think about that. We can create a stream to a HTTP resource as long as we add the required webdav headers (authentication and such), then there is no need for emulating streams as it will be a native stream already. I think both modules should be edited. The webdav-client (I see it as low level webdav api) should contain a method to create a stream to the resource, the webdav-fs should contain a method to provide the FS createReadStream method. If you want help just let me know and I'll make some changes in both projects.

piranna commented 7 years ago

That was just whay I was asking for :-)

ultimate-tester commented 7 years ago

I just had another look at the source and I'd rather rewrite this module to use request instead of node-fetch. Mainly because request supports streams out of the box, but it also looks cleaner and provides more options than node-fetch. I already made the switch for a few methods and it seems to work just fine, but it can't use promises anymore and will use callbacks (which I prefer more anyways). This means that the change will break compatibility for depending modules or usages in projects.

At this moment there is only one module depending on webdav-client, which is webdav-fs and I'll make sure that it won't break, I would provide a pull request for that one also.

If you'd give me a green light, I'll make the switch to request and implement callbacks rather than promises and expose a streaming method. Also, perhaps I could have a look at all open issues so far @ webdav-client, I think some of them are an easy fix with request.

perry-mitchell commented 7 years ago

Actually node-fetch supports streams out of the box, and I've already had a crack at implementing the readstream functionality using it, which seems to work fine in testing. Also, request is not an option for this library as it's extremely heavy and transpiles (for browser use) very poorly. node-fetch on the other hand transpiles very well (especially seeing as it mirrors the browser API where possible).

What I've got in webdav-client looks something like this:

    function getFileStream(url, filePath, options) {
        options = options || { headers: {} };
        return fetch(url + filePath, {
                method: "GET",
                headers: options.headers
            })
            .then(responseHandlers.handleResponseCode)
            .then(function(res) {
                return res.body;
            });
    }

Notice return res.body;, which returns a Readable stream.

Let me put together a WIP PR and see how that checks out..

ultimate-tester commented 7 years ago

Ah okay didn't know that (don't know node-fetch at all). I wonder though, will giving the start and end options (those are Readable options) translate into a HTTP Range request so it will actually fetch that specific range? I think that it will fully download the file, then give back the part that the user specified (meaning that when you have a 2GB file for instance this will take some time). I hope to be proven wrong on this because this ranged request is actually what I need :)

perry-mitchell commented 7 years ago

Not yet finished sorry, let me make the pr shortly 😊

piranna commented 7 years ago

I wonder though, will giving the start and end options (those are Readable options) translate into a HTTP Range request so it will actually fetch that specific range?

Probably we'll need to craft the header ourselves, but yes, it should work if the WebDAV server support ranges, if not it will return the full file with a 200 error code instead of a 204.

By the way, @ultimate-tester you should take a look at the got package... nobody is using request anymore ;-)

perry-mitchell commented 7 years ago

Ok @ultimate-tester @piranna I've had a crack at implementing the stream support in the client. Wish I could have used a more up to date structure but I'm still supporting Node v4.

https://github.com/perry-mitchell/webdav-client/pull/30

I'd be very interested to hear your thoughts before continuing. If this goes through I'll also merge this PR and update the contents to use the method in the client.

~EDIT: Haven't added the range support yet..~ Added.. It works at least with the test server.

piranna commented 7 years ago

At first I was a bit upset, until I understand that fetch is returned a promised stream, now it makes sense the usage of the PassThrought instead of directly returning the (Node.js http) stream. Yes, this is in the path what I was asking for :-) Don't know if webdav-client should have createReadStream() instead of implement it only at webdav-fs, but as a helper function is useful.

By the way, upload files with PUT method allow ranges using the Content-Range header.

perry-mitchell commented 7 years ago

At first I was a bit upset, until I understand that fetch is returned a promised stream, now it makes sense the usage of the PassThrought instead of directly returning the (Node.js http) stream

Yeah.. unfortunately it was easiest to return a Promise.<Stream> as we have to wait for the connection to be formed before we start receiving.

Don't know if webdav-client should have createReadStream() instead of implement it only at webdav-fs, but as a helper function is useful.

I thought the same, but then again webdav-client was supposed to still be full featured. Meaning that if I wanted to stream directly without createReadStream I'd have to await some promises.. which is not always a great option. Having createReadStream means that I can just pipe directly into a file for instance.

By the way, upload files with PUT method allow ranges using the Content-Range header.

Very good info, thank you! I probably won't consider merging the client part until I have writing also sorted. Just on that part, I did find the following which is somewhat concerning:

An origin server that allows PUT on a given target resource MUST send a 400 (Bad Request) response to a PUT request that contains a Content-Range header field (Section 4.2 of [RFC7233]), since the payload is likely to be partial content that has been mistakenly PUT as a full representation. Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in [RFC5789]).

from here and here

It seems like PATCH is the recommended approach, but this is a new area for me in terms of knowing server support. From that SO answer it seems that SabreDAV supports PATCH with partial content, which means that theoretically ownCloud/Nextcloud style services should support it.. But I'm not sure of others. I guess seeing as this is the first implementation we can cater for just those and see how others handle.. or I could add an option to allow for both.

perry-mitchell commented 7 years ago

@ultimate-tester Regarding your earlier concern:

I think that it will fully download the file, then give back the part that the user specified (meaning that when you have a 2GB file for instance this will take some time)

Reading more into this it definitely looks like the range header is the way to go:

The presence of a Range header in an unconditional GET modifies what is returned if the GET is otherwise successful. In other words, the response carries a status code of 206 (Partial Content) instead of 200 (OK).

If a proxy that supports ranges receives a Range request, forwards the request to an inbound server, and receives an entire entity in reply, it SHOULD only return the requested range to its client. It SHOULD store the entire received response in its cache if that is consistent with its cache allocation policies.

from this spec

So providing the server responds with 206 Partial Content, I guess most servers should only be sending part of the file (as body would be the same size as the request. From this I made the following issue to update my PR: perry-mitchell/webdav-client#31

piranna commented 7 years ago

Yeah.. unfortunately it was easiest to return a Promise. as we have to wait for the connection to be formed before we start receiving.

Readable objects has the readable event to know when there's (new) data available...

I thought the same, but then again webdav-client was supposed to still be full featured. Meaning that if I wanted to stream directly without createReadStream I'd have to await some promises.. which is not always a great option. Having createReadStream means that I can just pipe directly into a file for instance.

As I've said is a helper function, I would remove it but doesn't do any hurt, at least doesn't add new dependencies because is using build-it PassTrough :-)

It seems like PATCH is the recommended approach

There's a bit of debate about this and no actual clear conclussion, but definitely you pointed a good reference, I'll need to review it... Maybe checking with OPTIONS if PATCH method is available and use PUT with Content-Range if not?

So providing the server responds with 206 Partial Content, I guess most servers should only be sending part of the file (as body would be the same size as the request. From this I made the following issue to update my PR: perry-mitchell/webdav-client#31

When a server doesn't support partial GET, it will return 200 and the full file, so it would need to be sliced (and cached?) client side, not throw an error because it's a valid response.

piranna commented 7 years ago

Ok, I've read the links that you posted and yes, seems now it's explicitly forbidden to do a PUT with a Content-Range... that's a shame because it was the most simple alternative and also doesn't probide any other one :-/ PATCH is an option, but it's application dependent. Maybe we could implement "something" (like PUT + Content-Range) and later update or fix it when we find other alternatives or something truly standard?

ultimate-tester commented 7 years ago

After reading this one: https://blog.sphere.chronosempire.org.uk/2012/11/21/webdav-and-the-http-patch-nightmare

I think we should do a HEAD request before EVERY GET, PUT or PATCH first, check if the server sends back the Accept-Ranges header and if it accepts ranges we should do that GET, PUT or PATCh request with ranges. By spec, a server is not allowed to disgard any Content-* header and MUST return with 501 header causing our request to fail.

perry-mitchell commented 7 years ago

By spec, a server is not allowed to disgard any Content-*

@ultimate-tester This is contrary to what I've read unfortunately, and that's enough for me not to trust any given group of servers 😔

I think we should do a HEAD request before EVERY GET, PUT or PATCH first, check if the server sends back the Accept-Ranges header and if it accepts ranges

This feels like massive overkill to me, but I guess it might be the only way. My only concern is testing this in real world environments and with real servers. I think we can start with just implementing ranges and we could add the detection and handling later.

perry-mitchell commented 7 years ago

Ok, I've read the links that you posted and yes, seems now it's explicitly forbidden to do a PUT with a Content-Range...

@piranna That being said, if PUT is also accepted perhaps it should be provided as an option. The consumer can at least override the METHOD so perhaps that is enough.

All in all I think this will be incremental. I'm steering towards supporting read+write streams as a minimum (almost done), then incrementally adding sending PATCH + ranged content, sending PUT + ranged content (option) and then eventually sending OPTION requests to detect features. The last one I'm very skeptical of because many home grown servers (non-nginx/apache) probably don't support OPTION requests (at least not in my experience). WebDAV servers may be different. Again, I would want a good range of test cases for what OPTION responses I'd see, but let's see how that goes.

Regarding the read of partial content, I think I'll fail for now if the length is wrong. It's a lot of work to implement the cutting + trimming of a stream in case it's too big.

After this we'll at least have ranged read+write support for streams, which is really cool. Slowly but surely.

piranna commented 7 years ago

I think we can start with just implementing ranges and we could add the detection and handling later.

+1, GET ranges should be straightforward, it's standard and just a matter of checking for the response status 200 or 206.

The conflictive one are PUT ranges, that for some time were mostly unspecified and now they are forbidden without an easy to implement spec, so I'll go for it using Content-Range because it's the easiest one and later we'll try to fix it for other servers in a case by case basis until a standard arises (if any), because it's no clear how they should behave at all.

perry-mitchell commented 7 years ago

Agreed.. I guess the sending of ranges can be quite custom with PUT/PATCH support.. that's not too hard. I would still think failing for GET is the best way if the response is not 206 when requesting a range. This can be handled later.

I'll have an update tomorrow of the PR to include writing most likely..

perry-mitchell commented 7 years ago

Or if someone else wanted to have a go at writing the writestream code in WebDAV-client they can do so of course 😁.. I just got a bit carried away.

I may merge the read support first so we can merge this one too.. perhaps tomorrow. Write can come later.. 🙂

piranna commented 7 years ago

Regarding the read of partial content, I think I'll fail for now if the length is wrong. It's a lot of work to implement the cutting + trimming of a stream in case it's too big.

Just add a counter... I have done it in the past, I can implement it when we finish this.

After this we'll at least have ranged read+write support for streams, which is really cool. Slowly but surely.

Well, all the features we are talking about here are really cool, we can craft a nice module here! :-D

perry-mitchell commented 7 years ago

Just add a counter... I have done it in the past, I can implement it when we finish this.

What do you mean? If we request say a chunk of 256 bytes, and an entire file is returned of say 1024 bytes, how would you trim if it's in the middle of the stream? You'd have to implement reading of the read for a partial range and then write that out.

It's a bit cumbersome but I know it needs to be done at some stage. If you're happy checking that out @piranna that'd be really awesome.

perry-mitchell commented 7 years ago

Merging this so we can get readstream support out :)

piranna commented 7 years ago

What do you mean? If we request say a chunk of 256 bytes, and an entire file is returned of say 1024 bytes, how would you trim if it's in the middle of the stream? You'd have to implement reading of the read for a partial range and then write that out.

First rule when using Node.js: find a module on npm that does what you want :-D Or if you want several concatenated slices... :-P

perry-mitchell commented 7 years ago

First rule when using Node.js: find a module on npm that does what you want

For node i'd agree, but not for JS in general. With both of these projects I've tried to retain a careful balance between not reinventing the wheel and size when transpiling using webpack. I want a small footprint for these packages and libraries like range-stream (and through2 therein) have several dependencies which have to also be transpiled and bundled for web projects. If I can do it manually in a small enough amount of code I'd prefer to do it that way. But let's see :)

ultimate-tester commented 7 years ago

I don't care really, but I like to follow you (perry-mitchell) on the no-dependency train because I always hate dependencies that mess with my goal in some way.

The only real thing I care about (which is why I created the pull request in the first place) is that the stream actually is capable of doing a range request. It's because I am streaming a large file (~2GB) from the webdav and I only need roughly 4MB from this whole file (thus I want to be able to specify a range which will result in a ranged request). Therefore my pull request, I simply provided a ranged request and pushed it to a new stream before returning. If you want, you can push your stuff to make it into a stream and then I'll add the stuff required to make ranged request (as long as either the start OR end option is provided).

piranna commented 7 years ago

I want a small footprint for these packages and libraries like range-stream (and through2 therein) have several dependencies which have to also be transpiled and bundled for web projects.

Fair enought :-) On the other side, range-stream seems to don't be updated from two years ago and don't have a repository, so it seems to be a good time to "update" it ;-) I'll take a look and give you some comments.

perry-mitchell commented 7 years ago

I'll take a look and give you some comments.

I really appreciate the help and input. Let me know if you happen to find a better option there - I'm definitely not anti-packages, just a bit picky :smile:

piranna commented 7 years ago

I'll take a look and give you some comments. I really appreciate the help and input. Let me know if you happen to find a better option there - I'm definitely not anti-packages, just a bit picky

https://www.npmjs.com/package/@piranna/range-stream

Et voila! :-D No dependencies! :-P

perry-mitchell commented 7 years ago

The update is now in master, and I'll release it a bit later today.

piranna commented 7 years ago

Wiii!!! ^^