Open jefffohl opened 8 years ago
@jefffohl before you begin implementing this feature with a proxy server from npm, would you consider a more difficult but also more far-reaching change: improving Papa? The author said he won't implement it himself, but he is not against the solution: https://github.com/mholt/PapaParse/issues/49#issuecomment-163286369
@breznak - I would, but I don't think it is possible within the browser. The FileReader only takes a snapshot of the file at the time of the request, and it will not allow the browser to read the file again without a user interaction - most likely for security reasons.
For some reason, Chrome's handle on the file does allow you to read the file size as the file is updated, but the data in the file cannot be retrieved (this is what confused me for a day). See the HTML5 spec: https://www.w3.org/TR/FileAPI/#file
I believe this is the reason that the PapaParse developer is not willing to work on it - it is simply not possible.
Actually, it appears that it might be possible to reload the file if we see that it has changed, but we would have to reload the entire file - which defeats our original purpose, because then we would have to reload all of the data, and on large files that would be very inefficient. What we need is the ability to read just the portion of the file that has changed.
Thank you for the specs! I still don't see how your workaround should work (or why Papa's shouldn't):
f is a reference to a File object:
- a snapshot of the file at creation of the reference
- modified and size parameters (are these snapshotted, i.e. frozen, too, or do they change dynamically? See the spec.)
I don't see a seek (= skip to Nth byte) or read-with-offset method; this is a problem, and I don't know how you plan to work around it.
The workaround - of reloading the entire file whenever the file size changes - can be seen here: http://stackoverflow.com/questions/22548683/reloading-a-file-using-html-input
I'm not sure I understood the SO solutions correctly, but these ideas might work:
1. Papa is that fast ( https://jsperf.com/javascript-csv-parsers/4 ), so just reread the whole file. This could be implemented in Papa's chunk call (with a "monitor=true" argument) to return only the diff (to the graph for rendering). It is suboptimal, but would still be a huge simplification for many problems.
2. Delete the read part, wait, loop. This saves the time otherwise spent re-reading the known bytes. This could also be implemented for Papa.
3. Baby? https://github.com/Rich-Harris/BabyParse Does Node.js provide more "local app" privileges, allowing more direct access to a file?
Node runs on the server, so it has all the permissions that you want to give it. So, yes, it can access any file on the system.
So - yes, we could use Baby Parse and parse files on the server side instead of in the browser.
4. (We should follow @rhyolight's advice and...) delegate this to another project, with a different (lower) level of integration, that can easily take care of the file updates and provide us with only a diff file, which we would reread quickly and append to our data.
E.g. a script (some multiplatform code?) like
while true; do cp myFile myFile.old; sleep 5; diff myFile.old myFile > update; done
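A minimal sketch of idea 1 above (re-read the whole file, hand only the new rows to the graph), assuming the CSV is append-only, has a header row, and has no quoted fields containing commas or newlines. The names `newRowsSince` and `onReload` are hypothetical, and the naive `split(",")` stands in for whatever parser is actually used:

```javascript
// Hypothetical helper: given the full CSV text from a fresh re-read and the
// number of data rows already delivered, return only the rows not yet seen.
// Assumes an append-only file whose first line is a header.
function newRowsSince(csvText, rowsSeen) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.length > 0);
  const dataRows = lines.slice(1); // drop the header row
  return dataRows.slice(rowsSeen).map((line) => line.split(","));
}

// Tracks how many data rows have been passed on so far.
let rowsSeen = 0;

// Call this with the complete file contents after every re-read.
function onReload(csvText) {
  const fresh = newRowsSince(csvText, rowsSeen);
  rowsSeen += fresh.length;
  return fresh;
}
```

This still pays the cost of reading and splitting the whole file each time. It only keeps the rendering side from seeing duplicate rows, and it does not guard against a last line that is only partially written at the moment of the read.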
delegate to what other project?
Well, we could simply require that the input file for monitoring contain only the diffs (not the whole updated file). Or provide a simple utility (in Java, ...) to compute the diffs at intervals for us, as above. Or truncate the file (not sure browser JS can do that?)
Can we make the (client, browser) app a "server-like app" that has a REST API?
https://stackoverflow.com/questions/921942/javascript-rest-client-library
So we could have an update(data) method callable through REST PUT? This was the idea in #42
Sorry, this was just a brainstorm/shitload :stuck_out_tongue: of ideas, not sure which are doable or suitable for us..?
I am imagining the server will have some REST-like features, but it will probably be just GET.
Why would you need PUT, if we are just reading CSV files?
The server I am imagining will be pretty basic. It will handle the following functions:
All that said, if we can define an abstract purpose for the server outside of the needs of this particular app, we could make it a separate project/repo.
Yes, I think that's good functionality for the server. Let me double-check that I understand the advantages: it allows streaming remote files (even if the other server does not support that feature), plus streaming local files (does it solve the problem discussed here of avoiding re-reading the whole file in monitoring mode?)
My REST idea is probably a separate feature, allowing updates to be "sent" by REST calls (which allows integration with many web services that are RESTful, like RiverView), in addition to updates by writing to a file.
Yes, your understanding is correct. And yes, it will solve the problem of avoiding the need to re-read the entire file each time it is updated. We will be able to have a server-side file handle that will allow us to read the file.
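To make the server-side file handle idea concrete, here is a rough sketch using Node's built-in fs module, assuming the file only ever grows. The helper name `readAppended` is made up for illustration; it reads only the bytes after a remembered offset, which is the "seek" the browser does not give us:

```javascript
const fs = require("fs");

// Hypothetical helper: read whatever was appended to `path` after byte
// `offset`, and return the new text together with the next offset.
// Assumes an append-only file; if the file has not grown (or was truncated),
// it returns an empty string and leaves the offset unchanged.
// Caveat: a multi-byte UTF-8 character cut in half by a partial write
// would be decoded incorrectly.
function readAppended(path, offset) {
  const fd = fs.openSync(path, "r");
  try {
    const size = fs.fstatSync(fd).size;
    if (size <= offset) return { text: "", offset };
    const buffer = Buffer.alloc(size - offset);
    // The last argument is the position in the file to start reading from.
    const bytesRead = fs.readSync(fd, buffer, 0, buffer.length, offset);
    return {
      text: buffer.toString("utf8", 0, bytesRead),
      offset: offset + bytesRead,
    };
  } finally {
    fs.closeSync(fd);
  }
}
```

The server would call this on a timer (or when it notices the size change) and forward `text` to the client, keeping `offset` between calls.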
:+1: :cool:
One thing that this brings up is that the experience will differ depending on whether the app is hosted locally or remotely (e.g. on a public web server). If the app is hosted locally, we can access and stream local files with continuous updates. If the app is hosted remotely, the only interface we will have to uploaded files (from the user's computer) will be through the FileReader interface, which, as we know, has the limitation of not allowing us to update the data continuously.
I am hoping that I can make a somewhat elegant user experience that will automatically detect if the file is available on the same file system that the server is running on, and accept continuous updating. If the file is not available, the server will assume that the file is being sent remotely, and simply read a snapshot of the file.
Something that I forgot about is that the FileReader interface won't give you any information about where the file lives, only metadata such as its name, size, type, and last-modified time. It won't tell us the local path to the file, so the server won't be able to find the file.
The alternative is for the user to know the relative path to the local file, and enter that in as a string (the same way that they might enter a URL). The server could see that it is a local path, and then retrieve the file. Again, in this situation, if the app were hosted remotely, a local path would not work. And, now that I think about it, this would be a big security hole if the app were hosted on a public web server, because it would allow users to type in any local path, which would then tell the server to retrieve that file from the local (server's) file system, which is of course a very bad idea.
So, now I need to re-think all of this. Sorry, I should have gone through this logic earlier.
So, I've been thinking this over, and I don't see a solution that would involve using the "Browse..." button to allow the user to locate a local file and load it into the app for online streaming. JavaScript in the browser is, by design, sandboxed for security reasons.
We could remove the "Browse..." button and require that all users supply a path to the file they would like to stream - either a local filepath, or a URL to a file hosted on a remote web server. If anyone ever wanted to host this app on a public server, they would need to disable the ability to supply a local filepath, and make the app only accept full URLs. This could be set in the server config.
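Roughly what I have in mind for that check, as a sketch only. The option name `allowLocalFiles` and the function `classifySource` are placeholders, not anything that exists yet; the point is that anything other than an http(s) URL is refused unless the server was explicitly configured to allow local paths:

```javascript
// Hypothetical server-side check: decide how to treat a user-supplied source.
// `allowLocalFiles` stands in for the server config option discussed above;
// the name is an assumption.
function classifySource(input, { allowLocalFiles = false } = {}) {
  let url = null;
  try {
    url = new URL(input);
  } catch {
    url = null; // not an absolute URL, so treat it as a file path
  }
  if (url && (url.protocol === "http:" || url.protocol === "https:")) {
    return { kind: "remote", location: url.href };
  }
  // Everything else (plain paths, file: URLs, drive letters) counts as local.
  if (!allowLocalFiles) {
    throw new Error("Local file paths are disabled on this server");
  }
  return { kind: "local", location: input };
}
```

With the default config, `classifySource("/some/local/file.csv")` throws, so a publicly hosted instance would only ever fetch full URLs.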
@breznak what are your thoughts?
@jefffohl ..true about the sandboxed limitation of JS, so does this mean the publicly hosted app will not be usable? And users will have to provide a path to the file, rather than selecting it with the file browser?
And we are doing it for the "monitoring" support, right?
If so, I'd suggest: A) just require that the provided input file does not contain all the points (with appended updates), but rather only a diff with the updates. Then we can reread the whole file (no proxy server) each time. Or B) keep the current functionality and have a config that allows the proxy server & imposes the limitations you mention.
@breznak - Yes, this is all being done for the monitoring support. What we do depends on what the typical use case is. If this app is most typically run locally on the same machine that is producing the file to read, then using a proxy server will probably be the best option, as it will allow for monitoring both remote and local files.
We can set up the server so that in order for it to monitor local files, the server needs to be started with a special flag. This way the user has to explicitly decide to allow that option, and will therefore hopefully understand that it should not be done on a public web server.
So - we can offer three ways of accessing a file:
For all of these, I still need to test these methods out to make sure I am not missing something important that would prevent our success.
How about this setting:
I am not sure what you mean - you want to make the server optional?
..I was thinking that. Would it be too much work? We could provide basic monitoring functionality by default, then detect the server and, if the proxy is present, use its features for file streaming.
It is more work. We have to have a server to serve the static resources anyway, so I don't see a benefit at this time. If, in the future, there appears to be a need for decoupling the app from the server then we can work on that feature at that time. As always, I would like development to be driven by real-world needs.
Esp. here I think we have clear real-world usecases: NuPIC live monitoring of a running model and RiverView...
Motivation: a proxy server can be used for streaming remote files from servers that do not have that support, and to implement seek (skip N bytes) in files or streams from JS. This will be used in: #17
Tasks: