Open rossjones opened 11 years ago
Here's how I solved that: https://github.com/tauberer/messytables/commit/9873ee63f2a5d034a50068b36faf708edbc5902d
I think BufferedFile does now have the first parameter, and implementing whence is doable, but I think your approach (you're going to download it all anyway, so put it in StringIO) might be better than pretending BufferedFile works. I think I'd still prefer not to keep all that stuff in memory, though.
@tauberer Could you make a pull request with your fixes?
I'd rather see the suggestion in #59 implemented: create a way to have messytables cache a file locally on disk, and then the ZIP table can just require that streams be cached locally. Buffering in memory is a kludge, since it can easily lead to out-of-memory issues.
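For illustration, a minimal sketch of what caching a stream on disk could look like. This is not messytables code: `cache_to_disk` is a hypothetical helper name, and the sketch assumes the remote stream only needs to support `read()`.

```python
import shutil
import tempfile


def cache_to_disk(fileobj):
    """Hypothetical helper: copy a read-only stream into a temporary file.

    The returned file object lives on disk, supports seek() and tell(),
    and is deleted automatically when it is closed.
    """
    cached = tempfile.TemporaryFile()
    # copyfileobj reads in fixed-size chunks, so the whole download
    # never has to be held in memory at once.
    shutil.copyfileobj(fileobj, cached)
    cached.seek(0)
    return cached
```

The returned object can then be handed to anything that needs random access, such as `zipfile.ZipFile`.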
Totally agree with @tauberer's suggestion. Is this something you'd consider merging?
The way I see it, streaming only really makes sense if the data is in a tabular text format. We could say that streaming is only supported for the CSV type, because that is where it makes the most sense. Having a way for each type to define whether it requires the files to be stored locally would be even better. So, yes, I'd merge @tauberer's suggestion.
@tauberer You should have access to this repo so you should be able to create a branch that we can all work on.
I'd be glad to create a branch but there are some architectural questions to decide about it first, which is I think more appropriate on #59's thread. Also I'm on vacation now and am not really funded for this sort of work, so I probably can't help much at the moment.
@rossjones : I was agreeing with your suggestion. If you agree with that we'll have infinite recursion and the universe may implode. :)
Loading a remote zipped file breaks messytables (primarily because it can't seek on the file-like object).
It should be possible to wrap the fobj for ZipTableSet in a seekable stream, but the BufferedFile seek method doesn't take enough arguments (seek takes two: pos and whence=0), which means the check for whether to load more data will need to take whence into account.
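To make the whence point concrete, here is a rough sketch of a wrapper whose "load more data" check depends on `whence`. It is not the existing BufferedFile: `SeekableStream` and its internals are hypothetical names, and it buffers in memory purely to keep the example short.

```python
import io


class SeekableStream:
    """Hypothetical wrapper that adds seek() to a read-only stream."""

    def __init__(self, fp):
        self.fp = fp
        self.buf = io.BytesIO()  # everything read from fp so far
        self.pos = 0             # logical position seen by the caller

    def _fill(self, upto=None):
        # Read from fp until `upto` bytes are buffered (None means read
        # to EOF). Returns the number of bytes buffered so far.
        have = self.buf.seek(0, io.SEEK_END)
        while upto is None or have < upto:
            want = 8192 if upto is None else min(8192, upto - have)
            chunk = self.fp.read(want)
            if not chunk:
                break
            self.buf.write(chunk)
            have += len(chunk)
        return have

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self.pos + offset
        elif whence == io.SEEK_END:
            # The end is unknown until the whole stream has been read.
            target = self._fill() + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if target < 0:
            raise ValueError("negative seek position")
        self._fill(target)
        self.pos = target
        return self.pos

    def tell(self):
        return self.pos

    def seekable(self):
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            self._fill()
            self.buf.seek(self.pos)
            data = self.buf.read()
        else:
            self._fill(self.pos + size)
            self.buf.seek(self.pos)
            data = self.buf.read(size)
        self.pos += len(data)
        return data
```

Note that a seek relative to the end forces the whole stream to be read, which is the kind of access a ZIP reader does first to find the central directory, so for ZIP files this ends up buffering everything anyway.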