leosongwei / mutagen

Automatically exported from code.google.com/p/mutagen
GNU General Public License v2.0
0 stars 0 forks source link

FileType, Metadata: File-like object support #1

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
FileType and Metadata subclasses should support loading from file-like
objects as well as filenames. We can probably restrict this to FLOs that
support random access.

Original issue reported on code.google.com by joe.wreschnig@gmail.com on 15 Jun 2009 at 5:49

GoogleCodeExporter commented 9 years ago

Original comment by joe.wreschnig@gmail.com on 16 Jun 2009 at 4:40

GoogleCodeExporter commented 9 years ago

Original comment by joe.wreschnig@gmail.com on 16 Jun 2009 at 6:14

GoogleCodeExporter commented 9 years ago

Original comment by joe.wreschnig@gmail.com on 16 Jun 2009 at 6:14

GoogleCodeExporter commented 9 years ago

Original comment by joe.wreschnig@gmail.com on 20 Jul 2009 at 2:48

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I am working on this as part of adding a audio field type to django.  The two 
options I see is to overload the filename variable and allow the passing of a 
file like object, maintaining api compatibility in the process, OR  adding a 
fileobj argument to a slew of functions. 

Both require factoring out some of the file handling functions. Do you have a 
prefrence? 

Original comment by joshua.b...@gmail.com on 17 May 2011 at 9:58

GoogleCodeExporter commented 9 years ago
we could also rename the first argument to be something more generic, like 
filething, add a filename kwarg in case someone was being extra explicit, and 
simply copy filename onto filething if filething was None. This requires a few 
more code changes and is generally ugly, but maintains api compatibility while 
adding file like object support. 

also if passed a file object I don't think mutagen should take it upon it self 
to close said object, but I could be wrong, thoughts? 

Original comment by joshua.b...@gmail.com on 17 May 2011 at 11:52

GoogleCodeExporter commented 9 years ago
I think the approach of using kwargs with the first one being some autodetect 
magic is probably best.

def __init__(self, filething=None, filename=None, fileobj=None): ...

Where exactly one must be set; filename is always assumed to be a string, 
fileobj is always assumed to be an FLO, and if neither of those is set 
filething is type-detected by looking for a seek method. With this signature we 
can also tell people to start calling constructors with the form 
OggVorbis(filename=...) to future-proof their code.

Mutagen probably shouldn't close the object, but I think it should at least 
flush it (if it supports flushing).

Original comment by joe.wreschnig@gmail.com on 18 May 2011 at 6:48

GoogleCodeExporter commented 9 years ago
Sounds good. I already have some code, I will finish putting together the id3 
and easy id3 file along the lines of what we discussed and email you the patch, 
that way we can hash anything unexpected out before we bother with the rest. 

What are your thoughts on subclassing the test case objects and overriding 
setUp to run each the tests against each method of file handling?

Original comment by joshua.b...@gmail.com on 18 May 2011 at 6:55

GoogleCodeExporter commented 9 years ago
Also It seems like we are going to have the same code in quite a few functions, 
I am thinking of breaking filething/filename/fileobj disambiguation out to a 
decorator. The downside I see to that is that is makes the api less visible 
when looking at a function, but i rather have comments repeated than code. At 
the worst folks use filething = something and it works for them. 

Original comment by joshua.b...@gmail.com on 18 May 2011 at 7:07

GoogleCodeExporter commented 9 years ago
I'm not sure that's necessary or sensible. There's only going to be a handful 
of places where we do a filename/FLO conversion. Simple coverage checks can 
make sure we handle this right.

The hard part is making sure we're prepared for non-disk FLOs once they're in 
the pipeline. If we're going to support this it should be supported as much as 
possible within reason, otherwise there's no point.

So my focus would be on what we need to accurately mock strange FLOs like FIFOs 
and sockets. We probably want:
 * Fully seekable/readable/writable within POSIX restrictions. This is basically disk files, so they should be supported now.
 * Linear read-only, like sockets. We should definitely be able to read tags from these. This might require some changes to the parsers for some of the more structured formats. I know we seek like crazy around MP3s, but it's all for data I'm comfortable not filling in for such streams anyway.
 * For the underlying tag formats, it's probably worthwhile to support linear write-only FLOs. So you could send an APEv2 or ID3v2 tag alone over a stream.

Please attach patches here rather than emailing them to me.

Original comment by joe.wreschnig@gmail.com on 18 May 2011 at 7:18

GoogleCodeExporter commented 9 years ago
I don't think we want a decorator.

def getfileobj(filething, filename, fileobj, mode="rb"):
    if != one set: raise ArgumentError
    elif fileobj: return fileobj # could even early-fail on a bad mode
    elif filename: return open(filename, mode)
    else: ... # filething type detection

def load(self, filething=None, filename=None, fileobj=None):
    fileobj = _util.getfileobj(filething, filename, fileobj, "rb")

It's still a single line of code and it's more explicit.

Original comment by joe.wreschnig@gmail.com on 18 May 2011 at 7:25

GoogleCodeExporter commented 9 years ago

Clearly if we intend on supporting flo's without seek then then a lot more 
would need to be done. Perhaps a file object abstraction class that slurps the 
whole file into memory if its doesnt support seek? 

here is basically what I have now. 

The idea being to wrap the file.open file.close etc functions to make it do the 
right thing under the circumstances. 

Original comment by joshua.b...@gmail.com on 18 May 2011 at 8:45

Attachments:

GoogleCodeExporter commented 9 years ago
Well, with all that in mind, one thought that comes to mind is to implement a 
file object proxy class that takes the filething, filename and fileobj as 
arguments to its constructor. If the plan is to support flo's that don't 
support seek natively, I rather deal with the file code as a unit rather than 
in all the places it is touched in each format.  

Original comment by joshua.b...@gmail.com on 18 May 2011 at 8:51

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
What's the advantage of supporting FLOs if we only support seekable ones? How 
many FLOs come from seekable sources that aren't files we could request by 
filename?

What's the advantage of making a new proxy type? What would it do, throw an 
(AttributeError, IOError) when trying to seek? But that's exactly what would 
happen now, so making a new type doesn't gain anything.

Original comment by joe.wreschnig@gmail.com on 18 May 2011 at 8:54

GoogleCodeExporter commented 9 years ago
I think your patch doesn't understand the point. We shouldn't be passing around 
a filething-or-filename-or-fileobj to anything but __init__, load, and save in 
public classes. And the first thing those functions should do is turn it into a 
fileobj.

Also, I don't know if that patch is indicative of the rest of what you worked 
on stylistically, but there's no way I'd consider merging a patch for something 
like this unless it follows the majority of PEP-8.

Original comment by joe.wreschnig@gmail.com on 18 May 2011 at 8:58

GoogleCodeExporter commented 9 years ago
its thrown together and would get pep 8 ified later. 

Original comment by joshua.b...@gmail.com on 18 May 2011 at 9:01

GoogleCodeExporter commented 9 years ago
a example of a flo that supports seek is anything based on stringio, like the 
file objects returned by the django storage module for amazon s3.

A reason for a file like object proxy class would be to emulate seek on flo's 
that don't support it, so we wouldn't have to rewrite large chunks of the 
format specific code just to be able to read tags from a flo that doesn't 
support seek. 

Original comment by joshua.b...@gmail.com on 19 May 2011 at 1:03

GoogleCodeExporter commented 9 years ago
Storing real audio files in StringIOs is a horrible idea. It'll eat memory very 
quickly - MP3 and Vorbis files could easily be tens to hundreds of megabytes, 
and FLACs over a gigabyte. Emulating seek faces the same problem. We shouldn't 
create situations where we end up allocating hundreds of megabytes of memory 
just to parse out some tags.

Original comment by joe.wreschnig@gmail.com on 19 May 2011 at 12:10

GoogleCodeExporter commented 9 years ago
Another example of a seekable FLO that isn't local are files stored remotely 
and accessed over SMB/SFTP via GVFS/GIO.  While GIO doesn't provide a native 
FLO, creating a wrapper to make it act like one should be trivial.  This in 
particular is a case we want to be able to support in Exaile, and is in fact 
why I opened this request way back on a prior bugtracker.

Original comment by reacocard on 19 May 2011 at 10:33

GoogleCodeExporter commented 9 years ago
I know that such FLOs used to common, but I was under the impression GVFS/GIO 
used FUSE almost exclusively these days, and the application-level support was 
a legacy of GNOME-VFS, which is dead/dying (thankfully).

If that's _not_ the case, that's a considerably more useful thing to support 
than plans to slurp hundreds of megabytes into StringIOs.

Original comment by joe.wreschnig@gmail.com on 22 May 2011 at 2:39

GoogleCodeExporter commented 9 years ago
FUSE is just a compat layer for applications that are not GIO-aware. The 
primary method of access is via the native gio.File and associated APIs. 
http://developer.gnome.org/gio/stable/ch01.html

Original comment by reacocard on 22 May 2011 at 5:40

GoogleCodeExporter commented 9 years ago
But it's available, right? Is there any actual downside to using it or is it 
just GNOME developers being stupidly parochial again?

Original comment by joe.wreschnig@gmail.com on 22 May 2011 at 6:57

GoogleCodeExporter commented 9 years ago
It adds an extra level of indirection (read: it's slower) and if FUSE isn't 
available for whatever reason, then the FUSE compat layer obviously doesn't 
work at all.

Original comment by reacocard on 23 May 2011 at 3:16

GoogleCodeExporter commented 9 years ago
Hi 
I am also interested on using a filelike object. Because for moin-2.0 I have 
that already while uploading an item. 

I don't nderstand currently why there are 3 params in that comment 12 defined
def getfileobj(filething, filename, fileobj, mode="rb"):

Why isn't isinstance used and only one of them defined?

Original comment by rb.p...@gmail.com on 2 Aug 2011 at 7:07

GoogleCodeExporter commented 9 years ago
I gave up on trying to make this happen in the main repo, as it sounds like the 
maintainer doesn't actually want to support file like objects. I wrote support 
for a few filetypes and tossed it up in github. If you have a better plan for 
how to do it, I gladly accept patches. If the maintainer wants to merge it I am 
happy to work with him, but I have better things to do than argue if anyone 
needs the feature. 

https://github.com/Jbonnett/Mutagen-flo

Original comment by joshua.b...@gmail.com on 2 Aug 2011 at 7:13

GoogleCodeExporter commented 9 years ago
thanks for telling, I'll look at it

Original comment by rb.p...@gmail.com on 2 Aug 2011 at 7:16

GoogleCodeExporter commented 9 years ago
@Joshua: I'm not against it, I just don't want to do a halfassed job of it. I 
never got anything resembling an acceptable patch back from you.

Original comment by joe.wreschnig@gmail.com on 11 Aug 2011 at 8:58

GoogleCodeExporter commented 9 years ago
Has there been any forward momentum on this in the past 10 or so months?  I'm 
finally biting the bullet and tacking on some code to hopefully read/write 
AIFF-embedded ID3 tags, and I'm kind of surprised to see that mutagen doesn't 
handle file-like objects.

Original comment by jay...@interlaced.org on 22 Jun 2012 at 12:48

GoogleCodeExporter commented 9 years ago
I'm also in favor of filelike object support. In my case I have StringIOs of 
files (they are all <20MB) and I am trying to avoid disk IO if possible. It 
would be a shame if I had to dump the files to disk only to obtain their 
artwork. I have the filename and a StringIO of the file content, so support for 
this would be awesome.

Original comment by andrefcruz on 24 Sep 2012 at 11:33

GoogleCodeExporter commented 9 years ago
For a web project I'd like to extract metadata from audio files stored in 
MongoDB (or gridfs to be more precise). And while they do have seek(), the 
objects operate on certainly aren't files and the open semantics work quite 
different.

So, is there any progress on this? Because writing the files to a temporary 
named file and extracting metadata and then copying the data to MongoDB is 
quite a waste.

Original comment by jschr...@gmail.com on 4 Sep 2013 at 3:52

GoogleCodeExporter commented 9 years ago
mutagen has moved to Bitbucket: https://bitbucket.org/lazka/mutagen/issue/1

Original comment by reiter.christoph@gmail.com on 4 Jul 2014 at 3:42