quodlibet / mutagen

Python module for handling audio metadata
https://mutagen.readthedocs.io
GNU General Public License v2.0

Matroska tags #3

Open lazka opened 10 years ago

lazka commented 10 years ago

Originally reported by: Christoph Reiter (Bitbucket: lazka, GitHub: lazka)


From steven.strobe.cc@gmail.com on June 16, 2009 08:28:39

This one seems... unlikely. From https://code.google.com/p/quodlibet/issues/detail?id=167 :

Ex Falso currently cannot edit mka tags. The ability to do so would be a
useful addition.

Original issue: http://code.google.com/p/mutagen/issues/detail?id=3


lazka commented 9 years ago

Original comment by Freso Fenderson (Bitbucket: Freso, GitHub: Freso):


Remember to also do docs/api/matroska.rst or something like that.

lazka commented 9 years ago

Original comment by Ben Ockmore (Bitbucket: LordSputnik, GitHub: LordSputnik):


I've created a branch called "matroska" for steps 1-4, so that code can be reviewed and shared without polluting the default branch.

lazka commented 9 years ago

Original comment by Ben Ockmore (Bitbucket: LordSputnik, GitHub: LordSputnik):


I've begun work on this.

My plan is as follows:

  1. Create a robust EBML parser, then tune it for performance.
  2. Create a separate Matroska-specific parser, able to read the tags stored within the Matroska EBML container.
  3. Create a dict-like metadata object, using native strings (utf8) as keys, and allowing bytes and unicode to be set as values. Byte data will be interpreted as the Matroska "binary" type, while unicode data will be converted to utf8 and stored.
  4. Write tests as I go along, and fill in any gaps at the end.
  5. Possibly implement support for WebM, since it is derived from Matroska.
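
Step 3 of the plan above could look roughly like the following. This is only a sketch of the idea, not code from the "matroska" branch: the class name `TagDict` and the `encoded` helper are hypothetical, and the "string"/"binary" labels simply mirror the two value kinds a Matroska tag can carry.

```python
from collections.abc import MutableMapping


class TagDict(MutableMapping):
    """Hypothetical dict-like tag store: str keys, str or bytes values."""

    def __init__(self):
        self._tags = {}

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("tag names must be str")
        if not isinstance(value, (str, bytes)):
            raise TypeError("tag values must be str or bytes")
        self._tags[key] = value

    def __getitem__(self, key):
        return self._tags[key]

    def __delitem__(self, key):
        del self._tags[key]

    def __iter__(self):
        return iter(self._tags)

    def __len__(self):
        return len(self._tags)

    def encoded(self, key):
        """Return (kind, payload) as it might be written to the file."""
        value = self._tags[key]
        if isinstance(value, bytes):
            # bytes are passed through untouched as "binary" data
            return ("binary", value)
        # text is converted to UTF-8 before being stored
        return ("string", value.encode("utf-8"))


tags = TagDict()
tags["TITLE"] = "Caf\u00e9"
tags["COVER"] = b"\x89PNG"
```

Keeping the original Python type until write time means the caller never has to say which Matroska type is wanted; the value's type decides.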

Useful documents:

lazka commented 10 years ago

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From Micah.Wa...@gmail.com on January 18, 2013 00:31:15

I agree. Matroska files are far more versatile and widely supported than flac or ogg in some applications (like managing both video and audio libraries with a variety of codecs, and expecting them to play on a commercial device). Because of its potential influence on the world's use of open/free technology, I would place supporting Matroska meta-tags for mutagen and Ex-Falso above providing full mp4 container support.

lazka commented 8 years ago

Here is some code from the exaile project: https://github.com/exaile/exaile/blob/master/xl/metadata/_matroska.py

Moilleadoir commented 6 years ago

Still on the list somewhere?

lazka commented 6 years ago

yes

phw commented 5 years ago

Might be interesting: https://github.com/QBobWatson/python-ebml . It's GPLv3, though.

Freso commented 5 years ago

I'm starting to need WebM manipulation. Is there any way I'd be able to speed this along?

lud4ik commented 4 years ago

https://github.com/exaile/exaile/blob/master/xl/metadata/_matroska.py doesn't work

72057594037927935 1
Traceback (most recent call last):
  File "test.py", line 164, in parse
    key, type_ = self.tags[id]
KeyError: 524531317

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 287, in <module>
    parse('/home/lud4ik/work/chats/audio/1348379-1570779757.webm')
  File "test.py", line 282, in parse
    return Ebml(location, MatroskaTags).parse()
  File "test.py", line 186, in parse
    value = self.parse(tell, tell + size)
  File "test.py", line 166, in parse
    self.seek(size, 1)
  File "test.py", line 57, in seek
    self.file.seek(offset, mode)
OSError: [Errno 22] Invalid argument
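
One possible reading of the traceback above, offered as an interpretation and not something confirmed in this thread: 524531317 is 0x1F43B675, the Matroska Cluster element ID, and 72057594037927935 is 2**56 - 1, which is how EBML marks an 8-byte data size as "unknown". Seeking forward by that amount would plausibly explain the OSError. The helper names below are hypothetical; the sketch only shows how a parser might refuse to skip such an element.

```python
import io

CLUSTER_ID = 0x1F43B675  # Matroska Cluster element ID


def is_unknown_size(value, width):
    """True if all 7 * width data bits of an EBML size are set."""
    return value == (1 << (7 * width)) - 1


def skip_element(fileobj, size, width):
    """Skip an element payload; return False if its size is unknown."""
    if is_unknown_size(size, width):
        # There is no length to seek past, so the caller has to stop
        # here or parse the children one by one.
        return False
    fileobj.seek(size, io.SEEK_CUR)
    return True


# The two numbers printed in the traceback above
print(hex(524531317) == hex(CLUSTER_ID))
print(is_unknown_size(72057594037927935, 8))
```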

lud4ik commented 4 years ago

https://pypi.org/project/hachoir-metadata/

lud4ik commented 4 years ago

What is the proper condition if I want to parse only the header with the metadata, not the blocks of actual data (audio)? The "Cluster" element contains the data, so must I read everything before it and stop when I find it?
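
The approach asked about here could be sketched as follows. This is a minimal illustration, not mutagen or exaile code: `read_vint` and `elements_before_cluster` are hypothetical names, and the sketch assumes the Tags element sits before the first Cluster. A file may also store Tags after the clusters, in which case the SeekHead element would be needed to locate it.

```python
import io

SEGMENT_ID = 0x18538067  # Matroska Segment
CLUSTER_ID = 0x1F43B675  # Matroska Cluster


def read_vint(stream, max_len, keep_marker):
    """Read an EBML variable-length integer; (None, 0) at end of file."""
    first = stream.read(1)
    if not first:
        return None, 0
    length, mask = 1, 0x80
    # The position of the first set bit gives the total byte length.
    while length <= max_len and not (first[0] & mask):
        length += 1
        mask >>= 1
    if length > max_len:
        raise ValueError("invalid EBML variable-length integer")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("truncated variable-length integer")
    # Element IDs keep the marker bit, data sizes drop it.
    value = first[0] if keep_marker else first[0] & (mask - 1)
    for byte in rest:
        value = (value << 8) | byte
    return value, length


def elements_before_cluster(stream):
    """Yield (id, payload_offset, size) until the first Cluster."""
    while True:
        element_id, _ = read_vint(stream, 4, True)
        if element_id is None or element_id == CLUSTER_ID:
            return
        size, width = read_vint(stream, 8, False)
        if size is None:
            raise EOFError("truncated element header")
        if element_id == SEGMENT_ID:
            continue  # descend: the Segment's children follow directly
        if size == (1 << (7 * width)) - 1:
            raise ValueError("unknown-size element cannot be skipped")
        start = stream.tell()
        yield element_id, start, size
        stream.seek(start + size)  # jump over the payload
```

For example, `[hex(i) for i, _, _ in elements_before_cluster(open(path, "rb"))]` would list the IDs found ahead of the audio data, with `path` standing in for a real file.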

ffe4 commented 3 years ago

The last commit to the matroska branch was in 2014. Anyone know the state of the implementation by @LordSputnik, and whether there were major challenges, or if it is even still compatible with how mutagen works today?

In the related ticket for Picard (link) there has been some discussion about whether to tag on the container or the stream level. As I understand the docs, in the case of mp4 there can only be container level tags, and only the first track is considered. Would container level tagging also be sufficient for Matroska?

LordSputnik commented 3 years ago

This was quite some time ago, but from what I remember the parsing was trickier than I expected. Sorry I can't be more helpful!

I don't think there would be much lost if somebody were to pick this up and start from scratch.