Closed GoogleCodeExporter closed 8 years ago
Last time I looked at the code I didn't understand what the heck they were
doing. But
perhaps times have changed. If anyone is willing to help, providing some ruby
sample
code for example, it would be greatly appreciated.
Original comment by rubyripp...@gmail.com
on 5 Apr 2009 at 2:13
Original comment by rubyripp...@gmail.com
on 5 Apr 2009 at 2:13
As I see it, freedb doesn't need to be replaced. But I can add Musicbrainz as a
choice perhaps. Please contribute to the wikipage I've set up.
Original comment by rubyripp...@gmail.com
on 15 Apr 2009 at 6:09
That seems entirely reasonable to me. People should be able use whatever they
want.
As an aside, I'm not sure how far you've dug in the musicbrainz stack but I
should
say that the mb site itself allows users to import FreeDB data for releases that
don't exist in mb.
http://musicbrainz.org/freedb/freedb.html
I'll go look at the wiki
Original comment by uptownto...@gmail.com
on 15 Apr 2009 at 6:24
I've added you as a member, so you can edit the wiki pages.
Original comment by rubyripp...@gmail.com
on 15 Apr 2009 at 6:37
One can use FreeDB gateway to Musicbrainz in Rubyripper :
see http://musicbrainz.org/doc/FreeDBGateway
The only drawback is that you don't get the genre (see "Limitations" in the
above
page). However it can be a solution while waiting for the full MB support.
Original comment by wizzim@gmail.com
on 9 May 2009 at 5:11
Hi, I work for MusicBrainz and I'd be interested in helping you guys get this
in or
submitting a patch. What type of functionality would you be looking it in
RubyRipper?
Original comment by oliver.g...@gmail.com
on 10 Dec 2009 at 5:57
Yes! Please add musicbrainz as an option!
Original comment by chrisg...@gmail.com
on 17 Feb 2011 at 11:45
Just finished a reasonably sized effort at adding MusicBrainz support to
rubyripper. In particular:
* It generally tries to mimic the existing freedb code architecture
* It uses the XML Web Service version 2 to handle queries.
* If we fail to settle on a single release (even using the heuristics described
below), the MusicBrainz metadata will fall back on Freedb instead (although the
functionality exists to support choosing a release much like it does in the
freedb code)
* It may be turned off if need be with a preference setting
* It supports the same metadata values as freedb
* There are a few extra dependencies here: rexml, openssl, and base64. These
should all be present in the standard libraries with Ruby 1.9, so I don't
expect there to be too many problems here.
It seems to work fine for me, based on my limited testing, and the rspec files
for most of the functionality check out, so feel free to try it. I've attached
a single squashed commit which contains all of my changes needed to add
MusicBrainz support. You can try using "git am" to apply it, although you
might have a little trouble forcing it to work without the 4 latest patches on
Issue 485, since the commit IDs will be off. I certainly hope it helps at any
rate.
PLEASE NOTE: This patch hasn't had gettext run on it to collect the new
translations from cliPreferences.rb, and since I'm not running this in a GUI
environment, none of the preferences currently show up in the GTK+ preferences
window (they only work from the CLI)! These should probably be fixed before
actually doing a real release!
Some notes on design, by the way:
DiscID Calculation:
It currently calculates discids manually (since there's no real standard
program other than cdrdao which supports calculating MusicBrainz discids).
Thus, due to not having direct access to the drive, I can't absolutely
guarantee that the discids are correct. That said, they should only be
incorrect in absurdly rare cases where the last session of a multi-session disc
has more than one track and either the first or last track in that session is a
data track, but not both. I've never actually come across such a disc, and
strongly doubt they exist.
Similarly, because of the special handling of data tracks, like freedb-id
calculation, if cdinfo (or, now, cdcontrol) is not present, MusicBrainz will be
unable to identify Enhanced CDs.
Identical Metadata, Different Releases:
Since the Next Generation Schema has separated the concept of a tracklist from
release and thus caused some explosion in the number of results which will be
returned for a DiscID query, I've implemented some (admittedly not the most
efficient) heuristics to try to automatically select a reasonable guess as to
the correct release and reduce the number of "false" multi-release result-sets.
In particular, the preferMusicBrainzCountries and preferMusicBrainzDate
preferences can be used to guide these heuristics.
preferMusicBrainzCountries is a comma-delimited list of country codes
(including the MusicBrainz-specific countries XW for world-wide releases and XE
for European releases) representing the (partial) order in which releases
should be preferred based on their release country. The default value
"US,UK,XW,XE,JP" will prefer US releases to British releases, British releases
to Worldwide releases, and so on. Releases which belong to a country which is
not in the list will be preferred less than any country in the list, and
releases which LACK a country will neither be more nor less preferred than
releases with a country (although there is a bias towards releases which have a
country in the preferred country list). An empty string will enforce no
country preferences; all countries will be treated equally.
preferMusicBrainzDate may take one of three values to express the order of
preference of releases by release date. 'earlier' (the default) will prefer
earlier release dates to later ones, 'later' will do the opposite, and 'no'
will treat all release dates equally. Releases without a release date will be
neither more nor less preferred than other releases, although there is a bias
towards releases which have a date (except when 'no' is specified).
There is an additional setting, useEarliestDate, which, if set to true, will
use the earliest possible release date to set the year, even if it does not
belong to a CD. This is useful for people who would prefer to have the
original release year, rather than the release year of the particular CD (e.g.
for albums originally released in the 60s and 70s on LP). This isn't a very
smart setting though; special editions of albums will ALSO be assigned the
original release date, and recent albums originally released digitally will be
given the date of their (earlier) digital release, rather than their physical
release.
Genres:
MusicBrainz does not inherently support genres like freedb. It does, however,
support folksonomy tags of its releases, release groups, and artists. I try to
do my best to map from these folksonomy tags to ID3 genres where possible (a
small number of folksonomy tags which are not letter for letter equal to an ID3
genre are mapped as well, and this can be extended). In particular, I try to
look first at the most popular tags on the release group before falling back on
the album artists (and for various artist albums, the track artists) if I can
find no matching tags. All other things considered equal, I also look at
alphabetical order and artist order, so there's no guarantee that your David
Bowie albums won't all be tagged as "Art Rock". It's not the best solution,
and no doubt many albums will be missing genres (they'll be set nil) but it
works okay given what we've got.
Various Artists:
Unlike freedb, MusicBrainz handles Various Artists gracefully. Compilations
with many artists are usually tagged with an album artist of "Various Artists",
while each track can be assigned a different artist. We don't need to actually
guess at this, and can just use this guideline to lead us.
It also supports Split Albums as well (albums where two or three artists share
time on the album), and I apply a few heuristics to separate albums where two
artists are collaborating on all tracks from true splits (where artists take
turns with different tracks). The former is treated as a single artist, while
the latter is considered a various artists release.
Also, with the support for track-level artist associations, there's a bit of
code to avoid tagging albums which have a single album artist as being a
various artists album (otherwise, things like Disc 2 of the 30th Anniversary
edition of Ziggy Stardust would be tagged as Various Artist albums)
Original comment by comradec...@gmail.com
on 31 Oct 2011 at 7:53
Attachments:
Addendum:
I suppose there's the question of character-set encoding... I'm just using the
data I get from REXML. Since the XML from MusicBrainz is in UTF-8, it seems
reasonable to assume that's what I'm getting as well, but I'm not sure, and
haven't tested that (all of my tests have been strictly ASCII characters)
Original comment by comradec...@gmail.com
on 31 Oct 2011 at 7:58
“There is an additional setting, useEarliestDate, which, if set to true, will
use the earliest possible release date to set the year, even if it does not
belong to a CD.”
I would suggest instead simply setting the ORIGINALDATE (Vorbis) or TDOR (ID3
2.4) or TORY (ID3 2.3) tags.
“Unlike freedb, MusicBrainz handles Various Artists gracefully…”
Does this patch use the MBID (89ad4ac3-39f7-470e-963a-56509c546377) of Various
Artists to recognize a various artists release? or will it get confused by the
other artists which are actually named Various Artists?
“things like Disc 2 of the 30th Anniversary edition of Ziggy Stardust would
be tagged as Various Artist albums”
Is this not correct? It is a various artists album, with the album artist set
to David Bowie. That’s what shows up in MB anyway:
http://musicbrainz.org/release/952f4c05-1941-37af-9844-b9f84faf195f
Original comment by ha...@hawkesnest.net
on 31 Oct 2011 at 4:59
Good to see somebody working on this :) Just a few random comments:
1. Regarding the disc ID calculation, have you considered using libdiscid? This
is a well tested library, and there are wrappers for Ruby, e.g. mb-discid
(http://rbrainz.rubyforge.org/).
2. The data returned by the web service is indeed in UTF-8.
Original comment by ph.wolfer
on 31 Oct 2011 at 5:43
With respect to Comment 11:
I agree that ORIGINALDATE/TDOR/TDOY would be the "correct" way of doing it, but
for now I'm assuming that there are no changes elsewhere in the code, so as to
keep the number of changes needed to add MusicBrainz support to a minimum.
Even if support for tagging original dates is added, however, people may prefer
that the year be set to the original year in any case (e.g. if their player of
choice does not correctly support ORIGINALDATE/TDOR/TDOY)
Yes, this patch explicitly makes use of the Various Artists pseudo-artist (See
analyzeResult in lib/rubyripper/musicbrainz/musicbrainzReleaseParser.rb, which
starts at line 843 in the patch above) rather than relying on artist name. The
only time it should ever assume that a disc is a Various Artists disc is if
there are at least two tracks which do not share the same artist(s), and even
then, this requires that the album artist be the Various Artists pseudo-artist
OR that the album have more than one name-credit (An example of the latter
would be http://musicbrainz.org/release/822160f4-17bb-48c5-b62c-ec7e687dc94c
with an album credit to "Bright Eyes" and to "Son, Ambulance", but tracks
credited individually to one or the other, but not both. Incidentally, this
particular album, which serves as one of the rspec test cases, also
demonstrates that we use the name-credit ("Son, Ambulance") rather than the
artist's name ("Son Ambulance") if given)
For the purposes of Rubyripper, I believe that this would result in undesirable
behavior; Disc 1 would be filed according to the normal naming scheme, while
Disc 2 would be filed according to the Various Artists scheme, resulting in
files from different discs of the same album being filed in completely
different places. Of course, the way I've implemented it DOES mean that a
"Various Artists" release with multiple single-artist discs would end up having
files named according to the normal naming scheme, but this can be fixed by
treating ALL various artist albums as various artist discs.
With respect to Comment 12:
I did consider using libdiscid, but wanted to keep the implementation simple
for the time being (and also because I have recently been working on a
different library which would not only accomplish libdiscid's goals, but also
address Issue 316 and enable ISRC/MCN and track-index detection without the use
of any additional software packages like cdrdao or cdrtools:
https://github.com/pipian/libcueify Please note that the code there is still
under heavy development, and doesn't work on anything but FreeBSD at the
moment.)
Also, when I was talking about UTF-8, I meant that I suspect that REXML is
giving me Ruby String objects which are 8-bit byte strings encoded as UTF-8,
rather than (alternatively) wide character strings.
By the by (and this is for Bouke's sake), I do have a batch of single-commit
changes which, collectively, merge in MusicBrainz support file by file, but
there are quite a few (read 11 or 12) of these and didn't want to overwhelm him
with them due to the size of the change (obviously). I'd be happy to introduce
them one at a time to get merged in piecemeal, although they (effectively) use
git revision e7b24547885e as a fork point (although I am trying to keep up with
the master branch)
Original comment by comradec...@gmail.com
on 1 Nov 2011 at 1:52
I've merged this HUGE thing with the master branch. This really is a lot of new
code and is easily the biggest patch I've ever received :)
I will clean it up a bit, since there are some things I don't particularly like:
* I don't want a freedb and musicbrainz category. This should be one metadata
category.
* I don't want to force the disc class to have logic in it which metadata
subclass it should use. It should only trigger the metadata to be fetched.
* The cucumber tests are broken currently.
* Some of the files should be reorganized for clarity.
Original comment by boukewou...@gmail.com
on 7 Nov 2011 at 6:51
Ok. So I've got to the point that:
* the files are structured much better :)
* the cucumber feature tests are working once again :)
* the disc class still needs some more attention :(
* the unit tests are Sloooow :(. You added six seconds to them (they were 0.8
seconds).
But only one thing at a time. I'd say this was already quite some work.
Original comment by boukewou...@gmail.com
on 7 Nov 2011 at 10:14
Yeah, the unit tests being slow are largely a result of the release-picking
heuristic, which uses a rather complicated scoring algorithm in
GetMusicBrainzRelease.multipleReleasesFound(releases) that's (at least)
quadratic in the number of releases. I've tried a couple of low-lying fruit to
try to push it down (e.g. I use counters where the original code used arrays),
but thus far it hasn't helped much.
It's also possible that the date-sorting code is slow (most cases will run
through all four cases and only return based on the x <=> y operator).
In short, the GetMusicBrainzRelease tests could probably afford to be profiled
to figure out where the slowest parts are causing a hangup. I'll try to run it
myself and see what I can get.
Original comment by comradec...@gmail.com
on 8 Nov 2011 at 5:31
After running some profiling, it turns out that, based on 10 trials, fully half
the time (13 of 26 seconds) was being spent simply on reading the XML files
with REXML, while a good chunk of the remainder (8 seconds) was in doing the
XPath queries. I'll see what I can do to speed it up (e.g. switching strictly
to a stream-based SAX parser, skipping the complex document-building process.)
Original comment by comradec...@gmail.com
on 8 Nov 2011 at 6:02
I don't know if it's possible, but instead of parsing a complete XML, it might
be possible to just parse a small part of it and include this as text in the
spec file, just enough to test the specific case.
Are you sure there isn't a connection to the internet being setup somewhere?
This is a known slowdown and should be prevented in unit tests.
Original comment by boukewou...@gmail.com
on 8 Nov 2011 at 7:02
No, I'm fairly sure, based on the Ruby-Prof results, that the internet is not
being used. REXML has a bit of a history with taking time to do parsing as
well. See:
http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/6b6848e0be089
a3e and
http://townx.org/blog/elliot/ruby_tuesday_parsing_xml_with_rexml_document_and_re
xml_streamlistener
I could trim the sample XML down a bit, but on the other hand, I would like to
make sure that the actual output from the server is being tested so that we
won't be "surprised" by complex XML data returned by the server. Like I said,
I'll give a shot at using the SAX parser and skipping the entire
document-building process, which seems to make up about a quarter of the time
at the very least. I suspect that by going with SAX and simply populating a
custom data structure (since we don't care about most of the XML structure) we
could cut load-times.
Alternately, rewriting the XML-handling code to use nokogiri
(http://nokogiri.org/) or libxml-ruby (http://libxml.rubyforge.org/) would not
only be easier, but could also gain us some performance by pushing the XML
parsing into C code rather than entirely doing such in ruby. Either would add
extra dependencies on libxml2 and the appropriate XML library.
Original comment by comradec...@gmail.com
on 8 Nov 2011 at 7:33
You're right about the internet, the specs still work when the network is
stopped. I'll have a look if I can improve it some. It would be wrong to add
dependencies only to improve the unit test performance.
Original comment by boukewou...@gmail.com
on 8 Nov 2011 at 5:51
I think the spec tests should be rewritten with just the needed XML stuff in
it. This could be achieved based on the information I've seen in:
http://svn.musicbrainz.org/mmd-schema/trunk/schema/musicbrainz_mmd-2.0.rng
Only include the XML that is neccesary to let the test succeed. The rest is
noise and slows the test down a lot.
I do understand your concerns for the complete scenarios. Well, you might want
to turn one a few of the more complex XML files into a cucumber feature test to
be sure. These are ment to be more end-to-end.
Original comment by boukewou...@gmail.com
on 8 Nov 2011 at 6:43
Tried pruning some files, but it didn't really seem to have an impact on the
rspec time (3.22s to 2.91s for me). I've also switched the rspec tests which
read the XML files to cache the data rather than re-read it for each test, but
since most of the parsing is actually happening when trying to parse the artist
tag XML (which happens with each query and can't be cached effectively without
actually stubbing out part of MusicBrainzReleaseParser's implementation which
is being tested), it hasn't helped much more (2.91s to 2.8s at best).
Still, it should be a little bit of an improvement.
Original comment by comradec...@gmail.com
on 15 Nov 2011 at 12:51
Thanks for your efforts here :) The specs take 4.5 seconds now on my machine,
which is more than 25% overall improvement (it was 6.2). This is sufficiently
fast in my opinion.
Original comment by boukewou...@gmail.com
on 15 Nov 2011 at 6:43
Original issue reported on code.google.com by
uptownto...@gmail.com
on 9 Mar 2009 at 11:24