mono / taglib-sharp

Library for reading and writing metadata in media files
GNU Lesser General Public License v2.1
1.28k stars 312 forks source link

Supporting BEXT / iXML Data #74

Closed jamsoft closed 7 years ago

jamsoft commented 7 years ago

I've recently found this library after fighting with my own custom Riff parsers and would be really interested in getting stuck into helping extend the library. The major features missing for me at the moment are reading these BEXT (Broadcast Wave Data - https://en.wikipedia.org/wiki/Broadcast_Wave_Format) and the standard iXML data (http://www.gallery.co.uk/ixml/).

As I'm still feeling my way around this library and the source code where would be a good place to start to add this? I understand this is quite a big ask and not really an "issue" as such but this is the best place to request/document this. Thanks.

Starwer commented 7 years ago

Hi @jamsoft, The Riff implementation uses the CombinedTag class. It means that whatever format supported by this class can contain multiple tag-format. I think the good way to start is to add a BEXT directory which contains a File.cs and Tag.cs, inheriting from the TagLib.File and TagLib.Tag. The new tag type (BEXT) should be added to the Riff class the same way as Id3v2 or DivX. Looking at one of these format implementation should give a hint how to proceed. You will have to add a self-checking test in tests/fixtures/TagLib.Tests.TaggingFormats, that basically proves that you can handle information on this new format properly. Once this works, and if every test pass (see README) on how to run the tests),, you can issue a "Pull Request" to get this integrated. Hope this short explanation gives you a kick start !

jamsoft commented 7 years ago

Excellent chap!! Much appreciated. I'll get to look at this maybe tomorrow as I've been wanted to get this one licked for a while now. Thanks again.

jamsoft commented 7 years ago

Actually, from looking at the code a bit more I'm slightly confused why I would need to create a new directory/ File.cs combination. Broadcast Wave files (BWF) are all RIFF/WAV files. And the BEXT/iXML is a standard chunk in RIFF files. All the Tags which I want to implement belong to the Riff file type. There are quite a few bits to add to the LIST tag for instance.

I'm heavily involved in processing lots of advanced audio files from many different applications and there is a lot missing from the tag-libsharp Riff class that I'm planning on implementing.

jamsoft commented 7 years ago

I'm also a bit confused about the testing methodology, there doesn't appear to be a single .WAV file in the \tests\samples directory.

jamsoft commented 7 years ago

I was also unable to run the unit tests due to a BadImageFormatException when loading GTK# I already had GTK# 2.12 installed as I do a lot of mono/Xamarin stuff.

Switching the unit test project to compile in x86 solved the issue. Has this been reported by others? Any reason why this isn't the default currently in the repo?

Starwer commented 7 years ago

All this probably means that nobody worked on a dedicated wav-friendly implementation yet. That makes your contribution even more appreciated! Don't hesitate to create a (dummy) wav file for supporting your test (beware of size and copyright). Recoding 3 seconds of nothingness should make the trick for example. If this is really wav-specific, then you can create a FileFormats test instead of a TaggingFormats Test. About Mono, I can't tell as I am on Windows...

And I am not too much aware of the BEXT format. So if it is, indeed, a Riff extension, just extend the Riff implementation (Tag/File...). Can not help you more right now, as I am on vacation.

jamsoft commented 7 years ago

No problem chap. I've made MANY tests and I have the core of this working now, reading and writing. The BEXT chunk is historical really, BEXT data is now formalised somewhat and included in the iXML spec.

I have a large pool of wav files at my disposal, some are very narly to deal with and I still have some issues with those and their various chunks to iron out.

One test file I have been working with has a LIST chunk that taglib# just cannot read properly so I'm still looking into that.

jamsoft commented 7 years ago

@Starwer You can close this ticket now, I'm pretty clear on the requirements.

tcHylmrX commented 7 years ago

If we can agree on a ReSharper format to use, I would like to go through and do some cleanup.

From: Jammer [mailto:notifications@github.com] Sent: Sunday, September 24, 2017 2:31 AM To: mono/taglib-sharp taglib-sharp@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [mono/taglib-sharp] Supporting BEXT / iXML Data (#74)

I may just have to work in my own local Repo actually. There are a lot of problems with the source code as it is. There are many formatting issues and having the root class called File causes collisions with the .NET built in File type. I just turned ReSharper analysis on and it's freaked out. lol -1800 code issues.

There also seems to be code dating from way back that is basically re-inventing the wheel and really should be replaced with calls to framework code.

I assume a lot of this is due to it being a mono project where a lot of the .NET stuff hadn't been ported/implemented I expect.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/mono/taglib-sharp/issues/74#issuecomment-331698332, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJtaB__FiWHFAYEcUonch0hgp4RZjzyqks5sliFEgaJpZM4PdCAK.

Starwer commented 7 years ago

I don't understand this. I don't have mono, and I run it successfully on Visual Studio 2015 by just following the indications in README file. It reports 0 error/warning.

jamsoft commented 6 years ago

To be honest there is a lot of formatting issues imho. The layout of the project file and namespacing doesn't follow standards. If you add a new tag class VS will add a class in a namespace like TagLib.TagLib.Riff.xxxxx. BOOM problems. This is all an overhang from early mono days and is completely understandable.

The coding style looks like Javascript and I have to fight with VS to maintain that layout (which I've given up doing actually).

But I do know that a lot of C# devs would be put off getting involved simply because of these formatting and structure issues.

I also think the lack of {} characters when scoping code is a nasty habbit. I know C# and the compiler don't mind but having explicit scopes in the code is a good idea in general.

The unit tests are laid out where they aren't really atomic, there are a lot of calls to utility methods which isn't always a problem but is a bit hard to follow at times which for a unit test is a bit of an anti-pattern in some ways.

Starwer commented 6 years ago

Ok, I didn't realize there were such a strong coding-style requirement from the C# dev community. In comparisons to the projects I've been part of in many programming and scripting languages, I found that the code was looking pretty good, and well structured, especially when we think of how many voluntary contributors are involved. That's probably because I've seen too many ugly codes from people who had no idea on what maintainability could mean, or had no clue what OOP is good at. I have no strong opinion of what the ideal structure should be.

The unit tests are laid out where they aren't really atomic, there are a lot of calls to utility methods which isn't always a problem but is a bit hard to follow at times which for a unit test is a bit of an anti-pattern in some ways.

On the other hand, I have a strong opinion on that one. Having an atomic code is nice, but I'm quite happy that we break that rule for a way more important one: keep the code factorized. If you copy/paste something three times, you're doing it wrong. Or another one: "Three times or more, do it with a for". Back in the 90's, that was indeed a problem to have the code scattered on different files and functions. But VS is so good to bring you to where the function is defined ("Go to definition") that I think we now would be fools not to leverage this. If Microsoft did one good thing, it is the C#+VS combo.

jamsoft commented 6 years ago

There are methods in the code that do exactly the same things as built in Framework code. For instance in Tag.cs there is this method:

private static bool IsNullOrLikeEmpty (string value)
{
    return value == null || value.Trim ().Length == 0;
}

Its not really a big issue but it is a bit odd as there are usages of string.IsNullOrEmpty() in the code as well. But then I think this has been left in due to most contributors just building on top of these. This is absolutely a hang over from the early mono days where not every CLI method had an implementation. But Mono is now so close to the CLI that it covers most of these things. For instance it was only in the last year or so that Mono moved to .NET HttpClient implementation.

In general the structure is ok but it does confuse Visual Studio / ReSharper. Folders == namespace in VS and since there as a vast number of File.cs types and Tags adding a new one in a folder is problematic. As a test I just opened one of the folders and added a class:

namespace TagLib.TagLib.Riff
{
    class NewTag
    {
    }
}

This simple action rendered the entire solution into a non-compiling state with 665 CRC errors. Change this to:

namespace TagLib.Riff
{
    class NewTag
    {
    }
}

And it compiles.

As for the test, I agree. But there wasn't a single use of the NUnit [SetUp] attribute which is exactly the method to use to not repeat unit test code as you say. There is no need to copy / paste. A series of unit similar tests go in a class with one method marked as [SetUp] (I usually name this method Before()) to setup the environment for the tests in the class. This [SetUp] method is run exactly once prior to nunit running any test in that class. There is also a [TearDown] that is run after every test in the class for any cleanup code.

In fact many of the tests in this project aren't even unit tests, they are integration tests since they reach out to system resources (the file system) but that's just semantics in many ways since the library is very file system centric through its nature.

Starwer commented 6 years ago

IsNullOrLikeEmpty(" ") will return true, whereas string.IsNullOrEmpty(" ") will return false. In that sense, these are not exactly equivalent.

As for the test, I agree. But there wasn't a single use of the NUnit [SetUp] attribute (...) There is also a [TearDown] that is run after every test in the class for any cleanup code.

Nice ! I didn't know about this feature. I'll try to use it now if I create a new class.

jamsoft commented 6 years ago

Sorry, I meant:

return string.IsNullOrWhiteSpace(value);

Also a note of warning. The guys that wrote NUnit went on to create xUnit as they didn't like the [SetUp] and [TearDown] constructs as it meant tests weren't atomic and this lead to bad practice.

https://xunit.github.io/docs/comparisons.html

Note 2: The xUnit.net team feels that per-test setup and teardown creates difficult-to-follow and debug testing code, often causing unnecessary code to run before every single test is run. For more information, see http://jamesnewkirk.typepad.com/posts/2007/09/why-you-should-.html.

Have a read of the link there. The test code in taglib# is too difficult to follow. Test code isn't production code and copy and pasting some code between tests as detailed in that blog post is not an affront to the DRY principal. Anything in [SetUp] or [TearDown] should be the absolute minimum of code. Anything relevant to the actual test being run should be in the body of the test method itself.

The last major commercial project I worked on had over 7,000 tests just for one of its assemblies, setup and tear down was useful in that case but needs careful management or it actually starts to get in the way.

Starwer commented 6 years ago

@jamsoft : Interesting reading. So eventually, no, I won't use SetUp and TearDown. So for factorizing, the good old function/method is the way to go. What I don't understand, is why developers seems to think that factorizing in the production code is important, and don't complain about lake of readability, whereas they seems to think that boiler plate and redundant code is preferable in tests. Do they think factorizing is only useful for code size optimization ?

Well I still think factoring is mainly a maintainability issue. To be more concrete, how would you improve this code ?tests\fixtures\taglib.tests.fileformats\standardtests.cs This class aims at testing the minimal set of features that every Video or Audio format should support. Different test fixtures call the same function with different file formats. This proves that TagLib# does its job in abstracting the different tagging formats. This was a great help for my PR #71 to add new tags properties to TagLib#. Just adding checks in this class I was actually spotting, fixing and proving the property would actually work seamlessly for 17 different formats. How did I find out the 17 tests ? Right-click on WriteStandardTags > Find all references ! That's it ! So yes, it was merely disturbing 1 min for readability, but it made me gain an hour later on #71. And in my opinion, this is the only proper proof that TagLib# can execute the same tagging scheme, regardless on the internal specificities of each format.

decriptor commented 6 years ago

@jamsoft @Starwer @tcHylmrX I'd actually be really happy to see the code base cleaned up, but very carefully. Ie, don't turn resharper on and fix everything in one PR. That just isn't consumable :)

However, having said that... If someone wants to start going through and slowly cleaning things up in small PRs I'd be more than happy (along with others I'm sure) to review and merge them. I'd suggest we start with things like replacing code with stuff provided by the framework.

We should probably start a new issue around the code structure though. I've wondered about the File.cs thing as well. I think it warranty an issue and discussion.

I wouldn't be sad either if the unit tests were cleaned up too :) As for a wav file. There might have been one. We used to have a git module with large files, but due to gitorious closing down and not pulling the repo from there in time we lost a directory of test samples :(