rettinghaus / MEILER

MEI Lilypond Engraving Refinement
MIT License
20 stars 7 forks source link

Change stem direction algorithm #17

Closed th-we closed 7 years ago

th-we commented 7 years ago

This fixes some wrong stem directions/invalid Lilypond code produced in the test file. It also avoids the preceding::* axis which probably performs with O(n²) complexity.

I'm not sure, however, what the test at line 2316 was supposed to do. Is it anything that this patch would break?

rettinghaus commented 7 years ago

My code seems to be a bit complicated, but it produces a much cleaner LilyPond output than yours. It basically prevents printing stem directions over and over again. Otherwise big MEI files would produce huge LilyPond files consuming much time to parse. Try this example and compare my version with yours:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://music-encoding.org/schema/3.0.0/mei-all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://music-encoding.org/schema/3.0.0/mei-all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<mei xmlns="http://www.music-encoding.org/ns/mei" xmlns:meiler="NS:MEILER_TEST" meiversion="3.0.0">
    <meiHead>
        <fileDesc>
            <titleStmt>
                <title/>
            </titleStmt>
            <pubStmt/>
        </fileDesc>
    </meiHead>
    <music>
        <body>
            <mdiv>
                <score>
                    <scoreDef meter.count="4" meter.unit="4">
                        <staffGrp>
                            <staffDef n="1" clef.line="2" clef.shape="G" lines="5"/>
                        </staffGrp>
                    </scoreDef>
                    <section>
                        <measure n="1">
                            <staff n="1">
                                <layer n="1">
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                    <note pname="b" oct="4" dur="16" />
                                </layer>
                            </staff>
                        </measure>
                        <measure n="2">
                            <staff n="1">
                                <layer n="1">
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                    <note pname="b" oct="4" dur="16" stem.dir="up" />
                                </layer>
                            </staff>
                        </measure>
                    </section>
                </score>
            </mdiv>
        </body>
    </music>
</mei>
th-we commented 7 years ago

Yes, I noticed that. My priority here was not cleanliness. Besides converting the added test cases, the most fundamental problem that I'd like to address here is the mentioned O(n²) computational complexity. I ran some tests with the master branch and as a comparison a private branch that I started to heavily modify. While modifying the code, wherever I found preceding::*, following::* and descendant::* axes, I tried to replace them if they might have to traverse large parts of the document. However, my modifications are in a messy state and I'm not sure what changes (apart from the axes replacements) are good and what should be done differently. Some of my changes aren't particularly improving performance, but all of them should scale well (i.e. linearly).

I ran a test today, logging processing time for very simple auto-generated documents growing in size. It clearly shows the quadratic behavior:

$ bash tests/test-large-file-performance.sh 999999
1 measures
3.68
2 measures
3.82
4 measures
3.64
8 measures
3.75
16 measures
3.90
32 measures
4.44
64 measures
4.90
128 measures
5.01
256 measures
5.86
512 measures
7.82
1024 measures
12.46
2048 measures
17.90
4096 measures
44.02
8192 measures
164.41
16384 measures
682.92
32768 measures
3106.42

Clearly, for larger numbers of elements, the processing time quadruples when the number of elements doubles. I did experiments with massively introducing keys. I've not replaced every O(n²) XPath expressions, so behavior still becomes quadratic at some point, but it's already much later:

$ git checkout wip/develop 
Switched to branch 'wip/develop'
tom@t420:~/devel/MEILER$ bash tests/test-large-file-performance.sh 999999
1 measures
5.64
2 measures
6.03
4 measures
6.01
8 measures
7.55
16 measures
6.85
32 measures
7.17
64 measures
9.68
128 measures
8.07
256 measures
9.96
512 measures
10.34
1024 measures
14.65
2048 measures
19.06
4096 measures
22.47
8192 measures
39.38
16384 measures
94.29
32768 measures
255.99

It's clearly slower for very small samples, but for around 12000 notes it becomes faster than the master branch. (That's less than the Hummel sample and it's precisely the point where time fully tips over to clearly quadratic behavior.) For the last measured test case, it already takes 12 times longer.

Clearly, this single pull request doesn't hugely affect the performance by itself, but I'd like to address this general problem in my contributions. It's a tricky problem to e.g. find the current time signature efficiently and with linear performance, but using keys it's possible, by traversing the document once and building an index accessible by keys instead of re-traversing the document from each place where one needs to know the time signature. I have some ideas how to do that.

Maybe we should meet and discuss this if you're interested in my contributions going in that direction.

rettinghaus commented 7 years ago

I appreciate your efforts and contributions and they are moving in right direction. E.g. using keys is definitely a better way than my step-by-step-straight-forward-programming. But the basic idea here originally was to generate a beautiful clean LilyPond code, that can be manually changed/improved later on. So generally I tried to output only necessary things. The stem direction algorithm may be improvable, but I don't want to take a step back to where I'm coming from.

For a really fast conversion we should move from XSLT to Python or C++, and I'm already thinking of adding a LilyPond exporter to Verovio.

rettinghaus commented 7 years ago

I opened a new branch develop-speed for this. Could you resolve coflicts and move your PR?