musescore / MuseScore

MuseScore is an open source and free music notation software. For support, contribution, bug reports, visit MuseScore.org. Fork and make pull requests!
https://musescore.org
Other
12.27k stars 2.65k forks source link

[MU4 Issue] Cannot load MuseScore3 score #8143

Closed njvdberg closed 2 years ago

njvdberg commented 3 years ago

Describe the bug When loading Test..._BeethovenSymphony_No._9Op._125.mscz I had to abort after 2 hours running with 100% CPU usage (just one single core). In there 2 hours there was only some logging for the first 2 minutes:

19:10:09.103 | INFO  | main_thread | ActionsDispatcher | dispatch: try call action: file-open
19:10:17.270 | WARN  | main_thread | Qt         | libpng warning: iCCP: CRC error
19:10:23.259 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.289 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.290 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.291 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.291 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.292 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.292 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.293 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.293 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:23.294 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.447 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.447 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.447 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.447 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.448 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.448 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.448 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.448 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.448 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.448 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.628 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.628 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.628 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.628 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.628 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.628 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.628 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.629 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.629 | WARN  | main_thread | Qt         | unmapped drum note 46
19:10:40.629 | WARN  | main_thread | Qt         | unmapped drum note 46
19:14:46.053 | INFO  | main_thread | AccessibilityController | reg: item: "NotationView0 direction Horizontal" 
19:14:46.053 | INFO  | main_thread | AccessibilityController | reg: item: "NotationTab0" 
19:14:46.054 | INFO  | main_thread | AccessibilityController | reg: item: "" 

Run is aborted at 21:14, so after 2 hours.

To Reproduce Steps to reproduce the behavior:

  1. Download the score from musescore.com.
  2. Start nightly build.
  3. Open the score

Expected behavior After a reasonable time the score should be visible.

Desktop (please complete the following information):

Additional context MuseScore3 loads the score within 1 minute on the same computer.

JRYamasaki commented 3 years ago

I'll take a look at this, from just a bit of debugging it seems the loading is actually only taking about a minute but the layout / rendering is taking a while? Currently investigating why LayoutContext::layout() is causing such a slow down.

JRYamasaki commented 3 years ago

Its possible we could've changed how headers or measure work from MU3 -> 4? I'm getting these messages when running a debug build, don't think its a layout issue

16:21:35.399 | DEBUG | main_thread | Qt | createExcerpt: original score has no header frame 16:21:35.399 | DEBUG | main_thread | Qt | insertMeasure: measure not found 16:21:35.399 | ERROR | main_thread | Qt | ASSERT: "measure->isVBox()" in file C:\Users\jy\MuseScore\src\libmscore\excerpt.cpp, line 264

njvdberg commented 3 years ago

As far I know there should be no change in how headers work, in fact the kernel (libmscore) should change at all between MU3 and MU4. However, what is new in MU4 is MU4 always generates excerpts for every part in the score. BTW, which version are you using? I didn't see these messages, for me it stopped with the messages as I listed above. Mu test above was with MuseScoreNightly-202105070440-master-45fe99a-x86_64.AppImage (although, I tried it with older versions too, with the same result).

JRYamasaki commented 3 years ago

I'm creating my own debug build on the b30a63ff3822869cf9186bd92e1b9dc6983238a9 commit. It sounds like its probably a bug in the excerpt. Its surprising that MU4 doesn't just crash though, we could be seeing two different bugs but it might also be possible MU4 is just hanging after the exception is thrown. I've ran it a couple times and hit the same Assert everytime so far.

njvdberg commented 3 years ago

So you are using a debug version? That explains why you see the assert while I didn't (I think nightly build are not debug versions). But is not just the excerpts creating the problems, in MU3 I can create excerpts without problems (and it isn't even that slow!).

JRYamasaki commented 3 years ago

Perhaps I can try to fix this issue and then diagnose where are the other places that are taking so long. From the tracing I've done so far a lot of the loading process has many large loops that take quite a while (read302.cpp has a loop with a bunch of else-if's that could probably be optimized). The loop in LayoutContext::layout() is also quite slow, but I feel like I need to spend more time looking at this all. Then maybe I can time each function in the callstack to spot the biggest offenders.

njvdberg commented 3 years ago

The loop ip LayoutContext::layout() is (at least, it should) be the same as in MU3, so, I'm not sure that's the reason for this performance issue. How long does it take for you before is runs into the assert? I'm trying to do the same now using the same version.

JRYamasaki commented 3 years ago

It takes a couple minutes until I hit the excerpt. I'm running the debug build so I imagine that slows the process down, but it still shouldn't take 2 hours. its possible there is one more expensive operations that have been added that cause this slow down but I think its after the excerpt creation.

JRYamasaki commented 3 years ago
if (!measure || !measure->isVBox()) {
        qDebug("original score has no header frame");
        oscore->insertMeasure(ElementType::VBOX, measure);
        measure = oscore->first();
    }
    VBox* titleFrameScore = toVBox(measure);

    measure = score->first();
    Q_ASSERT(measure->isVBox());

So after looking at this code I'm pretty sure the reason the assert gets triggered is because of the following sequence of events

  1. If the measure doesn't exist or is not a VBox, add a measure to the orchestral score that is a VBox and set it to measure (setting the measure to a VBox, seems correct)
  2. Afterwards, always set the measure to the first measure of the score (overriding the work from step 1 and setting the measure to a TBox)

It seems like there should be an else-block that wraps the titleFrameScore and measure = score->first();, but I'm very new to the code and this process feels quite complex. Any ideas here? Also would love to be pointed to a wiki or handbook page that explains all these terms like VBox and TBox as its hard for me to understand what they are.

njvdberg commented 3 years ago

What puzzles me, the same code in MU3. So why is this ASSERT not triggered in MU3? I'm wondering whether this issue is caused by the way how MU4 treats excerpts...

For terms like VBox and TBox, there is no wiki or handbook (as far I know). However Box is a kind of MeasureBase (see src/libmscore/box.h) and VBox and HBox are subclasses of this Box. A VBox is a Vertical Frame, HBox is a Horizontal Frame.

njvdberg commented 3 years ago

In a local version I removed the creation of excerpts (this where the code fragment above come from) but this doesn't solve the issue. So, although a debug version crashes on the ASSERT, this is not the cause of the long loading times.

JRYamasaki commented 3 years ago

I've done a bit more digging but I know for sure the offending function Excerpt::createExcerpt. There is an instance where one excerpt took about 3.5 minutes to create. Its a hefty function, but I'll start profiling the different parts as it seems to do a lot.

njvdberg commented 3 years ago

Good to see you making some progress! But I'm wondering Excerpt::createExcerpt() is really the issue here because it is supposed to same as used in 3.6.2 and I just checked, creating all parts in 3.6.2 takes roughly 40 sec (on the same laptop I did the original test). This is long but acceptable for generating 52 parts of a 2600 measure piece. So, is it the Excerpt::createExcerpt() itself causing the issue or is it what starts Excerpt::createExcerpt() causing the issue? Hopefully you can find out.

JRYamasaki commented 3 years ago

After a bit more digging, I was able to find the culprit which lives in void Score::styleChanged(). From what I can tell, styleChanged() does 4 things:

The culprit was Scan Elements (which would take a whopping 3.5 minutes for subsequent and individual calls of Score::styleChanged()). From what I can tell, scanElements() doesn't even do anything. It looks like it supposed to run the func that is passed in but it never does (at least not in this case). If I'm misunderstanding something please correct me. Here is the function itself

 void ScoreElement::scanElements(void* data, void (* func)(void*, Element*), bool all)
{
    for (ScoreElement* child : (*this)) {
        child->scanElements(data, func, all);
    }
}

This looks like it just eats memory and does nothing since its recursive and never calls the actual function.

If I comment scanElements() out, then I'm actually able to see the first page of the score in MuseScore after about 15 mins on a debug build with no debugger attached. I'll see if I can get a release or release-with-debug going to see how fast it would run, but I've never created either of those builds with Qt so it may be a while.

njvdberg commented 3 years ago

From what I can tell this is not part of MuseScore3 which can explain why MU3 doesn't have issues loading the score (at least, I can't find it).

There was a GSoC project last year which introduced this (see the blogs for more information). But I'm puzzled what is supposed to do. You might try to ask this on the MuseScore development Discord channel.

AntonioBL commented 3 years ago

Indeed, it spends a lot of time executing this line: https://github.com/musescore/MuseScore/blob/21befc4ed1afc29985e776ac608a97c9491e2641/src/engraving/libmscore/score.cpp#L1359 it recursively calls scanElements on the children elements, for example a back trace shows the cascade calling (from outer function call to inner function call):

It seems that when I randomly interrupt the execution, the execution is most of the times inside this loop: const std::set<Spanner*>& unmanagedSpanners = score()->unmanagedSpanners(); for (Spanner* s : unmanagedSpanners) { if (s->treeParent() == this) { numChildren++; } and in particular it is usually stuck at this line: if (s->treeParent() == this) { inside function s->treeParent()

JRYamasaki commented 3 years ago

Thanks for the clarity on that @AntonioBL. I think we probably have to profile / time a LOT of the stuff in how we create excerpts and how we scan Elements since it seems like we scan a TON of elements of many different types.

@njvdberg, I'll read through the blog post and see if I can find a way to make things faster

JRYamasaki commented 3 years ago

Just for the sake of information sharing, here is the number of times each function is called within Ms::Spanner::treeParent within roughly the first 20 mins (not exact, but a rough guess)

After going out for about an hour and then coming back, here are what the calls are like now

Its clear this function is getting called literally millions of times and might have something to do with the tree-like structure. I've also attached the callstack for anyone wondering, The reason why startSegment() is so slow is because it calls the function Score::tick2rightSegment, which performs at least 2 linear searches for every call. One of those linear searches (Score::tick2measureMM()) seems to be looking for a measure and always starting at the beginning. We could fix this potential if we stored measures to ticks in a data structure but I'm not sure how viable that solution is. callStack

wizofaus commented 3 years ago

scanElements definitely does something, the updateStyle function parameter that's passed in is getting called, and you can see from Element::scanElements the code for the leaf case where it's called. TextBase::styleChanged is one of the culprits, specifically for DYNAMIC elements and especially the property FONT_FACE (the score I'm testing with does have quite a few dynamics with attached text)

wizofaus commented 3 years ago

So it seems to be QFontMetricsF::tightBoundingRect that's the main choke point, we call it many 1000s of times and it can take over 50ms, or over 1ms on average. Not sure if we should add some caching, or whether the whole mechanism of creating all the excerpts on score open is necessary....

wizofaus commented 3 years ago

I experimented with a rudimentary cache for tightBoundingRect and it allows my score to open pretty quickly (<1s), though still noticeably slower than MU3, and it's not a huge score. However the attached Beethoven score is still very very slow to load, and there's a good few seconds before it even starts to create the excerpts, so it's not just that. It's still spending >13 seconds per excerpt in updateStyle for some parts.

wizofaus commented 3 years ago

BTW I was able to confirm that a lot of time is spent inside treeParent when reading the vocal part excerpts (e.g. 50 seconds for "Soprano"). But it's certainly not the only slow point, almost everything is slow reading this score in, hard to see how we'll be able to get it load quickly without quite a lot of changes. Worse, once it does finally open (after a minute or so), almost any operation on it is unusably slow.

wizofaus commented 3 years ago

Output from Visual Studio from aborting load after 2 minutes (without my cache):

Function Name Total CPU [unit, %] Self CPU [unit, %] Module | - [External Call] QFontMetricsF::tightBoundingRect 15769 (25.36%) 15769 (25.36%) Qt5Gui.dll | - [External Call] RtlpAllocateHeapInternal 2223 (3.58%) 2223 (3.58%) ntdll.dll | - [External Call] QXmlStreamReader::skipCurrentElement 1619 (2.60%) 1619 (2.60%) Qt5Core.dll | - [External Call] QXmlStreamReader::readNext 1271 (2.04%) 1271 (2.04%) Qt5Core.dll | - [External Call] QXmlStreamReader::readNextStartElement 1263 (2.03%) 1263 (2.03%) Qt5Core.dll | - [External Call] RtlFreeHeap 1101 (1.77%) 1101 (1.77%) ntdll.dll | - Ms::SpannerMap::update 1852 (2.98%) 914 (1.47%) MuseScore4.exe | - Ms::MeasureBase::tick 1595 (2.57%) 662 (1.06%) MuseScore4.exe

I re-added the cache then ran for a full 5 minutes and got:

Function Name Total CPU [unit, %] Self CPU [unit, %] Module | - Ms::MeasureBase::nextMeasure 105147 (34.40%) 73407 (24.02%) MuseScore4.exe | - Ms::MeasureBase::tick 74902 (24.50%) 41461 (13.56%) MuseScore4.exe | - Ms::Score::tick2measure 220319 (72.08%) 30171 (9.87%) MuseScore4.exe | - Ms::MeasureBase::top 33350 (10.91%) 27023 (8.84%) MuseScore4.exe

So that should give some starting points (and more or less lines up with what @AntonioBL had determined above)

wizofaus commented 3 years ago

I threw in a rudimentary (and very unsafe!) cache for tick2measure( ) too and now the score loads within a minute - still pretty slow but it's not super fast in MU3 either. But I'd say this needs a good deal more analysis.

JRYamasaki commented 3 years ago

I can attempt some more timing / profiling tonight and see what I find. It may take a while since I'm not very familiar with the excerpt creation process

wizofaus commented 3 years ago

The Visual Studio profiling tools are pretty powerful but I'm not sure if they're available in the community edition. I think the main thing to determine first is why it's so much slower than MU3 - and that does seem to be largely because of the excerpt initialisation. But MU3 is pretty slow too - the tightBoundingRect optimisation using a cache would help significantly there, but obviously doing that properly (with memory considerations etc.) is non-trivial (caching being the other of the classic "3 hard problems" of computer science, along with off-by-1 errors!).

JRYamasaki commented 3 years ago

I'm using qtCreator so I don't think I'll have access to robust tools like in VS. I'll play around with caching as well and see how far I can get

wizofaus commented 3 years ago

There's also an existing haw::profiler::Profiler class that looks like it might be useful, but generally for profiling to be useful it does require special tools to instrument the source to add in function timings etc. Unfortunately there's not one main choke point but if we could speed up tightBoundingRect and tick2measure significantly that's where most of the gains would be. It'd be good to understand why tick2measure really needed to be called so many times too.

JRYamasaki commented 3 years ago

Ok, so after going through the callstack above (the one screenshot / image) here is why we call tick2measure so many times.

ScoreElement* operator*() { return el->treeChild(i); }

ScoreElement* Measure::treeChild(int idx) const
{
    Q_ASSERT(0 <= idx && idx < treeChildCount());
const std::set<Spanner*>& unmanagedSpanners = score()->unmanagedSpanners();
    for (Spanner* s : unmanagedSpanners) {
        if (s->treeParent() == this) {
            numChildren++;
        }
    }

Thats more or less the process of why its called so much. I imagine the

(Headers + Footers) * (Number of measures) * (Number of children elements for that measure) * (Number of Unmanaged Spanner) = A big number (Roughly the number of times we call tick2measure)

We can likely fix this by either caching or determining any of the following factors in a quicker way:

Let me know your thoughts @wizofaus. The easiest route seems like getting rid of excerpts entirely as the creation of them upon loading a score is too slow, since MS3 does not create excerpts when loading a score. That way we don't have to track down all the optimizations we could make in the excerpt creation process.

wizofaus commented 3 years ago

It wouldn't really be my decision - you'd need to talk to whoever added the changes to create the excerpts at score load time first. It's definitely worth looking into what's worth caching - obviously the main challenges with caching are a) knowing when to invalidate and b) ensuring it doesn't chew up too much memory (i.e. knowing when to clear out old entries). If there was a cache that could just be used during loading that might be worthwhile. Are you sure that updating the header/footer styles is a big part of it? I was having issues even on scores with no headers/footers. BTW Q_ASSERT(0 <= idx && idx < treeChildCount()); should do nothing for non-debug builds I would think.

njvdberg commented 3 years ago

@wizofaus

It wouldn't really be my decision - you'd need to talk to whoever added the changes to create the excerpts at score load time first.

Can't agree more :smile: and that's subject to change. And as a matter of fact, that's what I'm working right now.

DmitryArefiev commented 2 years ago

Now it's better after #11298.

Let's close it (with #11298)

I created #11392 to cover remaining performance issues with that score