valtech / aem-easy-content-upgrade

AEM Easy Content Upgrade simplifies content migrations in AEM projects

AECU is executed too often in AEMaaCS #228

Open royteeuwen opened 3 months ago

royteeuwen commented 3 months ago

When using AECU in AEMaaCS, the scripts seem to be executed too many times. See the screenshot for an example. Probably because there are multiple author instances executing them at the same time?

Screenshot 2024-04-05 at 15 03 22

nhirrle commented 3 months ago

The scripts need to be developed so that they can run multiple times, yes. See the paragraph at https://github.com/valtech/aem-easy-content-upgrade?tab=readme-ov-file#startup-hook-since-600

royteeuwen commented 3 months ago

@nhirrle hmm, it seems to be very consistent in always executing them multiple times, and it doesn't seem to detect correctly (anymore?) that this is happening :/. You can see it in my screenshot: almost all runs are like this. Could this maybe be improved somehow? It's really hard to write scripts that execute on multiple pages without ending up in a state where they make duplicate modifications and throw exceptions because of that. You'd have to refresh your resource resolver constantly.

nhirrle commented 3 months ago

@royteeuwen can you provide some more details? Is the script using the always selector? And any idea whether this is with a recent AEMaaCS release? It would be good if this could be verified with some sample scripts on a sandbox.

royteeuwen commented 3 months ago

@nhirrle no, the script does not use an .always. selector. You can see in my screenshot that it starts the run of the same scripts twice (at 12:56:16 and 12:56:20) and doesn't detect that a run was already in progress. When the run of 12:56:20 started, one of the 7 scripts had already been completed by the run of 12:56:16, which is why it states that there are only 6 scripts.

Because the two runs happen at (almost) exactly the same time, the second run, which executes the same script concurrently, fails with the following exception:

javax.jcr.InvalidItemStateException: OakState0001: Unresolved conflicts in /content/my-site/de/demo/sprint-1/button/jcr:content/root/main-container/social-wrapper-container/content-container/button
    at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:238)
    at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:213)
    at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.newRepositoryException(SessionDelegate.java:737)
    at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:551)
    at org.apache.jackrabbit.oak.jcr.session.SessionImpl$9.performVoid(SessionImpl.java:459)
    at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:299)
    at org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:456)
    at com.adobe.granite.repository.impl.CRX3SessionImpl.save(CRX3SessionImpl.java:220)
    at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
    at Script1.run(Script1.groovy:42)
    at org.codehaus.groovy.vmplugin.v8.IndyInterface.selectMethod(IndyInterface.java:355)
    at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
    at be.orbinson.aem.groovy.console.impl.DefaultGroovyConsoleService.runScript(DefaultGroovyConsoleService.groovy:75)
    at de.valtech.aecu.core.service.AecuServiceImpl.executeScript(AecuServiceImpl.java:214)
    at de.valtech.aecu.core.service.AecuServiceImpl.execute(AecuServiceImpl.java:188)
    at de.valtech.aecu.core.service.AecuServiceImpl.executeWithInstallHookHistory(AecuServiceImpl.java:374)
    at de.valtech.aecu.core.service.AecuServiceImpl.executeWithInstallHookHistory(AecuServiceImpl.java:357)
    at de.valtech.aecu.startuphook.AecuCloudStartupService.startAecuMigration(AecuCloudStartupService.java:175)
    at de.valtech.aecu.startuphook.AecuCloudStartupService.checkAndRunMigration(AecuCloudStartupService.java:94)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakState0001: Unresolved conflicts in /content/my-site/de/demo/sprint-1/button/jcr:content/root/main-container/social-wrapper-container/content-container/button
    at org.apache.jackrabbit.oak.plugins.commit.ConflictValidator.failOnMergeConflict(ConflictValidator.java:115)
    at org.apache.jackrabbit.oak.plugins.commit.ConflictValidator.propertyAdded(ConflictValidator.java:84)
    at org.apache.jackrabbit.oak.spi.commit.CompositeEditor.propertyAdded(CompositeEditor.java:82)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.propertyAdded(EditorDiff.java:81)

The script itself is not an .always script. It executes the following, where the query finds at least 1000 results:

xpathQuery("/jcr:root/content//*[" +
        "(@sling:resourceType='my-site/components/button/v1/button')" +
// IRL there are more resource types, removing for this ticket
        "]").each { node ->
    println "button path: " + node.getPath()

    if (node.hasProperty("linkURL")) {
        node.getProperty("linkURL").remove()
        println "Removed linkURL property from: " + node.getPath()
    }
    if (node.hasProperty("linkTarget")) {
        def linkTargetValue = node.getProperty("linkTarget").getString()
        node.setProperty("linkWindowTarget", linkTargetValue)

        node.getProperty("linkTarget").remove()
        println "Renamed linkTarget to linkWindowTarget in: " + node.getPath()
    }
}

session.save()

The one thing I could do to "improve" it is to move the session.save() inside the .each and do a refresh every time. But this would make the script run a lot longer.
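A possible middle ground between one save at the very end and a save per node is to save in batches. The following is only a minimal sketch under assumptions: `eachInBatches` is a hypothetical helper, not part of AECU or the Groovy console, and the batch size of 100 is arbitrary. It would not prevent the duplicate execution itself; it would only limit how many pending changes a single conflicting save can affect.

```groovy
// Hypothetical helper (not part of AECU): runs `action` on every item and
// calls `save` after each full batch, plus once at the end for the remainder.
def eachInBatches(items, int batchSize, Closure action, Closure save) {
    int pending = 0
    items.each { item ->
        action(item)
        pending++
        if (pending >= batchSize) {
            save()
            pending = 0
        }
    }
    // Persist whatever is left after the last full batch.
    if (pending > 0) {
        save()
    }
}

// Assumed usage inside the migration script, where `xpathQuery` and `session`
// are the bindings already used in the script above and `query` is its XPath string:
//
// eachInBatches(xpathQuery(query), 100, { node -> /* property changes */ }, { session.save() })
```

With this shape, a failing save could also be caught per batch and followed by session.refresh(false) to discard that batch's pending changes, at the cost of having to handle the skipped nodes in a later run.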

nhirrle commented 3 months ago

Thanks for the details. I will have a look at the beginning of next week. There is also a related ticket: https://github.com/valtech/aem-easy-content-upgrade/issues/227

nhirrle commented 3 months ago

Hi @royteeuwen, the issue happens because the scripts are executed on a cluster, but the AECU code does not take this into account. We will need to migrate the code to Sling jobs instead and execute them on the leader only.

This requires major changes and a new release. More info - https://adapt.to/2021/presentations/adaptto2021-designing-a-cluster-aware-application-joerg-hoh.pdf

Unfortunately, I can only commit to a fix within the next 4 weeks.

royteeuwen commented 3 months ago

@nhirrle OK! Just make sure not to create the job on every AEM instance, because then you would still execute the rules x times.

nhirrle commented 1 month ago

One further observation: exception thrown during startup

03.06.2024 05:01:36.209 [cm-p23458-e585661-aem-author-6784cb8fb6-2hxg7] *INFO* [sling-threadpool-649d8a9a-d1be-415c-b890-89cb09b8d432-(apache-sling-job-thread-pool)-35-AECU Cloud Startup Job Queue(de/valtech/aecu/cloud/AecuStartupJobTopic)] de.valtech.aecu.startuphook.AecuStartupJobConsumer AECU migration started
03.06.2024 05:01:36.275 [cm-p23458-e585661-aem-author-6784cb8fb6-2hxg7] *ERROR* [sling-threadpool-649d8a9a-d1be-415c-b890-89cb09b8d432-(apache-sling-job-thread-pool)-35-AECU Cloud Startup Job Queue(de/valtech/aecu/cloud/AecuStartupJobTopic)] de.valtech.aecu.startuphook.AecuStartupJobConsumer Error while executing AECU migration
de.valtech.aecu.api.service.AecuException: Path is invalid
    at de.valtech.aecu.core.service.AecuServiceImpl.findCandidates(AecuServiceImpl.java:117) [de.valtech.aecu.core:6.5.1.SNAPSHOT]
    at de.valtech.aecu.core.service.AecuServiceImpl.getFiles(AecuServiceImpl.java:97) [de.valtech.aecu.core:6.5.1.SNAPSHOT]
    at de.valtech.aecu.core.service.AecuServiceImpl.executeWithInstallHookHistory(AecuServiceImpl.java:363) [de.valtech.aecu.core:6.5.1.SNAPSHOT]
    at de.valtech.aecu.core.service.AecuServiceImpl.executeWithInstallHookHistory(AecuServiceImpl.java:357) [de.valtech.aecu.core:6.5.1.SNAPSHOT]
    at de.valtech.aecu.startuphook.AecuStartupJobConsumer.process(AecuStartupJobConsumer.java:32) [de.valtech.aecu.cloud.startup.hook:6.5.1.SNAPSHOT]
    at org.apache.sling.event.impl.jobs.JobConsumerManager$JobConsumerWrapper.process(JobConsumerManager.java:543) [org.apache.sling.event:4.3.14]
    at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJob(JobQueueImpl.java:351) [org.apache.sling.event:4.3.14]
    at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.access$100(JobQueueImpl.java:60) [org.apache.sling.event:4.3.14]
    at org.apache.sling.event.impl.jobs.queues.JobQueueImpl$1.run(JobQueueImpl.java:287) [org.apache.sling.event:4.3.14]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)