Sensible defaults for data retention and cleanup for SAP Commerce, based on my CX Works article Data Maintenance and Cleanup
Download the latest release
Unpack to hybris/bin/custom
If possible, disable saved values / change history (ref. help.sap.com, further recommendations in my article)
Add extension to your localextensions.xml
<extension name="sanecleanup" />
:red_circle: Adapt the retention rules to your project requirements :red_circle: \
(check the available properties in project.properties
)
Build and deploy.\ (The rules will be automatically imported during system update)
If you get a build error regarding missing types like the example below:
- Open
sanecleanup-items.xml
- Search for the type
- Comment-out the whole
<itemtype>
taginvalid index sanecleanup for type _TYPE_ on [...] declared at ((sanecleanup))::YIndex[sanecleanup-items.xml:...] due to missing enclosing type '_TYPE_'
WARNING\ The very first execution of the retention cron jobs will take a while, depending on how long your poject is already live and if you have cleaned up anything in the past.
Consider performing a one-time cleanup before adding the extension / enabling the retention rules.
Especially the first run of cronJobLogCleanupCronJob
will take a very long time, if you have never removed any cronjob log files (type LogFile
).\
Please consider importing and executing the script job defined in bulkdelete-cronjoblogs.impex before you set up the automated cleanup!\
The job will remove all log files except the five most recent logs per CronJob.
(Disclaimer: the script was tested on MS SQL / Azure SQL and SAP HANA. It is not guaranteed to work for other databases)
If have never even thought about that topic - yes!
You can run the following scripts in the administration console to get a quick overview:
excessive-platform-types.groovy
- Generates a report about "known troublemakers"check-audit.groovy
- Check if you have too many audit logsHere are some additional queries and "rules of thumb" that help you investigate further:
Type(s) | Query | Notes |
---|---|---|
AbstractRule | ```sql SELECT COUNT({ar:pk}), MIN({ar:modifiedtime}) AS "oldest", MAX({ar:modifiedtime}) AS "newest" FROM {AbstractRule AS ar}, {RuleStatus AS rs} WHERE {ar:status} = {rs:pk} AND {rs:code} = 'PUBLISHED' AND {ar:enddate} IS NOT NULL AND {ar:enddate} < getutcdate() ``` | Are there any outdated rules? i.e rules that aren't valid anymore because their enddate is in the past. Warning: change `getutcdate()` to your DBMS (for HANA/MySQL: `now()` ) |
BusinessProcess | ```sql SELECT {p:processDefinitionName}, {s:code} AS "status", COUNT({p:pk}) AS "total", MIN({p:modifiedTime}) AS "oldest", MAX({p:modifiedTime}) AS "newest" FROM {BusinessProcess AS p LEFT JOIN ProcessState AS s ON {p:state} = {s:pk} } GROUP BY {p:processDefinitionName}, {s:code} ORDER BY "total" DESC ``` | Are there too many (let's say > 1000) or very old BusinessProcess in your system? Also, if a lot of processes are stuck in "RUNNING" / "WAITING", you have to investigate what's wrong. (What is causing your processes to be stuck?) |
Cart | ```sql SELECT {b:uid} AS "BaseSite", {u:uid} AS "USER", CASE WHEN {c:saveTime} IS NULL THEN 'regular' ELSE 'saved' END AS "cart type", COUNT({c:pk}) AS "total", MIN({c:modifiedtime}) AS "oldest", MAX({c:modifiedtime}) AS "newest" FROM { Cart AS c LEFT JOIN USER AS u ON {c:user} = {u:pk} LEFT JOIN BaseSite AS b ON {c:site} = {b:pk} } GROUP BY {b:uid}, {u:uid}, CASE WHEN {c:saveTime} IS NULL THEN 'regular' ELSE 'saved' END ORDER BY "total" DESC ``` | - Are there excessive amount of carts per site or per user? - Too many saved carts? - Stale (= old) carts? |
CronJob (auto-generated) | ```sql SELECT {t:code} AS "CronJob Type", COUNT({c:pk}) AS "total", MIN({c:modifiedtime}) AS "oldest", MAX({c:modifiedtime}) AS "newest" FROM {CronJob AS c JOIN ComposedType AS t ON {c:itemtype} = {t:pk} LEFT JOIN TRIGGER AS trg ON {trg:cronjob} = {c:pk} } WHERE {trg:pk} IS NULL AND {c:code} LIKE '00%' AND {t:code} IN ( 'ImpExImportCronJob', 'CatalogVersionSyncCronJob', 'SolrIndexerCronJob' ) GROUP BY {t:code} ORDER BY "total" DESC ``` | Are there too many (>10) outdated, auto-geneated jobs in your system? |
CronJobHistory | ```sql SELECT {cj:code}, COUNT({h:pk}) AS "total", MIN({h:modifiedtime}) AS "oldest", MAX({h:modifiedtime}) AS "newest" FROM {cronjobhistory AS h JOIN cronjob AS cj ON {h:cronjob} = {cj:pk} } GROUP BY {cj:code} ORDER BY "total" DESC ``` | Is there any job with > 50 histories and/or histories older than an hour? This cleanup is enabled by default in recent SAP Commerce patch releases, so this query shouldn't find anything. |
EmailMessage | ```sql SELECT {bp:processDefinitionName} AS "source", {m:sent}, COUNT({m:pk}) AS "total", MIN({m:modifiedtime}) AS "oldest", MAX({m:modifiedtime}) AS "newest" FROM {EmailMessage AS m LEFT JOIN BusinessProcess AS bp ON {m:process} = {bp:pk} } GROUP BY {bp:processDefinitionName}, {m:sent} ORDER BY "total" DESC ``` | - Are there more than a handful sent/unsent messages? - Are there messages that do not belong to any process? |
ImpExImportCronJob (distributed impex) | ```sql SELECT {s:code} AS "status", COUNT({i:pk}) AS "total", MIN({i:modifiedtime}) AS "oldest", MAX({i:modifiedtime}) AS "newest" FROM {ImpExImportCronJob AS i LEFT JOIN CronJobStatus AS s ON {i:status} = {s:pk} } WHERE {i:code} LIKE 'distributed-impex-%' GROUP BY {s:code} ``` | - More than ~10 `FINISHED` distributed impex jobs? - More than a few `PAUSED` jobs? You may have a faulty distributed impex script. |
ImpexMedia | ```sql SELECT COUNT(*) FROM {ImpexMedia AS i} WHERE ( {i:code} LIKE '0_______' OR {i:code} LIKE 'generated impex media - %' ) ``` | Are there more than a handful (>100) of generated impex medias? |
ImportBatchContent | ```sql SELECT COUNT({c:pk}) AS "total", MIN({c:modifiedTime}) AS "oldest", MAX({c:modifiedTime}) AS "newest" FROM {ImportBatchContent AS c LEFT JOIN ImportBatch AS b ON {b:importContentCode} = {c:code} } WHERE {b:pk} IS NULL ``` | Are there any left-over distributed import batches? |
LogFile |
```sql
SELECT
COALESCE({cj:code}, ' | Are there are cronjob with more than ~10 logs and/or logs older than 14 days? (those are default values for log file retention) |
ProcessTaskLog | ```sql -- Query tested with MS SQL -- Adjust the date calculation for -- other databases SELECT COUNT({l:pk}) AS "total", MIN({l:modifiedtime}) AS "oldest", MAX({l:modifiedtime}) AS "newest" FROM {ProcessTaskLog AS l} WHERE {l:creationTime} < DATEADD( MONTH, -2, GETUTCDATE() ) ``` | We recommend customer to BusinessProcess cleanup, which will eventually take care of TaskLogs cleanup. There might be the few scenarios for ProcessTaskLog cleanup: 1. The customer wants to keep the BusinessProcess for reporting, although we don't recommend it. 1. The customer might be using the custom task without any business process. |
SavedValues,SavedValueEntry | ```sql -- total SavedValue / SavedValueEntry SELECT * FROM ( {{ SELECT 'SavedValues' AS "type", COUNT({s:pk}) AS "total" FROM {savedvalues AS s} }} UNION ALL {{ SELECT 'SavedValueEntry' AS "type", COUNT({e:pk}) AS "total" FROM {savedvalueentry AS e} }} ) summary -- SavedValues per item SELECT {s:modifiedItem} AS "item", COUNT({s:pk}) AS "total", MIN({s:modifiedtime}) AS "oldest", MAX({s:modifiedtime}) AS "newest" FROM {SavedValues AS s } GROUP BY {s:modifiedItem} ORDER BY "total" DESC -- orphaned SavedValueEntry -- (there shouldn't be any) SELECT COUNT({e:pk}) AS "total", MIN({e:modifiedtime}) AS "oldest", MAX({e:modifiedtime}) AS "newest" FROM {SavedValueEntry AS e LEFT JOIN SavedValues AS s ON {e:parent} = {s:pk} } WHERE {s:pk} IS NULL ``` | A lot of those items accumulated over the project lifetime. If possible, disable storing saved values. (`hmc.storing.modifiedvalues.size=0`) |
SolrIndexOperation | ```sql SELECT {i:qualifier}, COUNT({o:pk}) AS "total", MIN({o:modifiedTime}) AS "oldest", MAX({o:modifiedTime}) AS "newest" FROM {SolrIndexOperation AS o LEFT JOIN SolrIndex AS i ON {o:index} = {i:pk} } GROUP BY {i:qualifier} ORDER BY "total" DESC ``` | Too many solr operations (more than ~100 per index)? |
StoredHttpSession | ```sql SELECT COUNT({s:pk}) AS "total", MIN({s:modifiedtime}) AS "oldest", MAX({s:modifiedtime}) AS "newest" FROM {StoredHttpSession AS s} ``` | Excessive amount of session? This is hard to generalize as it highly depends on your site's traffic, but if you are near or over 5 digits, it's probably too much. Simarly, stale sessions (e.g older than a day) don't need to be retained. |
TaskCondition | ```sql SELECT COUNT({tc:pk}), MIN({tc:modifiedtime}) AS "oldest", MAX({tc:modifiedtime}) AS "newest" FROM {TaskCondition AS tc } WHERE {tc:task} IS NULL ``` | Is there an excessive amount of ["premature events"](https://help.sap.com/docs/SAP_COMMERCE_CLOUD_PUBLIC_CLOUD/aa417173fe4a4ba5a473c93eb730a417/7e8ff9d7653f43e8890bc8eb395d52a7.html?locale=en-US#premature-events)? Or very old (older than a a few weeks) events? |
Please open an issue describing your problem or your feature request.
Any and all pull requests are welcome.\ Please describe your change and the motiviation behind it.