Closed by marekhorst 1 year ago
After hotpatching direct citation matching, one of the subsequent fuzzy citation matching steps (input transformation) failed with a similar error:
Container killed by YARN for exceeding memory limits. 11.3 GB of 11 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
which means that more than one module needs its memory configuration adjusted.
The simplest way is to reuse the already defined sparkExecutorOverhead parameter (declared for the fuzzy citation matching phase) and apply its value to the input transformer phase by setting:
--conf spark.yarn.executor.memoryOverhead=${sparkExecutorOverhead}
among the spark-opts of the citation-matching-input-transformer job.
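For illustration, the change could look roughly like the fragment below. This is only a sketch, not the actual IIS workflow definition: the action name is assumed to match the job name, the spark-action schema version is an assumption, and all elements other than spark-opts are left out.

```xml
<!-- Sketch of a workflow.xml fragment; only spark-opts matters here. -->
<!-- The action name and schema version are assumptions. -->
<action name="citation-matching-input-transformer">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <!-- job-tracker, name-node, master, name, class and jar omitted -->
        <spark-opts>
            --conf spark.yarn.executor.memoryOverhead=${sparkExecutorOverhead}
        </spark-opts>
    </spark>
    <!-- ok/error transitions omitted -->
</action>
```

Because ${sparkExecutorOverhead} is resolved when the workflow is submitted, the overhead can then be changed without touching the workflow definition itself.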
Originally requested in: https://support.openaire.eu/issues/8966
Direct citation matching phase failed on BETA with:
The already deployed IIS workflow was hotpatched, but we should provide a long-term solution by preparing the direct_citationmatching and primary_processing workflows to allow the executor memory overhead to be adjusted at runtime.
Once we determine the new memory-related config, it should be committed to the GitLab repo at ICM where we keep the config-default.xml file template.
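As a rough sketch of what such an entry in the config-default.xml template might look like: the property name below is hypothetical (the real key may be prefixed or named differently per workflow), and the value is a placeholder rather than a tested setting.

```xml
<configuration>
    <property>
        <!-- hypothetical property name; the actual key may differ -->
        <name>sparkExecutorOverhead</name>
        <!-- placeholder value in MiB, to be replaced once the right setting is known -->
        <value>2048</value>
    </property>
</configuration>
```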