weiweishi closed this issue 8 years ago
I'd like to verify that we're using all of the tools we have at hand to analyze our memory usage in Production. In particular, that we have the GC logging enabled as detailed here: http://stackoverflow.com/questions/15307998/is-it-common-practice-to-turn-on-gc-logging-in-production-java-server. We should be able to gather this information with no performance loss in Production.
The JVM flags we want to ensure are set in Production are: -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -Xloggc:
Henry has indicated that he's seen some of these log files in Production, but they are empty. We should diagnose, ASAP, what is preventing this important information from being logged.
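As a first diagnostic step, it may help to distinguish a log file that was never created from one that exists but has nothing in it. The following is only a minimal sketch; `gc_log_status` is a hypothetical helper name, not something from the playbook, and the path is whatever the caller passes in.

```shell
#!/bin/sh
# Hypothetical helper: report whether a GC log file is missing, empty, or populated.
# The path is supplied by the caller; nothing about the server layout is assumed here.
gc_log_status() {
    log_file="$1"
    if [ ! -e "$log_file" ]; then
        echo "missing"
    elif [ ! -s "$log_file" ]; then
        # The file exists but has zero bytes.
        echo "empty"
    else
        echo "populated"
    fi
}
```

For example, `gc_log_status /path/to/gc.log` prints one of the three words, which could be run on each Production host to see which case applies before digging into the JVM options.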
Looks like these are set in @henryzhang87's Ansible playbook:
./hydra/roles/fedora/templates/setenv.sh.j2:JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:HeapDumpPath=/usr/share/tomcat7/logs/fedoar_heap.hprof -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/tomcat7/gc.log"
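To compare the flags listed at the top of this issue against what the template actually sets, something like the following could be used. It is a rough sketch: `missing_gc_flags` is a hypothetical helper, it only inspects the string it is given (so flags added by other lines through `$JAVA_OPTS` would not be seen), and a wanted flag ending in `:` is treated as a prefix because `-Xloggc:` takes a path.

```shell
#!/bin/sh
# Hypothetical helper: print each wanted flag that is absent from an options string.
# Usage: missing_gc_flags "<options string>" flag1 flag2 ...
missing_gc_flags() {
    opts="$1"
    shift
    for flag in "$@"; do
        case "$flag" in
            *:)
                # Prefix flag such as -Xloggc:, which must be followed by a value.
                case " $opts" in *" $flag"?*) continue ;; esac
                ;;
            *)
                # Plain flag: must appear as a whole space-separated word.
                case " $opts " in *" $flag "*) continue ;; esac
                ;;
        esac
        echo "$flag"
    done
}
```

Run against just the options on the template line quoted above, with the four flags from the issue description as the wanted list, this reports `-XX:+PrintTenuringDistribution` as the only one not present on that line.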
The log is being populated! https://logger1.library.ualberta.ca/logs/fedora/mycombe/gc.log
huzzah 🎉
A couple of reports created by the Eclipse Memory Analyzer:
heapdump-5G-1472476259732_Leak_Suspects.zip heapdump-5G-1472476259732_Top_Components.zip
@mbarnett found this. It is likely responsible for the empty instances of java.util.LinkedHashMap and has been fixed for Fedora 4.6.
Based on the above, and to avoid the buildup of heap memory usage, mycombe is now scheduled to reboot twice a week, at 5:00 AM on Monday and Friday. It used to reboot once a week, on Wednesday. This should reduce the chance of Nagios sending heap usage warnings to the on-call device during the weekend.
On Wed, Aug 31, 2016 at 10:14 AM, pgwillia notifications@github.com wrote:
Just to summarize what's been observed so far:
- One servlet is taking up 97% of used memory -- makes sense, this is Fedora
- Many empty collections (java.util.LinkedHashMap) hanging around are a known problem that was fixed in Fedora 4.6
Ways to work around:
- restart tomcat on a regular basis (i.e. weekly)
- upgrade to Fedora 4.6
Things to still investigate:
- Duplicate Strings - 49,993 × org.fcrepo.http.commons.domain.SinglePrefer (104 bytes)
Henry Zhang Sr. Sys. admin and Storage Specialist
The memory leak fix is directly reflected in changes to org.fcrepo.http.commons.domain.SinglePrefer.
I think we can close this.
This is to analyze the heap memory usage and try to track down where the memory leak might be. We will be implementing the twice-a-week restart on ERA to accommodate the September traffic hike.