xornand / alfresco-business-reporting

Automatically exported from code.google.com/p/alfresco-business-reporting

Harvesting Speed Issue on a Large Alfresco Repository #43

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Alfresco Version: Community 4.2e
Alfresco Business Reporting Plug-In Version: 1.0.1.4
Platform: Windows Server 2008 R2
Database: PostgreSQL 9.2.4

We have an Alfresco repository with over 700,000 current versions of documents
(ACAD drawings, MS Office documents, image files, PDFs, etc.) with 6 custom
model types and 4 aspects. When we run a full harvest of the repository for the
first time, we notice that the rate at which records are written to the
"document" table in the reporting database (hosted on the same PostgreSQL
instance as the Alfresco database) drops dramatically over time.

For example, during the first hour of processing, over 35,000 records are added
to the reporting database. By the time 100,000 records have been loaded, the
rate has dropped to around 20,000 records per hour. By the time 300,000 records
have been loaded, it has dropped to around 4,000 records per hour.
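Converting these rates into average time per record makes the trend easier to
see: the cost of writing one record rises from about 0.10 s in the first hour
to about 0.18 s at 100,000 rows and about 0.90 s at 300,000 rows, almost a
ninefold increase. A minimal Python sketch of that arithmetic, using only the
figures quoted above (the labels are just descriptive):

```python
# Throughput figures quoted in the report, labelled by how far the run had got.
observed = [
    ("first hour", 35_000),
    ("~100,000 rows loaded", 20_000),
    ("~300,000 rows loaded", 4_000),
]

def seconds_per_record(rate_per_hour: float) -> float:
    """Convert a throughput in records/hour into average seconds per record."""
    return 3600.0 / rate_per_hour

for label, rate in observed:
    print(f"{label:>22}: {seconds_per_record(rate):.2f} s per record")
```

One possible reading, not confirmed in this thread, is that some per-record
step gets more expensive as the "document" table grows.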

Your wiki states that harvesting a big data set will take a long time. I
calculate that harvesting the rest of our data would take at best another 4.5
days, and that assumes the rate doesn't drop below roughly 1 record per second
(which I doubt).
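The estimate above can be reproduced with a short Python sketch. It assumes,
hypothetically, that the full harvest produces about 700,000 rows (one per
current document version), that about 300,000 are already loaded, and that the
rate stays constant at 1 record per second:

```python
def remaining_days(total_records: int, loaded_records: int,
                   rate_per_second: float) -> float:
    """Days left to finish, assuming a constant harvesting rate."""
    remaining = total_records - loaded_records
    return remaining / rate_per_second / 86_400  # 86,400 seconds per day

# Assumed figures: ~700,000 rows in total, ~300,000 loaded, 1 record/second.
days = remaining_days(700_000, 300_000, 1.0)
print(f"{days:.1f} days")
```

This gives about 4.6 days, consistent with the figure above; because the rate
keeps falling, the real remaining time would be longer.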

I am looking for some help to get around this issue, as we have a real
requirement for reporting on the content stored in Alfresco and really like the
solution provided by the Alfresco Business Reporting plug-in.

Thanks and regards
Ewan Ritchie

Original issue reported on code.google.com by ewan.rit...@gmail.com on 18 Nov 2014 at 1:20

GoogleCodeExporter commented 8 years ago
Hi Ewan,

Thanks for reporting this. At this point I am not exactly sure where the cause
of this issue lies.
For one, I know that I can implement batch processing of the queries into the
reporting database. However, this will require quite a lot of testing.
On the other hand, I wonder whether some queries would be more efficient if
indexes were added to the reporting database. I have not analysed the query
performance in enough detail to figure out where the pain is, and which
index or mitigation helps performance the most...
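The two mitigations mentioned above, batching the writes and indexing the
reporting tables, can be sketched with Python's built-in sqlite3 module as a
stand-in for the PostgreSQL reporting database. Everything here is
hypothetical: the simplified "document" schema, the column names
sys_node_uuid and cm_name, the batch size, and the assumption that the
harvester looks rows up by node UUID are illustrations, not the plug-in's
actual code.

```python
import sqlite3
import uuid

# In-memory SQLite table standing in for the reporting database
# (hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document ("
    " id INTEGER PRIMARY KEY,"
    " sys_node_uuid TEXT NOT NULL,"
    " cm_name TEXT)"
)

def insert_batched(conn, rows, batch_size=1000):
    """Insert rows in batches, committing once per batch instead of per row."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # one transaction per batch; commits when the block succeeds
            conn.executemany(
                "INSERT INTO document (sys_node_uuid, cm_name) VALUES (?, ?)",
                batch,
            )

rows = [(str(uuid.uuid4()), f"drawing-{i}.dwg") for i in range(5000)]
insert_batched(conn, rows)

# Hypothetical per-record lookup by node UUID.
lookup = "SELECT id, cm_name FROM document WHERE sys_node_uuid = ?"

def query_plan(conn, sql, params):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Without an index, each lookup scans the whole table, so its cost grows
# with the number of rows already harvested.
plan_before = query_plan(conn, lookup, (rows[0][0],))

# With an index on the lookup column, each lookup becomes a B-tree search.
conn.execute("CREATE INDEX idx_document_node_uuid ON document (sys_node_uuid)")
plan_after = query_plan(conn, lookup, (rows[0][0],))

print(plan_before)
print(plan_after)
```

On PostgreSQL the same check would be done with EXPLAIN on the real lookup
query, and in a JDBC-based writer the batching counterpart would be
PreparedStatement.addBatch() followed by executeBatch().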

I'm open to suggestions, although I don't have much time to work on this this
month...

Original comment by tjarda.p...@incentro.com on 24 Nov 2014 at 10:24

GoogleCodeExporter commented 8 years ago
See the new 1.1.0.0 release. I think it will fix your issue!

Original comment by tjarda.p...@incentro.com on 15 Apr 2015 at 2:44