tsgrp / OpenContent

TSG's Web Services for ECM Repositories

EnhancedObjectContent Uses DataHandler - Large Memory Overhead #23

Status: Open. mbowen000 opened this issue 10 years ago.

mbowen000 commented 10 years ago

The content property on EnhancedObjectContent is of type DataHandler. With Alfresco, we set it like this:

result.setContent( new DataHandler(new ByteArrayDataSource(nodeReader.getContentInputStream(), nodeReader.getMimetype())));

and with DCTM:

content.setContent( new DataHandler(new ByteArrayDataSource(contentStream, mimeType) ) );

Creating a DataHandler that wraps a ByteArrayDataSource stores the entire content in memory: a 500MB file means 500MB of memory usage until this code finishes executing. This removes any scalability for large files or large user bases.
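To make the cost concrete: a byte-array-backed data source built from an InputStream has to read the stream to the end and keep every byte, so heap usage grows with the file. The sketch below contrasts that with copying through a fixed-size buffer, where heap usage is the buffer size no matter how large the file is. `BufferingVsStreaming` and its methods are hypothetical names, and the 8 KB buffer size is an arbitrary assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper contrasting the two strategies; not part of OpenContent.
final class BufferingVsStreaming {

    // Assumed buffer size; any fixed value illustrates the point.
    static final int BUFFER_SIZE = 8 * 1024;

    private BufferingVsStreaming() {}

    // Roughly what a byte-array-backed data source does: drain the whole
    // stream into memory. The returned array is as large as the content.
    static byte[] bufferEntireStream(InputStream in) throws IOException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] chunk = new byte[BUFFER_SIZE];
        int n;
        while ((n = in.read(chunk)) != -1) {
            all.write(chunk, 0, n);
        }
        return all.toByteArray(); // returns a copy of the internal buffer
    }

    // Streaming alternative: only BUFFER_SIZE bytes are held at a time,
    // regardless of content size. Returns the number of bytes copied.
    static long streamThrough(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }
}
```

For a 500MB file, `bufferEntireStream` needs at least 500MB of heap (briefly more, since `toByteArray()` returns a copy of the internal buffer), while `streamThrough` holds only its 8 KB chunk; the destination stream decides what happens to the bytes after that.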

Instead, we should use some kind of InputStream that holds what is essentially a pointer to the content in the repository, so the content can be read through a buffer instead of being held in memory all at once. We use this property to set and retrieve content in many places throughout OC, so this would be a real refactoring effort unless we are willing to deprecate EnhancedObjectContent or similar classes.
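One way to read "a pointer to the content" is to store a factory that opens a fresh InputStream on demand, plus the MIME type, instead of the bytes themselves. The class below is a minimal sketch under that assumption; `RepositoryContent` and `StreamOpener` are hypothetical names, not existing OC classes. A class like this could also implement `javax.activation.DataSource` (its `getInputStream()` would delegate to the opener), which would let the existing DataHandler-typed property stay in place while the bytes remain in the repository until someone reads them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Objects;

// Hypothetical replacement for a byte-array-backed content holder.
// It keeps only a way to (re)open the repository stream, never the bytes.
final class RepositoryContent {

    // Opens a new stream over the repository content each time it is called.
    @FunctionalInterface
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    private final StreamOpener opener;
    private final String mimeType;

    RepositoryContent(StreamOpener opener, String mimeType) {
        this.opener = Objects.requireNonNull(opener);
        this.mimeType = Objects.requireNonNull(mimeType);
    }

    String getContentType() {
        return mimeType;
    }

    // Each call returns a fresh stream; the caller is responsible for closing it.
    InputStream getInputStream() throws IOException {
        return opener.open();
    }

    // Copies the content to the caller's stream chunk by chunk
    // (InputStream.transferTo, Java 9+) and closes the source afterwards.
    long writeTo(OutputStream out) throws IOException {
        try (InputStream in = opener.open()) {
            return in.transferTo(out);
        }
    }
}
```

The design choice here is that the opener must return a new stream on every call, so the content can be read more than once. If a repository reader can only be consumed once, the lambda passed as the opener would need to obtain a new reader each time it runs.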

benallenallen commented 10 years ago

From the DCTM perspective, DFC only exposes methods that return a ByteArrayInputStream, BUT we might have some luck using ACS (Accelerated Content Services) to stream the content rather than going through DFC:

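import java.util.ArrayList;
import java.util.List;
import com.documentum.fc.client.IDfEnumeration;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSysObject;
import com.documentum.fc.client.acs.IDfAcsRequest;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.DfId;

// getDfSession() and constructPreferences() are helpers defined elsewhere in this class.
private List<String> getAcsURLs(String objId) throws DfException
{
    IDfSession session = getDfSession();
    IDfSysObject obj = (IDfSysObject) session.getObject(new DfId(objId));
    // Ask for ACS requests for the "crtext" format, page 0, no page modifier.
    IDfEnumeration acsRequests = obj.getAcsRequests("crtext", 0, null, constructPreferences());
    List<String> urls = new ArrayList<String>();
    while (acsRequests.hasMoreElements())
    {
        IDfAcsRequest acsRequest = (IDfAcsRequest) acsRequests.nextElement();
        String acsUrl = acsRequest.makeURL();
        urls.add(acsUrl);
        System.out.println("ACS URL for object " + objId + ": " + acsUrl);
    }
    return urls;
}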
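If the ACS route works out, the remaining step is to read the content from the returned URL instead of through DFC. The sketch below assumes, without having verified it against a Content Server, that a URL produced by `makeURL()` can be fetched with a plain HTTP GET; `AcsStreaming` and `copyFromUrl` are hypothetical names, and the timeout values are arbitrary. It copies the response body to a caller-supplied OutputStream through a fixed 8 KB buffer, so the file is never held in memory as a whole. Because it only uses `java.net`, it also accepts `file:` URLs, which makes it easy to test without a repository.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper: stream content from an ACS URL (assumed to be
// fetchable with a plain GET) straight to the caller, chunk by chunk.
final class AcsStreaming {

    private AcsStreaming() {}

    static long copyFromUrl(String acsUrl, OutputStream out) throws IOException {
        URLConnection connection = URI.create(acsUrl).toURL().openConnection();
        // Assumed timeouts in milliseconds; tune for the environment.
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(60_000);
        try (InputStream in = connection.getInputStream()) {
            byte[] buffer = new byte[8 * 1024];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

A caller such as a web endpoint could pass its response OutputStream directly, so the content flows from ACS to the client without passing through OC's heap in one piece.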