wieseljonas / java-libpst

Automatically exported from code.google.com/p/java-libpst

Unable to get full body contents #49

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Parse a PST file.
2. Call the getBody() method of a PSTMessage object to retrieve the body contents.

What is the expected output? What do you see instead?
The method call returns null if the email body contains more than 255 
characters. However, calling getBodyPrefix() on the same PSTMessage object 
returns the first 255 characters of the mail body.

What version of the product are you using? On what operating system?
I'm using version 0.7 of the libpst jar. The operating system is Windows XP.

Please provide any additional information below.
I'm able to retrieve the subject, sender and other information successfully. 
Only the body contents cannot be retrieved.

Original issue reported on code.google.com by ankuragr...@gmail.com on 11 Aug 2011 at 9:11

GoogleCodeExporter commented 9 years ago
Here is the code where I am facing the problem:

import com.pff.*;
import java.util.*;

public class Test {
        public static void main(String[] args)
        {
                new Test("C:\\outlook.ost");
        }

        public Test(String filename) {
                try {
                        PSTFile pstFile = new PSTFile(filename);
                        System.out.println(pstFile.getMessageStore().getDisplayName());
                        processFolder(pstFile.getRootFolder());
                } catch (Exception err) {
                        err.printStackTrace();
                }
        }

        int depth = -1;
        int c = 0;
        public void processFolder(PSTFolder folder)
                        throws PSTException, java.io.IOException
        {
                depth++;
                // the root folder doesn't have a display name
                if (depth > 0) {
                        printDepth();
                        System.out.println(folder.getDisplayName());
                }

                // go through the folders...
                if (folder.hasSubfolders()) {
                        Vector<PSTFolder> childFolders = folder.getSubFolders();
                        for (PSTFolder childFolder : childFolders) {
                                processFolder(childFolder);
                        }
                }

                // and now the emails for this folder
                if (folder.getContentCount() > 0) {
                        depth++;
                        PSTMessage email = (PSTMessage)folder.getNextChild();
                        while (email != null) {
                                printDepth();
                                System.out.println("Date: " + email.getCreationTime());
                                System.out.println("Email: " + email.getSubject());
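                                // getBody() may come back null or empty, so don't call length() on it directly
                                String body = email.getBody();
                                System.out.println("Body length: " + email.getBodyPrefix().length() + "  " + (body == null ? "null" : String.valueOf(body.length())) + "  " + email.getBodyHTML().length());
                                System.out.println("Body : " + body);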
                                email = (PSTMessage)folder.getNextChild();
                        }
                        depth--;
                }
                depth--;
        }

        public void printDepth() {
                for (int x = 0; x < depth-1; x++) {
                        System.out.print(" | ");
                }
                System.out.print(" |- ");
        }
}

Original comment by ankuragr...@gmail.com on 11 Aug 2011 at 10:08

GoogleCodeExporter commented 9 years ago
I am not sure this problem relates to message length. Running a simple test 
through my Inbox, I successfully read the body of messages with thousands of 
characters. At the same time, I see some messages where the getBody() method 
returns an empty string (not null). So I would guess that the logic to append 
content to the buffer ends prematurely.
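
Since the thread reports both null and empty-string results, it may help to log which case each getter actually hits before guessing at the cause. The following is only a minimal sketch: the class BodyCheck and its methods are hypothetical helpers, not part of java-libpst, and the strings in main stand in for the values returned by email.getBody(), email.getBodyHTML() and email.getBodyPrefix().

```java
public class BodyCheck {
    /** Reports whether a getter result was null, empty, or real text. */
    public static String describe(String value) {
        if (value == null) {
            return "null";
        }
        if (value.isEmpty()) {
            return "empty";
        }
        return value.length() + " chars";
    }

    /** Returns the first candidate that is neither null nor empty, else "". */
    public static String firstNonEmpty(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isEmpty()) {
                return candidate;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Assumed stand-ins for the three getter results of one message.
        String body = "";
        String bodyHtml = null;
        String bodyPrefix = "Hello,";

        System.out.println("getBody():       " + describe(body));
        System.out.println("getBodyHTML():   " + describe(bodyHtml));
        System.out.println("getBodyPrefix(): " + describe(bodyPrefix));
        System.out.println("Usable text:     " + firstNonEmpty(body, bodyHtml, bodyPrefix));
    }
}
```

Treating null and empty the same way in firstNonEmpty keeps the calling code free of NullPointerException regardless of which of the two behaviours a given message shows.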

Original comment by blackda...@aol.com on 3 Nov 2011 at 6:24

GoogleCodeExporter commented 9 years ago
Some further information: some of the messages that return a zero-length body 
actually have content returned from getBodyHTML(). So I use a routine that 
checks for the HTML and extracts the content. Here's a rough algorithm to yank 
the text out of the HTML mail (feel free to adapt it into the PST library):

    private String convertHtml(String bodyHtml) {
        StringBuilder buf = new StringBuilder();
        int spanInx = bodyHtml.indexOf("<span");
        while (spanInx > -1) {
            int end = bodyHtml.indexOf(">", spanInx);
            int nxtTag = (end < 0) ? -1 : bodyHtml.indexOf("<", end);
            if (nxtTag < 0) {
                break; // unterminated tag or no following tag: stop instead of throwing
            }
            // text between the end of the <span ...> tag and the next tag
            buf.append(bodyHtml.substring(end + 1, nxtTag)).append(" ");
            spanInx = bodyHtml.indexOf("<span", end);
            int paraTag = bodyHtml.indexOf("</p>", end);
            // a </p> before the next span (or after the last span) ends the line
            if (paraTag > 0 && (spanInx < 0 || paraTag < spanInx)) {
                buf.append("\n");
            }
        }
        return buf.toString();
    }
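
The routine above only picks up text that directly follows a <span> tag. For HTML mail that does not wrap its text in spans, a cruder alternative is to strip every tag with a regular expression. This is a different technique from the one above, shown only as a rough sketch: the class HtmlText and the method stripTags are hypothetical names, nothing here comes from java-libpst, and regex stripping will mishandle scripts, comments and most HTML entities.

```java
import java.util.regex.Pattern;

public class HtmlText {
    // Paragraph ends and <br> tags become line breaks.
    private static final Pattern LINE_END = Pattern.compile("(?i)</p\\s*>|<br\\s*/?>");
    // Anything else between '<' and '>' is dropped.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    /** Returns a plain-text approximation of the given HTML, or "" for null. */
    public static String stripTags(String html) {
        if (html == null) {
            return "";
        }
        String text = LINE_END.matcher(html).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        // Only &nbsp; is handled here; other entities are left as-is.
        return text.replace("&nbsp;", " ").trim();
    }

    public static void main(String[] args) {
        // Assumed sample input standing in for the result of getBodyHTML().
        String html = "<html><body><p><span style='x'>Hello</span> "
                + "<span>world</span></p><p>Bye</p></body></html>";
        System.out.println(stripTags(html));
    }
}
```

A real HTML parser would be the more robust choice if one is available; the regex version is only meant as a quick fallback when getBody() comes back empty.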

Original comment by blackda...@aol.com on 4 Nov 2011 at 7:17