smartinmedia / Net-Core-DocX-HTML-To-PDF-Converter

.NET Core library to create custom reports based on Word docx or HTML documents and convert to PDF
MIT License
322 stars, 76 forks

Multithreaded support #4

Open dylanvdmerwe opened 4 years ago

dylanvdmerwe commented 4 years ago

Describe the bug: I wanted to ask whether there have been any tests of thread safety when using this library.

It obviously starts a LibreOffice process, which raises the question: what happens if multiple DOCX-to-PDF Convert operations have to run at the same time in a server environment?

            //Supposedly, only one instance of Libre Office can be run simultaneously
            while (pname.Length > 0)
            {
                Thread.Sleep(5000);
            }

I think this needs to be properly guarded with a semaphore if only one LibreOffice instance can run at a time.
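As a rough illustration of what such a guard could look like, here is a minimal sketch using `SemaphoreSlim`. `ConversionGate` and `Run` are hypothetical names that are not part of the library, and the sketch only assumes that the guarded action is a synchronous call such as `ReportGenerator.Convert`.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the library:
// serializes conversions within ONE process.
public static class ConversionGate
{
    // One permit: at most one caller runs the guarded action at a time.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    public static void Run(Action conversion)
    {
        Gate.Wait();            // blocks until the permit is free
        try
        {
            conversion();       // e.g. () => generator.Convert(input, output, placeholders)
        }
        finally
        {
            Gate.Release();     // hand the permit back even if the conversion throws
        }
    }
}
```

A caller would wrap each conversion, for example `ConversionGate.Run(() => test.Convert(docxLocation, outputPath, placeholders));`. This only coordinates threads inside the same process.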

I tried with the following modifications to the example provided:

using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Threading.Tasks;
using DocXToPdfConverter;
using DocXToPdfConverter.DocXToPdfHandlers;

namespace ExampleApplication
{
    class Program
    {
        private static readonly string locationOfLibreOfficeSoffice =
                @"C:\temp\LibreOfficePortable\App\libreoffice\program\soffice.exe";

        private static string executableLocation;
        private static string htmlLocation;
        private static string docxLocation;

        static async Task Main(string[] args)
        {
            //This is only to get this example to work (find the word docx and the html file, which were
            //shipped with this).
            executableLocation = Path.GetDirectoryName(
                Assembly.GetExecutingAssembly().Location);

            //Here are the 2 test files as input. They contain placeholders
            docxLocation = Path.Combine(executableLocation, "Test-Template.docx");
            htmlLocation = Path.Combine(executableLocation, "Test-HTML-page.html");

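            List<Task> tasks = new List<Task>();
            for (int i = 0; i < 100; i++)
            {
                //Copy the loop variable: the lambda would otherwise capture the shared 'i',
                //so several tasks could see the same (or the final) value.
                int number = i;
                tasks.Add(Task.Run(() => docxToHtml(number)));
            }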
            await Task.WhenAll(tasks);
        }

        private static void docxToHtml(int number)
        {
            //Prepare texts, which you want to insert into the custom fields in the template (remember
            //to use start and stop tags).
            //NOTE that line breaks can be inserted as what you define them in Placeholders.NewLineTag (here we use <br/>).

            var placeholders = new Placeholders();
            placeholders.NewLineTag = "<br/>";
            placeholders.TextPlaceholderStartTag = "##";
            placeholders.TextPlaceholderEndTag = "##";
            placeholders.TablePlaceholderStartTag = "==";
            placeholders.TablePlaceholderEndTag = "==";
            placeholders.ImagePlaceholderStartTag = "++";
            placeholders.ImagePlaceholderEndTag = "++";

            //You should be able to also use other OpenXML tags in your strings
            placeholders.TextPlaceholders = new Dictionary<string, string>
            {
                {"Name", "Mr. Miller" },
                {"Street", "89 Brook St" },
                {"City", "Brookline MA 02115<br/>USA" },
                {"InvoiceNo", "5" },
                {"Total", "U$ 4,500" },
                {"Date", "28 Jul 2019" }
            };

            //Table ROW replacements are a little bit more complicated: With them you can
            //fill out only one table row in a table and it will add as many rows as you 
            //need, depending on the string Array.
            placeholders.TablePlaceholders = new List<Dictionary<string, string[]>>
            {

                    new Dictionary<string, string[]>()
                    {
                        {"Name", new string[]{ "Homer Simpson", "Mr. Burns", "Mr. Smithers" }},
                        {"Department", new string[]{ "Power Plant", "Administration", "Administration" }},
                        {"Responsibility", new string[]{ "Oversight", "CEO", "Assistant" }},
                        {"Telephone number", new string[]{ "888-234-2353", "888-295-8383", "888-848-2803" }}
                    },
                    new Dictionary<string, string[]>()
                    {
                        {"Qty", new string[]{ "2", "5", "7" }},
                        {"Product", new string[]{ "Software development", "Customization", "Travel expenses" }},
                        {"Price", new string[]{ "U$ 2,000", "U$ 1,000", "U$ 1,500" }},
                    }

            };

            //You have to add the images as a memory stream to the Dictionary! Place a key (placeholder) into the docx template.
            //There is a method to read files as memory streams (GetFileAsMemoryStream)
            //We already did that with <+++>ProductImage<+++>

            var productImage =
                StreamHandler.GetFileAsMemoryStream(Path.Combine(executableLocation, "ProductImage.jpg"));

            var qrImage =
                StreamHandler.GetFileAsMemoryStream(Path.Combine(executableLocation, "QRCode.PNG"));

            var productImageElement = new ImageElement() { Dpi = 96, memStream = productImage };
            var qrImageElement = new ImageElement() { Dpi = 300, memStream = qrImage };

            placeholders.ImagePlaceholders = new Dictionary<string, ImageElement>
            {
                {"QRCode", qrImageElement },
                {"ProductImage", productImageElement }
            };

            /*
             *
             *
             * Execution of conversion tests
             *
             *
             */

            //Most important: give the full path to the soffice.exe file including soffice.exe.
            //Don't know how that would be named on Linux...
            var test = new ReportGenerator(locationOfLibreOfficeSoffice);

            ////Convert from HTML to HTML
            //test.Convert(htmlLocation, Path.Combine(Path.GetDirectoryName(htmlLocation), "Test-HTML-page-out.html"), placeholders);

            ////Convert from HTML to PDF
            //test.Convert(htmlLocation, Path.Combine(Path.GetDirectoryName(htmlLocation), "Test-HTML-page-out.pdf"), placeholders);

            ////Convert from HTML to DOCX
            //test.Convert(htmlLocation, Path.Combine(Path.GetDirectoryName(htmlLocation), "Test-HTML-page-out.docx"), placeholders);

            ////Convert from DOCX to DOCX
            //test.Convert(docxLocation, Path.Combine(Path.GetDirectoryName(htmlLocation), "Test-Template-out.docx"), placeholders);

            ////Convert from DOCX to HTML
            //test.Convert(docxLocation, Path.Combine(Path.GetDirectoryName(htmlLocation), "Test-Template-out.html"), placeholders);

            //Convert from DOCX to PDF
            test.Convert(docxLocation, Path.Combine(Path.GetDirectoryName(htmlLocation), $"Test-Template-out-{number}.pdf"), placeholders);
        }
    }
}

dylanvdmerwe commented 4 years ago

A rudimentary but workable way to handle this would be to lock the Convert method in ConvertWithLibreOffice.

        private static readonly object processLock = new object();

        public static void Convert(string inputFile, string outputFile, string libreOfficePath)
        {
            //Only one thread of this process may drive LibreOffice at a time
            lock (processLock)
            {
                List<string> commandArgs = new List<string>();
                string convertedFile = "";

                if (libreOfficePath == "")
                {
                    libreOfficePath = GetLibreOfficePath();
                }

                //... rest of the original method body, unchanged ...
            }
        }

and remove this code:

                //Supposedly, only one instance of Libre Office can be run simultaneously
                while (pname.Length > 0)
                {
                    Thread.Sleep(5000);
                }

dylanvdmerwe commented 4 years ago

Note that this only fixes it within a single process. If multiple processes use this library (e.g. on a server) and more than one needs LibreOffice to generate files at the same time, you will still get errors, because it does not seem that multiple LibreOffice instances can be started at once.
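One possible way to extend the lock across processes on the same machine is a named `Mutex`. The following is only a sketch under assumptions: `CrossProcessGate` and the mutex name are hypothetical, every participating process would have to use the same name, and the guarded action must be synchronous because a `Mutex` has to be released by the thread that acquired it.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the library:
// serializes conversions across processes on the same machine.
public static class CrossProcessGate
{
    // Assumed name; all processes that start LibreOffice must agree on it.
    private const string MutexName = "DocXToPdfConverter.LibreOffice";

    public static void Run(Action conversion)
    {
        using (var mutex = new Mutex(false, MutexName))
        {
            try
            {
                mutex.WaitOne();    // blocks until no other thread or process holds the mutex
            }
            catch (AbandonedMutexException)
            {
                // A previous owner exited without releasing; this thread owns the mutex now.
            }

            try
            {
                conversion();       // must not switch threads (no await) before the release
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```

On Windows, a name without the `Global\` prefix is only visible within the current session, so services running in different sessions would likely need the prefixed form.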

dylanvdmerwe commented 4 years ago

Are we using the right LibreOffice params? Surely there's a way to run multiple conversions (LibreOffice processes) at the same time?
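One approach that is commonly suggested for running several LibreOffice instances side by side is to give each one its own user profile directory through the `-env:UserInstallation=file:///...` argument. The sketch below shows how that could be wired up. It has not been verified against this library; `IsolatedLibreOffice`, `BuildStartInfo`, and the temporary profile naming are hypothetical, and it assumes a runtime where `ProcessStartInfo.ArgumentList` is available (.NET Core 2.1 or later).

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of the library.
public static class IsolatedLibreOffice
{
    // Builds the start info for one headless conversion that uses its own profile directory.
    public static ProcessStartInfo BuildStartInfo(
        string sofficePath, string inputFile, string outputDir, string profileDir)
    {
        // LibreOffice expects the profile location as a file:// URL.
        string profileUrl = new Uri(Path.GetFullPath(profileDir)).AbsoluteUri;

        var startInfo = new ProcessStartInfo(sofficePath)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        startInfo.ArgumentList.Add("-env:UserInstallation=" + profileUrl);
        startInfo.ArgumentList.Add("--headless");
        startInfo.ArgumentList.Add("--convert-to");
        startInfo.ArgumentList.Add("pdf");
        startInfo.ArgumentList.Add("--outdir");
        startInfo.ArgumentList.Add(outputDir);
        startInfo.ArgumentList.Add(inputFile);
        return startInfo;
    }

    // Runs one conversion with a throwaway profile and returns the process exit code.
    public static int Convert(string sofficePath, string inputFile, string outputDir)
    {
        string profileDir = Path.Combine(
            Path.GetTempPath(), "lo-profile-" + Guid.NewGuid().ToString("N"));
        try
        {
            using (var process = Process.Start(
                BuildStartInfo(sofficePath, inputFile, outputDir, profileDir)))
            {
                process.WaitForExit();
                return process.ExitCode;
            }
        }
        finally
        {
            // Remove the temporary profile that LibreOffice created for this run.
            if (Directory.Exists(profileDir))
            {
                Directory.Delete(profileDir, true);
            }
        }
    }
}
```

Creating a fresh profile for every conversion adds startup cost, so a small pool of reusable profile directories might be a better trade-off on a busy server.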

martinweihrauch commented 4 years ago

Dang - that is a very valid point, which we have not yet considered!! If you have any time to check this out, it would be great. Then you could make a pull request and I could integrate it!! Best, Martin

gofal commented 4 years ago

Of course, this piece of code

            //Supposedly, only one instance of Libre Office can be run simultaneously
            while (pname.Length > 0)
            {
                Thread.Sleep(5000);
            }

is a bug. After waiting for 5 seconds, the array pname still has the same content, even if the soffice.exe process has exited in the meantime. You have to query again every 5 seconds:

            //Supposedly, only one instance of Libre Office can be run simultaneously
            while (pname.Length > 0)
            {
                Thread.Sleep(5000);
                pname = Process.GetProcessesByName("soffice");
            }
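The re-querying loop above could also be given an upper bound so that a hung soffice process does not block callers forever. This is only a sketch: `ProcessWait` and `WaitUntilNoneRunning` are hypothetical names, and the poll interval and timeout are left to the caller.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the library.
public static class ProcessWait
{
    // Returns true once no process with the given name is running, false on timeout.
    public static bool WaitUntilNoneRunning(
        string processName, TimeSpan pollInterval, TimeSpan timeout)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            // Query again on every iteration; a stale array never changes.
            Process[] running = Process.GetProcessesByName(processName);
            int count = running.Length;
            foreach (var process in running)
            {
                process.Dispose();   // release the handles held by the Process objects
            }

            if (count == 0) return true;
            if (stopwatch.Elapsed >= timeout) return false;
            Thread.Sleep(pollInterval);
        }
    }
}
```

Usage might look like `ProcessWait.WaitUntilNoneRunning("soffice", TimeSpan.FromSeconds(5), TimeSpan.FromMinutes(2))`. Polling like this is still racy, since another caller can start soffice between the check and the next launch, so it complements a lock rather than replacing one.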