sungaila / PDFtoImage

A .NET library to render PDF files into images.
https://www.sungaila.de/PDFtoImage/
MIT License

Page batching - Is it supported? #91

Closed ATimmeh33 closed 1 month ago

ATimmeh33 commented 1 month ago

Question

While navigating the source code, I couldn't find native support for batching pages (e.g., GetImages(from: 1, to: 5)). Is this feature already implemented and I missed it, or is it not available yet?

I see that GetImage has a page parameter, which could be used to implement the batch logic manually. However, calling it once per page would introduce additional overhead and hurt performance.

If this functionality is not currently present, I am considering forking the project to add it. Please let me know if this feature is planned or desired, and I will create a PR.

sungaila commented 1 month ago

Hi @ATimmeh33,

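thanks for opening this issue. You are correct, there is no batch support yet. I would add two new overloads to ToImages:

* `pageStart` and `pageEnd` for ranges (with null meaning the first or last page)

* an int array with page numbers (which will be ordered and made distinct internally).

Feel free to make a pull request. But if you can wait a few more days, I would do an implementation myself.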
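
The two proposed overloads (a `pageStart`/`pageEnd` pair where null means the first or last page, and an int array that is ordered and made distinct internally) could normalize their input roughly as sketched below. This is only an illustration and not PDFtoImage code: the class `PageSelection`, the methods `Normalize` and `ResolveBounds`, and the choice of zero-based, inclusive bounds are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageSelection
{
    // Hypothetical helper (not part of PDFtoImage): sorts the requested page
    // numbers and drops duplicates so each page is rendered at most once.
    public static int[] Normalize(IEnumerable<int> pages)
    {
        if (pages is null) throw new ArgumentNullException(nameof(pages));
        return pages.Distinct().OrderBy(p => p).ToArray();
    }

    // Hypothetical helper: treats null as "first page" / "last page".
    // Assumes zero-based page numbers and an inclusive end bound.
    public static (int Start, int End) ResolveBounds(int? pageStart, int? pageEnd, int pageCount)
    {
        int start = pageStart ?? 0;
        int end = pageEnd ?? pageCount - 1;
        if (start < 0 || end >= pageCount || start > end)
            throw new ArgumentOutOfRangeException(nameof(pageStart), "Page bounds lie outside the document.");
        return (start, end);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Normalize(new[] { 5, 1, 2, 5, 1 }))); // prints "1, 2, 5"
        Console.WriteLine(ResolveBounds(null, 4, 10));                            // prints "(0, 4)"
    }
}
```

Normalizing once up front keeps the rendering loop simple, because it can then walk the pages in ascending order without checking for repeats.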

ATimmeh33 commented 1 month ago

> Hi @ATimmeh33,
>
> thanks for opening this issue. You are correct, there is no batch support yet. I would add two new overloads to ToImages:
>
> * `pageStart` and `pageEnd` for ranges (with null meaning the first or last page)
>
> * an int array with page numbers (which will be ordered and made distinct internally).
>
> Feel free to make a pull request. But if you can wait a few more days, I would do an implementation myself.

Thanks for the quick reply. You could consider using System.Range: https://learn.microsoft.com/en-us/dotnet/api/system.range.

Are you planning to extend other methods too? For example, in my scenario I had an input stream and needed PNG output streams, so I ended up using the method signature SavePng(Stream imageStream, Stream pdfStream, bool leaveOpen, string? password, int page, RenderOptions options).

Supporting a range there would mean returning a collection of streams, and I'm not sure that's something you'd want.

Either way my direct issue is resolved with this library, so any additions will be a bonus. I'll check around for any new implementations!

sungaila commented 1 month ago

Hey @ATimmeh33,

please give this preview a try: PDFtoImage.4.1.0-debug.zip

Index for single page conversion

// convert the last page
PDFtoImage.Conversion.SavePng("picture.png", inputStream, ^1, options: new(Dpi: 40));

Range for multiple page conversion

// convert all pages
PDFtoImage.Conversion.ToImages(inputStream, .., options: new(Dpi: 40));

// convert all except for the first 2 and last 2 pages
PDFtoImage.Conversion.ToImages(inputStream, 2..^3, options: new(Dpi: 40));

IEnumerable for multiple page conversion

// convert pages 1, 2 and 5
PDFtoImage.Conversion.ToImages(inputStream, [1, 2, 5], options: new(Dpi: 40));

I don't have any plans to extend SavePng, SaveJpeg and SaveWebp with multiple pages. You would have to write that yourself, something like this:

// get every page except for the last 2
var myStreams = GetPngAsStreams(inputStream, ..^3);

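// Helper: renders the pages in the range and yields one PNG stream per page.
private static IEnumerable<Stream> GetPngAsStreams(Stream pdfStream, Range range)
{
    foreach (var bitmap in PDFtoImage.Conversion.ToImages(pdfStream, range))
    {
        // Dispose each bitmap once the consumer moves on; the encoded data does not depend on it.
        using (bitmap)
        {
            yield return bitmap.Encode(SKEncodedImageFormat.Png, 100).AsStream();
        }
    }
}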
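
For reference, the sketch below shows how expressions such as `^1`, `..`, `2..^3` and `..^3` resolve under standard .NET `System.Index` and `System.Range` semantics, where the end of a range is exclusive. Whether PDFtoImage resolves ranges in exactly this way is an assumption; the class `RangeDemo`, the helper `Resolve`, and the page count of 10 are illustrative only.

```csharp
using System;

public static class RangeDemo
{
    // Hypothetical helper: resolves a Range against a page count with the
    // standard Range.GetOffsetAndLength and returns zero-based page indices.
    public static int[] Resolve(Range range, int pageCount)
    {
        var (offset, length) = range.GetOffsetAndLength(pageCount);
        var pages = new int[length];
        for (int i = 0; i < length; i++)
        {
            pages[i] = offset + i;
        }
        return pages;
    }

    public static void Main()
    {
        const int pageCount = 10; // assumed document length

        Index last = ^1;
        Console.WriteLine(last.GetOffset(pageCount));                    // prints "9"
        Console.WriteLine(string.Join(",", Resolve(.., pageCount)));     // prints "0,1,2,3,4,5,6,7,8,9"
        Console.WriteLine(string.Join(",", Resolve(2..^3, pageCount)));  // prints "2,3,4,5,6"
        Console.WriteLine(string.Join(",", Resolve(..^3, pageCount)));   // prints "0,1,2,3,4,5,6"
    }
}
```

Under these standard semantics, `2..^3` skips the first two and the last three pages of a 10-page document, while `2..^2` would skip exactly two pages at each end.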

ATimmeh33 commented 1 month ago

Heya @sungaila,

I tried it locally and I'm loving the Range support; it would make our code a lot more concise.

I also attempted to write a small benchmark (using https://github.com/dotnet/BenchmarkDotNet), out of curiosity about how much overhead this saves compared with manually looping over pages 1..n. Sadly I was running into some pdfium exceptions there.

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.DllNotFoundException: Unable to load shared library 'pdfium' or one of its dependencies.

I wouldn't be surprised if it has to do with this being a preview, so I'll try again once this is officially released, which will also let me test in the Release configuration.

Thanks for the speedy responses & excellent work!

sungaila commented 1 month ago

PDFtoImage 4.1.0 has been released. Thank you for opening this issue!

> Sadly I was running into some pdfium exceptions there.

Might be a restore that didn't work since the NuGet package from earlier contains no dependencies (like the pdfium binaries).

ATimmeh33 commented 1 month ago

Nice, 4.1.0 working for me!

Thanks again, will close this issue then.