prawnpdf / prawn

Fast, Nimble PDF Writer for Ruby
https://prawnpdf.org

File corrupt after draw_text #584

Closed thanhquang1988 closed 10 years ago

thanhquang1988 commented 10 years ago

Here is my code. I'm using the latest version of Prawn (1.0.0.rc2).

pdf = Prawn::Document.new(template: file, margin: [0, 0, 0, 0])
pdf.text_box "test", at: [10, pdf.bounds.top - 100], size: 20, width: 200, height: 400
send_data pdf.render, filename: "test.pdf", type: 'application/pdf', disposition: 'inline'

My file: http://www.mediafire.com/view/zr3rspp5eh9oesv/test.pdf and the result after rendering: http://d.pr/i/Y95s

I opened the broken file with a hex editor:

/Count 3 /Kids [5 0 R 7 0 R 9 0 R] /MediaBox [0 0 595.275591 841.889764] /Resources 11 0 R /Type /Pages

endobj 5 0 obj << /Contents [6 0 R 40 0 R] /MediaBox [0 0 612 792] /Parent 4 0 R /Type /Page /Resources << /Font << >>

endobj 6 0 obj << /DecodeParms [null] /Filter [/FlateDecode] /Length 5573

Thanks for your help.

practicingruby commented 10 years ago

I didn't get the same output as you did, but I did get a messed up PDF. :cry:

Our templating support can sometimes have problems with certain kinds of PDFs, and we've been having some difficulty stabilizing it. I am considering removing it from Prawn entirely, moving it into its own gem. Follow up on that discussion on the mailing list.

I'm going to leave your ticket open until a decision is made. Sorry I couldn't be more helpful.

pointlessone commented 10 years ago

I'm not sure what I'm doing wrong. The test file renders just fine in Preview, Acrobat Pro, and Firefox.

thanhquang1988 commented 10 years ago

The "test.pdf" file is an input file. I get an error after rendering text onto this file. I don't know how to describe the problem any better. I will try to find and repair it. Thank you for helping me.

thanhquang1988 commented 10 years ago

The cause of the problem is as follows.

After adding text and rendering, the file looks like this:


/Resources 14 0 R
/Type /Pages
  >>
endobj
5 0 obj
  << /Contents [6 0 R 7 0 R 43 0 R]
/MediaBox [0 0 612 792]
/Parent 4 0 R
/Type /Page
/Resources << /Font << /F1.0 46 0 R
  >>
  >>
  >>
endobj

When I use a hex editor to change it like this:


/Resources 14 0 R
/Type /Pages
<< /Font << /F1.0 46 0 R >>. >>. >>.
endobj
5 0 obj
 << /Contents [6 0 R 7 0 R 43 0 R]
/MediaBox [0 0 612 792]
/Parent 4 0 R
/Type /Page
.>>
endobj

the file is fixed. I am trying to find a proper solution, but I can't find anything.

pointlessone commented 10 years ago

@nhimbkno1 You really should use markup for code. Otherwise it gets garbled and it's hard to reason about it in your comments.

thanhquang1988 commented 10 years ago

@cheba I have tried to use markup for the code. Sorry, but my solution does not work: after editing, my PDF file can be viewed only in Chrome and Foxit Reader. Here are the two files, before and after editing: http://www.mediafire.com/view/4qf2o48bl43psu5/error_before.pdf http://www.mediafire.com/view/bz4dkeb99ezbe01/error_repair.pdf

PDF.js gives these errors: "Error: Invalid XRef stream" and "Error: Dictionary key must be a name object".

pointlessone commented 10 years ago

@nhimbkno1 Indeed, the output file is broken and the template is (mostly) fine. I'll look into this later today.

Thank you for reporting the issue and uploading the files.

thanhquang1988 commented 10 years ago

Can anyone tell me how to solve this problem? :cry:

practicingruby commented 10 years ago

@nhimbkno1: Unfortunately, we don't have a good answer to that. See this mailing list conversation -- in short we're considering dropping support for templates due to their lack of stability.

@cheba is one of the people who has volunteered to figure out a better path forward, but I'm unsure if he's planning to investigate this particular issue.

practicingruby commented 10 years ago

Closing this ticket temporarily, because no one on core is presently looking into template issues. Pull requests are welcome though, and I've tagged the ticket with "templates" to make it easy to find for those volunteers who are trying to fix template problems.

pointlessone commented 10 years ago

@nhimbkno1 Sorry, was short on time lately. I'm planning to investigate it tomorrow but I won't give you false hope. It's likely there's no simple fix/workaround for your issue.

pointlessone commented 10 years ago

@nhimbkno1 I'm sorry, but I can't reproduce the problem. I tried rbx-2.2.0, MRI 2.0.0 and MRI 1.9.3 with both Prawn 1.0.0.rc2 and current master. Every single time I got good output.

Script:

#!/usr/bin/env ruby
require 'prawn'

pdf = Prawn::Document.new(template: 'error_before.pdf', margin: [0, 0, 0, 0])
pdf.text_box "test", at: [10, pdf.bounds.top - 100], size: 20, width: 200, height: 400
File.write 'test.pdf', pdf.render

Could you please provide a simple script that reproduces the problem? Also, tell us a bit about your environment.

@sandal Can you please tell what exactly was messed up in the PDF you've got? Or better upload it somewhere?

thanhquang1988 commented 10 years ago

@cheba The file "error_before.pdf" is my output file, and this file has a font error. Can you try again with this file: http://www.mediafire.com/view/vkyboskyywklvc0/12.pdf

pointlessone commented 10 years ago

@nhimbkno1 Still works.

Please try the script I've posted earlier on your machine and see if it works for you.

thanhquang1988 commented 10 years ago

I tried your script and got this error at the line File.write 'test.pdf', pdf.render: Encoding::UndefinedConversionError at /testpdf "\xFF" from ASCII-8BIT to UTF-8. Here is the backtrace:

Encoding::UndefinedConversionError - "\xFF" from ASCII-8BIT to UTF-8:
  app/controllers/pdf_controller.rb:23:in `index'   => File.write 'test.pdf', pdf.render
  actionpack (3.2.13) lib/action_controller/metal/implicit_render.rb:4:in `send_action'
  actionpack (3.2.13) lib/abstract_controller/base.rb:167:in `process_action'
  actionpack (3.2.13) lib/action_controller/metal/rendering.rb:10:in `process_action'
  actionpack (3.2.13) lib/abstract_controller/callbacks.rb:18:in `block in process_action'
  activesupport (3.2.13) lib/active_support/callbacks.rb:403:in `_run__438628553__process_action__782658454__callbacks'
  activesupport (3.2.13) lib/active_support/callbacks.rb:405:in `__run_callback'
  activesupport (3.2.13) lib/active_support/callbacks.rb:385:in `_run_process_action_callbacks'

pointlessone commented 10 years ago

@nhimbkno1 What version of Ruby do you use?

thanhquang1988 commented 10 years ago

I'm using Ruby 1.9.3 and Rails 3.2.13. I found this line in application.rb: config.encoding = "utf-8". I'll try commenting out that line and restarting my app.

thanhquang1988 commented 10 years ago

I still get the error about converting ASCII-8BIT to UTF-8.

Could you check with this script?


    filename = "#{Prawn::DATADIR}/pdfs/12.pdf"
    pdf = Prawn::Document.new(:template => filename)
    pdf.go_to_page(1)
    pdf.text_box "Test", at: [10, pdf.bounds.top - 100], size: 13, width: 500, height: 600
    send_data pdf.render, filename: "test.pdf", type: 'application/pdf', disposition: 'inline'

thanhquang1988 commented 10 years ago

@cheba Are you using a gem to convert ASCII-8BIT to UTF-8?

pointlessone commented 10 years ago

@nhimbkno1 I tried the one in the original report. This one works too. Ruby 1.9.3, Rails 3.2.13, Prawn 1.0.0.rc2. Rails app uses UTF-8 as default encoding. I don't use any gems for encoding conversions. pdf.render produces a string with ASCII-8BIT encoding and I'm pretty sure that in this case it's a Rails problem.

Could you please create a new sample Rails app to help me reproduce the problem? Push it to Github so that I could clone it.

practicingruby commented 10 years ago

@nhimbkno1 You're writing a binary file using File.write rather than File.binwrite so what's probably happening is that it's trying to treat the PDF output as UTF-8 text, which will cause encoding errors. Try using File.binwrite instead.
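To illustrate the difference, here is a minimal sketch that does not use Prawn at all. The byte string stands in for the output of pdf.render, and the explicit "w:UTF-8" mode is an assumption made to reproduce a UTF-8 text-mode handle outside of a Rails app; the helper names are made up for this example.

```ruby
require "tmpdir"

# Stand-in for `pdf.render` output: a binary (ASCII-8BIT) string containing
# bytes that are not valid UTF-8.
PDF_BYTES = "%PDF-1.3\n\xFF\xFE binary payload\n".b

# Hypothetical helper: write through a text-mode handle whose external
# encoding is UTF-8. Returns the class of the encoding error raised,
# or nil if the write succeeded.
def text_mode_write_error(path, bytes)
  File.open(path, "w:UTF-8") { |f| f.write(bytes) }
  nil
rescue EncodingError => e
  e.class
end

# Hypothetical helper: write the bytes untouched and read them back the same way.
def binary_round_trip(path, bytes)
  File.binwrite(path, bytes)
  File.binread(path)
end

Dir.mktmpdir do |dir|
  error = text_mode_write_error(File.join(dir, "text.pdf"), PDF_BYTES)
  puts "text mode:   #{error.inspect}"

  copy = binary_round_trip(File.join(dir, "binary.pdf"), PDF_BYTES)
  puts "binary mode: #{copy == PDF_BYTES ? 'bytes preserved' : 'bytes changed'}"
end
```

File.binwrite opens the file in binary mode, so no transcoding is attempted and the bytes reach the disk exactly as rendered.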

pointlessone commented 10 years ago

@sandal That was my fault. The original code uses send_data.

thanhquang1988 commented 10 years ago

I have just created a new app using the prawn gem 1.0.0.rc2, Rails 4.0, and Ruby 2.0. Script:

def index
  #!/usr/bin/env ruby
  require 'prawn'
  pdf = Prawn::Document.new(template: '12.pdf', margin: [0, 0, 0, 0])
  pdf.text_box "test", at: [10, pdf.bounds.top - 100], size: 20, width: 200, height: 400
  send_data pdf.render, filename: "test.pdf", type: 'application/pdf', disposition: 'inline'
end

and here is the result file :cry:

http://www.mediafire.com/view/zr3rspp5eh9oesv/test.pdf

My repo: https://github.com/nhimbkno1/testprawn

pointlessone commented 10 years ago

OK. I've got your code.

No encoding errors.

The generated file isn't rendered correctly in Firefox, but Preview seems to be more tolerant.

[screenshot: 2013-11-26 at 5:23:06 pm]

It looks like PDF.js (used in Firefox for rendering PDF files) doesn't like multiple content streams. I'm looking into it.

thanhquang1988 commented 10 years ago

That's strange. Why am I getting an error but you are not? :tired_face: [screenshot: error]

@cheba Can you clone and use my code?

pointlessone commented 10 years ago

@nhimbkno1 I'm using your code.

What exactly is the error you get? Is it an error message in your PDF reader or an exception in your application?

It looks like you don't have any errors in your code (no exceptions and backtraces) but rather you're not satisfied with the look of your document in PDF reader. BTW, what do you use to view PDFs?

practicingruby commented 10 years ago

@cheba: There is clearly a rendering issue in that screenshot, when compared to the source template. This means either that Prawn is corrupting the PDF or that the viewer has a bug.

It's not a matter of just not being happy with how it looks, it's that it breaks the original formatting.

thanhquang1988 commented 10 years ago

There are no exceptions or backtraces, but the PDF file does not look like the template. I use PDF.js to view it on my pages using canvas.

pointlessone commented 10 years ago

PDF.js is far from complete. It cannot render the original template file either. At least on Nightly (28.0a1 (2013-11-24)) it fails to render the first page of the unmodified template. The other two pages look fine.

The template itself is not exactly an example of a top-notch PDF file, either. I will investigate why PDF.js doesn't like the template itself. There's clearly an issue with fonts, though I'm not sure why only the first page fails to render. The other pages look fine on my machine.

[two screenshots from lvh.me:3000]

thanhquang1988 commented 10 years ago

Sure @cheba, because we only draw_text on page 1. If you draw_text on all pages, then the whole file will be corrupt.

practicingruby commented 10 years ago

@nhimbkno1: Can you give us the link again to the original template before Prawn has rendered anything on it? I just want to make sure we're using the right input file.

thanhquang1988 commented 10 years ago

Yes, here is a folder that includes many templates. All of them show errors in viewers after text is rendered onto them:

https://www.mediafire.com/folder/05qhs5depwwp4/Error%20Templates

practicingruby commented 10 years ago

The first of those rendered fine for me in PDF.js on Firefox 25.0, did not try the others. I can confirm that your post-template version does not run correctly there, though.

@cheba We need to determine whether this is a corrupted PDF or whether it's a bug in PDF.js. It doesn't seem to me like the Firefox nightly builds are a clean testing environment to figure that out in. I also saw rendering errors using Ubuntu's document viewer, so I think there may be at least some edge case we're hitting here.

thanhquang1988 commented 10 years ago

It may be because your gem is different from mine. Thank you for helping me, though. In my case, when I save the output file and open it with other PDF reader software, they all give errors (Nitro PDF, Adobe Reader, Foxit Reader). :zzz:

pointlessone commented 10 years ago

@nhimbkno1 I use 12.pdf from your sample rails app.

Thanks for the link with other templates.

practicingruby commented 10 years ago

I've confirmed the same results in Adobe Reader... original PDF looks fine, template PDF looks corrupted in the screenshot @nhimbkno1 posted.

@cheba: Next step would probably be to look at the graphics states and font operations going on in the PDF and see if we're messing them up anywhere. It's been far too long since I've manually inspected a PDF to give much guidance, but that's where I'd start.

pointlessone commented 10 years ago

@sandal I've found the problem.

The template (12.pdf) has a global Resources dictionary with all fonts. It's set on the Page Tree Node. Pages have no Resources dictionaries of their own and so inherit the global resources. Once Prawn adds some text, it adds a new font to the page's Resources dictionary and thus overrides the global one. After that, the renderer cannot find the fonts used on the page.

@nhimbkno1 While technically your template is correct (as far as I can tell), Prawn can not handle it correctly. The "easiest" workaround I see right now is to write a script to recreate the template with Prawn. Sorry, but this can not be fixed in Prawn in a timely manner.
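The shadowing effect described above can be sketched with a toy model. This is not Prawn's API or its internal object model: plain Ruby hashes stand in for PDF dictionaries, and the helper names and the "TemplateFont" value are made up for illustration.

```ruby
# Hypothetical helper: a page that has no /Resources of its own.
def build_page(parent)
  { Type: :Page, Parent: parent }
end

# Effective resources of a node, following the PDF inheritance rule:
# use the node's own /Resources if present, otherwise walk up /Parent.
def effective_resources(node)
  while node
    return node[:Resources] if node.key?(:Resources)
    node = node[:Parent]
  end
  {}
end

# The behavior described in the thread: create a fresh /Resources on the
# page when it has none, without looking at what it would have inherited.
def add_font_naively(page, name, font)
  page[:Resources] ||= { Font: {} }
  page[:Resources][:Font][name] = font
end

# Page Tree Node carrying the template's fonts for all pages.
pages_node = { Type: :Pages, Resources: { Font: { F1: "TemplateFont" } } }
page = build_page(pages_node)

puts "before: #{effective_resources(page)[:Font].keys.inspect}"
add_font_naively(page, :"F1.0", "Helvetica")
puts "after:  #{effective_resources(page)[:Font].keys.inspect}"
```

Once the page has its own dictionary, the lookup stops at the page and never reaches the Page Tree Node, so the template's font is no longer visible to the page's original content stream.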

luongtran commented 10 years ago

@cheba I agree that for this issue we should write a script to convert the template with Prawn. But I'm confused by your suggestion. Should we add the font definition when Prawn adds text that introduces new fonts? Or should we use the default font when text is added?

pointlessone commented 10 years ago

@luongtran I suggest not converting (taking existing files and modifying them) but rather recreating your templates with Prawn. That means writing scripts that generate your templates only using Prawn.

PDF is a complex format. Currently Prawn doesn't implement the spec completely. For example, in this case Prawn has no idea that Resources can be inherited. It just sees that the page has no Resources dictionary and creates one when it needs one. Prawn has no clue that it should first check whether there are any inherited resources.

luongtran commented 10 years ago

@cheba I think your suggestion is not realistic for a large application. Does it mean all users have to use our tool to create the PDF template? That would take more time, and we could not win customers, because some other apps handle this problem very well. Now, I think the problem in Prawn is that when it sees a new font that is not defined in the template, it creates a new resource at the top page, right? So why don't we define the resources on every page? I think that would resolve the problem.

luongtran commented 10 years ago

@cheba All the template files we're having issues with are defined backwards: the last node is defined at the top of the file and the first node is defined at the bottom. Is that a problem for Prawn?

pointlessone commented 10 years ago

Here's a diagram of the PDF after it's been processed by Prawn. Specifically, 12.pdf template after adding some text to the first page.

[diagram: pdf-12-diagram]

> Now, I think the problem in Prawn is that when it sees a new font that is not defined in the template, it creates a new resource at the top page, right?

This is exactly what happens. But, it also adds a Resources dictionary to the page because there's none in the template. Prawn assumes that page needs one at all times and it doesn't know that resources can be inherited from the Page Tree Node as in the case of 12.pdf file.

> So why don't we define the resources on every page?

In this case it's exactly the opposite. It will break every single page because it will override inherited resources. The correct way to do it would be to duplicate Page Tree Node resources into every page but Prawn is not capable of doing that automatically. You can write some code to do that for this specific case but something universally useful and robust would be extremely hard to build if not outright impossible.

> The last node is defined at the top of the file and the first node is defined at the bottom. Is that a problem for Prawn?

No, it's not a problem. PDF is basically a big tree structure (a graph, actually), and Prawn is capable of working with it at the lowest level.

luongtran commented 10 years ago

It seems difficult for me to use this gem. Do you know of any other PDF gem that can handle this problem and has features like Prawn's?

pointlessone commented 10 years ago

There are a few command line tools that might be useful for you. The more popular ones have Ruby wrappers. For example PDFKit. Though, I'm not sure if any of them would be a solution for your specific case.

luongtran commented 10 years ago

@cheba Can I ask Prawn for the fonts from the Page Tree Node resources of the PDF template, so that when I insert or draw text I can specify a font that already exists in the template?

pointlessone commented 10 years ago

@luongtran Technically, once Prawn loads the template document you should have access to all objects in the PDF. You should be able to access and manipulate them directly.

Though, keep in mind that Prawn will create a Resources dictionary on a page at the first attempt to add text to that page, regardless of which font you use for that operation. So the way to go in this situation would be to first manually copy the Resources dictionary from the Page Tree Node to every page you want to modify, and only then add the text.
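A rough sketch of that copy-first order of operations, again using plain hashes as stand-ins for PDF dictionaries rather than Prawn's real objects. All helper names are hypothetical; how you reach the actual page and Page Tree Node objects in a loaded template is not shown here.

```ruby
# Deep-copy nested hashes so pages do not share mutable dictionaries.
def deep_copy(value)
  value.is_a?(Hash) ? value.transform_values { |v| deep_copy(v) } : value
end

# Nearest /Resources found on the ancestors of a page (not on the page itself).
def inherited_resources(page)
  node = page[:Parent]
  while node
    return node[:Resources] if node.key?(:Resources)
    node = node[:Parent]
  end
  nil
end

# Give the page its own copy of the inherited resources, unless it already
# has a /Resources dictionary.
def materialize_resources(page)
  return page if page.key?(:Resources)
  inherited = inherited_resources(page)
  page[:Resources] = inherited ? deep_copy(inherited) : {}
  page
end

# Copy first, then register the new font next to the template's fonts.
def add_font(page, name, font)
  materialize_resources(page)
  (page[:Resources][:Font] ||= {})[name] = font
end

pages_node = { Type: :Pages, Resources: { Font: { F1: "TemplateFont" } } }
page = { Type: :Page, Parent: pages_node }

add_font(page, :"F1.0", "Helvetica")
puts "page fonts: #{page[:Resources][:Font].keys.inspect}"
puts "tree fonts: #{pages_node[:Resources][:Font].keys.inspect}"
```

Copying rather than sharing the dictionary keeps a font added to one page from leaking into the Page Tree Node and, through it, into every other page.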

luongtran commented 10 years ago

@cheba I have already fixed this case. My solution: when Prawn loads the template, we check whether it has pages.data[:Resources]; if it does, we store it and remove it from pages.data. Then, when populating a page, we check whether the page has no Resources definition and, if so, add the stored one.

pointlessone commented 10 years ago

@luongtran That is the best you can do right now.

luongtran commented 10 years ago

Yes, I think so. Thanks for your help @cheba

thanhquang1988 commented 10 years ago

@sandal @cheba @luongtran Thanks, everyone.