mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Interactive form (AcroForm) support #7613

Closed timvandermeij closed 4 years ago

timvandermeij commented 8 years ago

This is a tracking issue only, so this is not the place for any other questions or discussions. Open a new issue for that.

This is a meta issue for interactive form (AcroForm) support according to Chapter 12.7 of the PDF reference (http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2110737). This includes all form elements except for signature fields, which are tracked in #1076. The objective is to get https://github.com/mozilla/pdf.js/blob/master/test/pdfs/f1040.pdf.link to render completely, but also to resolve other open issues and PRs.

General

Text widgets

Choice widgets

Button widgets

Snuffleupagus commented 8 years ago

This is a meta issue for tracking interactive form (AcroForm) support according to Chapter 8.6 of the PDF reference (https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf#page=671&zoom=auto,-246,244).

It might be a good idea to instead base the work on the latest version of the PDF specification, just in case there are any differences: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2110737.

Also, perhaps a good idea to add a "General" TODO item about ensuring proper test-coverage?

timvandermeij commented 8 years ago

Both items have been addressed. Thank you!

Snuffleupagus commented 8 years ago

I think that we're also going to have to actually parse the contents of the AcroForm dictionary, since otherwise we're not able to e.g. load all the necessary font resources. Obviously, we cannot use custom fonts in the display layer, but we should be able to at least infer the correct font-family (and things like e.g. bold/italic) that should be used and pass that info on to the display layer.

Also, for printing forms, we might be able to utilize (or build upon) the already existing appendToOperatorList functionality, but that will definitely require that font resources present in the AcroForm dictionary have been loaded.

Another thing that we probably should attempt to support, is using the correct text colour in the display layer (note how in Adobe Reader the text in the form fields of f1040.pdf is blue). This probably ties in to better and more complete Appearance stream support.

Finally, a general question: Will we actually be able to support forms in a meaningful way, without partial (and well sanitized) script support?

timvandermeij commented 8 years ago

Good points. I just added them to the item list above. I don't think we really need script support as the AcroForms generally just require filling and printing. AFAIK scripts are only used for interaction between elements, but we can implement the most used functionality ourselves (such as resetting the form or button actions for printing it). We'll have to see how widely used such script functionality is.

Snuffleupagus commented 8 years ago

Handle flags: multiline and read-only

There are other flags that we might need to try and support as well; one example is comb, which controls the spacing between the characters in an input field. That one is actually used on the second page of f1040.pdf, see the "Personal identification number (PIN)" field.
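For readers following along, here is a minimal sketch of how such field flags could be tested. The bit positions are the ones given for the Ff entry in the PDF specification (ReadOnly is bit 1, Multiline is bit 13, Comb is bit 25). The names FieldFlag, hasFlag and combCellWidth are hypothetical and are not taken from the PDF.js code base.

```javascript
// Bit positions are 1-based in the PDF specification,
// so bit n corresponds to the value 1 << (n - 1).
const FieldFlag = Object.freeze({
  READONLY: 1 << 0,    // bit 1
  MULTILINE: 1 << 12,  // bit 13
  PASSWORD: 1 << 13,   // bit 14
  FILESELECT: 1 << 20, // bit 21
  COMB: 1 << 24,       // bit 25
});

// True when at least one bit of `flag` is set in `fieldFlags`.
function hasFlag(fieldFlags, flag) {
  return (fieldFlags & flag) !== 0;
}

// Hypothetical helper: width of a single comb cell, or null when the comb
// layout does not apply. Per the specification, Comb is only meaningful when
// MaxLen is present and Multiline, Password and FileSelect are all clear.
function combCellWidth(fieldFlags, maxLen, fieldWidth) {
  const blocked =
    FieldFlag.MULTILINE | FieldFlag.PASSWORD | FieldFlag.FILESELECT;
  if (!hasFlag(fieldFlags, FieldFlag.COMB) || hasFlag(fieldFlags, blocked)) {
    return null;
  }
  if (!Number.isInteger(maxLen) || maxLen <= 0) {
    return null;
  }
  return fieldWidth / maxLen;
}
```

In a display layer, the cell width could then drive something like CSS letter spacing, but that mapping is left open here.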

timvandermeij commented 8 years ago

Sounds like a good idea. I have added it to the list.

Snuffleupagus commented 8 years ago

It would probably also be a good idea to see if the WidgetAnnotation code that builds the fullName property can be cleaned up or improved upon, see https://github.com/mozilla/pdf.js/blob/6c263c19946af23b723f148d9f05118971e18b36/src/core/annotation.js#L640-L670.

Also, regarding WidgetAnnotations it seems that different types can have different requirements for the V entry in the annotation dictionary, so it might be better to fetch and validate data.fieldValue in each specific WidgetAnnotation subclass.
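As an illustration of what a fully qualified field name is: the PDF specification builds it from the partial names (the T entries) of the field and its ancestors, separated by periods. The sketch below uses plain objects in place of PDF dictionaries, and buildFullName is a hypothetical name. It is not the linked WidgetAnnotation code.

```javascript
// Plain objects stand in for PDF field dictionaries: `T` is the partial
// field name and `parent` is the (optional) parent field.
function buildFullName(field) {
  const parts = [];
  const seen = new Set(); // guards against cyclic parent references
  for (let node = field; node && !seen.has(node); node = node.parent) {
    seen.add(node);
    // Fields without a partial name contribute nothing to the full name.
    if (typeof node.T === "string" && node.T.length > 0) {
      parts.unshift(node.T);
    }
  }
  return parts.join(".");
}

// Usage with a made-up three-level hierarchy:
const root = { T: "form1" };
const pageField = { T: "Page1", parent: root };
const leaf = { T: "f1_01", parent: pageField };
const fullName = buildFullName(leaf);
```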

timvandermeij commented 8 years ago

The first point is now in the list, for which I've got some ideas. I found out about the second point in a patch I'm currently finalizing for choice widget annotations, so that will be addressed there.

lexcorp commented 8 years ago

Hey @timvandermeij, when will this functionality be available? How can I help?

timvandermeij commented 8 years ago

We're currently in the process of implementing this, but it's a large piece of functionality that will take time before it's complete. The ticked boxes above show which elements are already implemented and for other boxes there are already work-in-progress pull requests, so we're on track with this functionality. Feel free to test it by using the master branch and setting the renderInteractiveForms parameter to true. It's disabled by default as it's not ready yet.
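A minimal sketch of what enabling the parameter might look like, assuming it is passed in the parameter object given to the page's render() call, as in PDF.js releases from around the time of this comment. Later releases may handle this differently. The helper name buildRenderParams is hypothetical.

```javascript
// Hypothetical helper that assembles the parameter object for page.render().
// `canvasContext` is a 2D canvas context and `viewport` comes from the page.
function buildRenderParams(canvasContext, viewport, renderInteractiveForms = false) {
  return { canvasContext, viewport, renderInteractiveForms };
}

// Intended use with a loaded page (not executed here, since it needs PDF.js
// and a real document):
//   const page = await pdfDocument.getPage(1);
//   const viewport = page.getViewport(/* scale arguments depend on the version */);
//   await page.render(buildRenderParams(ctx, viewport, true)).promise;
```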

lexcorp commented 8 years ago

Thank you Tim. What can you tell me about digital signatures? There has been progress according to this discussion thread: https://github.com/mozilla/pdf.js/issues/1076

This was reported by the user: soa-x opened this issue on 13 Jan 2012

Almost 5 years have passed since it was reported.

Someone has even done much of the implementation already:

viveksjain commented on February 22: @complience Hi, I have a proof-of-concept working at https://github.com/viveksjain/pdf.js/tree/sig-verify-support. You can try it by using git clone --recursive https://github.com/viveksjain/pdf.js.git. With a little bit more work it should be ready for a pull request into this repo, but I just have not had the time yet.

Do you know if this work was added to recent versions of pdf.js?

Snuffleupagus commented 8 years ago

Re: https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251692825

Signatures in PDF files is a big and complex topic, one which is somewhat orthogonal to implementation of basic AcroForm support (which is what this particular issue is tracking).

The current issue is just a tracking issue for implementation of basic AcroForm features, signatures are already tracked elsewhere (in #1076, which is where that feature should be discussed).

@lexcorp Please refrain from posting unrelated information and/or asking questions here, since it detracts from the purpose of this issue (which is to track support for basic AcroForm features). Also, you've now posted basically the same information in three different issues, please do not spam the issue tracker in this way!

anujgeek commented 8 years ago

Hello @timvandermeij @Snuffleupagus, We really like your solution for adding support for AcroForm fields. We're planning to use these features in an app we're currently developing. We'd really appreciate it if you could provide a tentative date by which you expect to support all types of form fields (checkboxes, etc.) and exporting the filled data into an XFDF file or any other format. Thanks.

Snuffleupagus commented 8 years ago

@anujgeek As I've already mentioned in https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251699579, this is a tracking issue and not really a good place for this kind of general discussion and/or asking questions!

There's a number of fairly difficult TODOs left to implement, see the possibly incomplete list above, hence it's not possible to give any sort of estimate of when, or even if, this feature will be completely implemented.

Also, note that so far all work has been done by contributors, and given that Mozilla is replacing PDF.js in Firefox (see https://wiki.mozilla.org/Mortar_Project) forms support will most likely take a while to complete.

timvandermeij commented 7 years ago

This is a tracking issue (refer to https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251895091), so this is not the place for discussion or questions. Contact us on IRC in case of questions or file a separate issue if you found a bug. Thanks.

(I'm unlocking the conversation to be able to let users use the reaction button to measure the interest for this feature, but irrelevant comments will be removed.)

Alex-DE-74 commented 6 years ago

Hello everyone!

What is the progress with filling AcroForms? The example I used, https://www.irs.gov/pub/irs-pdf/f1040.pdf (and others), still does not work. Or is it not enabled by default? Is support for some basic JavaScript, like setting field(s), clearing field(s) and send buttons, mentioned anywhere?

Thanks.

Snuffleupagus commented 6 years ago

@Alex-DE-74 Please read through the above comments carefully, in particular https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251895091 and https://github.com/mozilla/pdf.js/issues/7613#issuecomment-287907674 are relevant. Furthermore, you've already asked these questions in #9261 (where answers were provided); please let's try and keep this tracking issue free from that kind of general discussion.

Alex-DE-74 commented 6 years ago

@Snuffleupagus

Excuse me, but it's not really possible for me to trace through the many topics which item is at which stage, and cyclic references are not helpful at all. From https://github.com/mozilla/pdf.js/projects/1 it is clear to me which pieces of AcroForms are (completely) supported now and which are planned. Moreover, many topics address rendering/viewing, but say nothing about interactive features such as fill/check/select/submit. For example, the "Text widgets" part above has nothing about "Text typing". Then, if the "AcroForm Dictionary" is currently not parsed at all, how can it really work well? Maybe it would be helpful for "users" to see a simple table listing the AcroForm features with their properties and the state of their support (complete/partial/planned). (Why is this shown in bold?!)

P.S. It pains me that I'm not a JS/HTML5 expert, but I have done a lot of work on the other side (creating PDFs with C#) and am familiar with other programming languages too. Is it worth my while to try to understand the current code in order to provide more interactive support and help develop this project? Or would it take a huge amount of time just to understand the current architecture?

timvandermeij commented 6 years ago

I have removed the bold style for you. I would like to emphasize again that this is not the place for such a discussion; a channel like IRC would be more appropriate so we can give some background information. Filling in/submitting/printing forms is in fact in the checkbox list above, it just hasn't been implemented yet. The "text widgets" part is about rendering text widgets, which means the input fields you can type in. That's done; the part that remains is storing the entered values. Anyone is welcome to help out with implementing this.

kekkc commented 6 years ago

BTW: Chrome is also not able to save PDFs with forms, but there's a workaround. Forms are rendered by default and one is able to print them and one can even print them as PDF by default, including the form input.

Maybe this is applicable for pdf.js, too and we can just utilize the existing FF save as PDF ( https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/tabs/saveAsPDF )?

dhufnagel commented 6 years ago

I am playing around with pdf.js, trying to print entered form text field values. I have a rudimentary working proof of concept where I can render entered values to the printed PDF. I now want to discuss my approach and see if someone comes up with a better or simpler one.

In my approach I pass the entered values to the worker task by adding a map to the task. This map is currently filled on the 'beforeprint' event. In the 'getOperatorList' method of the 'TextWidgetAnnotation' I read the object stream and replace the old text value of the 'Tj' operator with the new one. This works, but comes with a lot of problems. The first one is that it fails if the stream has no 'Tj' operator because the field had no value. The second one is that the placement for alignments other than 'left' will be wrong.

So the next idea is to create a completely new stream, calculating all values myself. This will be a lot of work, so I wanted to discuss this approach first. I can already create a new stream and display the values, but again, there is the problem with the offset values of the 'Td' operation. I dug into the code a bit and I think I need to calculate the X and Y offsets by taking into account the width and height of the string in the given font. I found the FontDescriptor for one embedded font, but not for a system font. With the font descriptor I have the ascent and descent values of the font, with which I think I can calculate the y offset. The x offset will be fixed for left-aligned text, but needs to be calculated for centered or right-aligned text. I think I am able to do this with the widths array of the Font xRef, but again, there is no such array for system fonts. So I think I would have to use a canvas and the measureText method.
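To make the x-offset calculation described above concrete, here is a small sketch. It assumes glyph widths are available in glyph space (1/1000 of a text-space unit, the usual convention for simple fonts) and that alignment comes from the field's Q entry (0 is left, 1 is centered, 2 is right). The helper names and the padding of 2 units are arbitrary assumptions. The sketch does not cover the system-font case, where a canvas and measureText would be needed instead.

```javascript
// Quadding values as stored in the field's Q entry.
const Quadding = Object.freeze({ LEFT: 0, CENTER: 1, RIGHT: 2 });

// glyphWidths: one width per character of the string, in glyph space
// (1/1000 of a text-space unit). Returns the string width at `fontSize`.
function textWidth(glyphWidths, fontSize) {
  const total = glyphWidths.reduce((sum, w) => sum + w, 0);
  return (total / 1000) * fontSize;
}

// Horizontal offset for the Td operator inside a field of `fieldWidth`.
// `padding` is an arbitrary inset chosen for this illustration.
function horizontalOffset(quadding, fieldWidth, width, padding = 2) {
  switch (quadding) {
    case Quadding.CENTER:
      return (fieldWidth - width) / 2;
    case Quadding.RIGHT:
      return fieldWidth - width - padding;
    default:
      return padding;
  }
}
```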

So as you see there is a lot of 'thinking'. But before I try to implement and test my approach, I'd like to know what others are thinking of it.

timvandermeij commented 6 years ago

Some time ago we had a discussion about how we could approach this. Refer to https://mozilla.logbot.info/pdfjs/20161219. The idea is to have two different operator lists: one for the UI and one for printing. In the one for printing, we would replace operations based on the entered/selected value in the widget.

I think this is somewhat easier than what you're describing since we let the remaining logic do the heavy lifting for us; we just have to provide the correct operator list.

This is a problem that we have to solve in multiple small steps. The first step is to make the annotation code asynchronous, which is done by @dmitryskey in #9822. The next step would be to parse the AcroForm dictionary for e.g. fonts and to parse the default appearance entry in the annotation dictionary for all appearance information. For this we can probably use the evaluator to get the information as an operator list, which requires the annotation code to be asynchronous. Then, we can create the printing operator lists for each annotation type.
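As a rough illustration of what the default appearance entry contains: it is a short content-stream fragment such as "/Helv 12 Tf 0 0 1 rg", which selects a font and size and sets a fill colour. The regex-based sketch below only demonstrates the idea. It is not the evaluator-based approach suggested above, and parseDefaultAppearance is a hypothetical name.

```javascript
// Extracts the font name, font size and fill colour from a default
// appearance (DA) string. Only the Tf, rg (RGB) and g (gray) operators are
// recognised; anything else is ignored.
function parseDefaultAppearance(da) {
  const num = "(-?\\d*\\.?\\d+)";
  const result = { fontName: null, fontSize: null, color: null };

  const font = new RegExp(`/([^\\s/]+)\\s+${num}\\s+Tf`).exec(da);
  if (font) {
    result.fontName = font[1];
    result.fontSize = parseFloat(font[2]);
  }

  const rgb = new RegExp(`${num}\\s+${num}\\s+${num}\\s+rg`).exec(da);
  if (rgb) {
    result.color = rgb.slice(1, 4).map(Number);
  } else {
    const gray = new RegExp(`${num}\\s+g(?![A-Za-z])`).exec(da);
    if (gray) {
      const g = Number(gray[1]);
      result.color = [g, g, g]; // expand gray to an RGB triple
    }
  }
  return result;
}
```

A colour of [0, 0, 1] would correspond to the blue form text mentioned earlier in the thread, though whether f1040.pdf specifies it this way is not confirmed here.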

dhufnagel commented 6 years ago

I also thought of creating the operator list myself, but that would be more complicated for me than my approach. I just create the PDF object stream with 'BMC ... EMC' and pass the stream to the evaluator, which generates the operator list. If I create the operator list array myself, I will have the same problems as with generating a new object stream. But IMHO it is more complicated to create the operator list than to create a string and convert it to an object stream. This already works in my proof of concept.
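For illustration only, a string-based appearance stream of the kind described here might be assembled as follows. This is not the proof-of-concept code, which is not shown in the thread. The function names are hypothetical, the /Tx marked-content tag follows the convention the PDF specification describes for text field appearances, and the sketch assumes every character of the text can be encoded in the chosen font.

```javascript
// Escapes the characters that are special inside a PDF literal string.
function escapePdfString(text) {
  return text.replace(/([\\()])/g, "\\$1");
}

// Builds a minimal content stream that draws `text` at (x, y).
// `fontName` must match a font resource name available to the annotation.
function buildTextAppearance({ text, fontName, fontSize, x, y }) {
  return [
    "/Tx BMC",                        // begin marked content for the text field
    "q",                              // save graphics state
    "BT",                             // begin text object
    `/${fontName} ${fontSize} Tf`,    // select font and size
    `${x} ${y} Td`,                   // move to the text position
    `(${escapePdfString(text)}) Tj`,  // show the text
    "ET",                             // end text object
    "Q",                              // restore graphics state
    "EMC",                            // end marked content
  ].join("\n");
}
```

Generating the Tj line unconditionally avoids the failure mentioned earlier for fields whose original stream had no 'Tj' operator, at the cost of having to compute the Td offsets yourself.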

kekkc commented 5 years ago

I thought Opera/Chrome were using pdf.js as well, but Opera is able to print and use form data. Maybe there's something we can reuse?

jwatt commented 5 years ago

They use PDFium, which is mainly C++ code.

bpetty-formfast commented 4 years ago

Hey all, the company I work for is starting to leverage PDFJS and I have been told I need to get "Storing entered values for when the page is destroyed when it is not visible" working. I am not sure if this thread is the right place to discuss it. @timvandermeij, it looks like you are a major driver of this project. Is there any way we can get in contact with you or someone from the community who might be able to assist? I have a strategy for implementing this feature, but I want to make sure that what I do can also be mainlined back into this repo. We are also willing to sponsor or create a feature bounty, if that would help knock things off faster.

timvandermeij commented 4 years ago

If you have ideas on how this should be done, it's best to open a separate issue to discuss it. The main question is what to do with the entered data. Render it onto the canvas when printing? Provide an option to download the values in FDF format? Render a new PDF file with the filled values? Et cetera. It depends on what the user would expect and what other PDF readers do.

timvandermeij commented 4 years ago

Closing since AcroForm support is now done and enabled. The remaining issues are now filed in individual issues and collected with the 4-form-acroform tag; see https://github.com/mozilla/pdf.js/labels/4-form-acroform.