stephanrauh / ngx-extended-pdf-viewer

A full-blown PDF viewer for Angular 16, 17, and beyond
https://pdfviewer.net
Apache License 2.0

How to get the list of fields and their values inside a xfa form #1718

Closed SubratTest closed 1 year ago

SubratTest commented 1 year ago

Hi, I am using ngx-extended-pdf-viewer in my project. Can I get all the fields of an XFA form together with their values? I need to populate the fields and change their values programmatically.

korydondzila commented 1 year ago

@SubratTest Depending on what you want to do, there are methods for extracting the form fields. In the project I've been working on, we do this on the backend to get the field names/IDs (we map data to fields, so we need a drop-down in a different component). Note that we use C# on the backend.

I do not know if this works for XFA, but I would assume it does or is very similar.

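    // Call site: pass the uploaded PDF as a stream
    var fieldNames = _pdfService.GetFields(pdfBlob!.OpenReadStream());

    // Service method (requires: using PdfSharpCore.Pdf.IO;)
    public string[] GetFields(Stream pdfBlobStream)
    {
        // Open read-only; only the field names are needed
        var pdfDocument = PdfReader.Open(pdfBlobStream, PdfDocumentOpenMode.ReadOnly);
        return pdfDocument.AcroForm.Fields.Names;
    }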

You could also use (annotationLayerRendered) (our front end is TypeScript/Angular using 16.2.5; we use this event for form field highlighting, which makes data mapping easier):

public onAnnotationLayerRendered(event: AnnotationLayerRenderedEvent): void {
    const selectors = Array.from(
      (event.source as any).annotationLayer.div.children
    ).map((element: Element): HTMLElement => element as HTMLElement);

    selectors.forEach((selector): void => {
      const input = selector.firstChild as HTMLInputElement;
      this.renderer.setProperty(input, 'disabled', true);
      this.renderer.setStyle(input, 'pointer-events', 'none');

      if (this.inputSelection) {
        this.renderer.addClass(selector, 'selector');
        this.renderer.setStyle(
          selector,
          'cursor',
          'pointer',
          RendererStyleFlags2.Important
        );

        const tooltip = this.renderer.createElement('span');
        const text = this.renderer.createText(input.name);
        this.renderer.addClass(tooltip, 'tooltip');
        this.renderer.appendChild(tooltip, text);
        this.renderer.appendChild(selector, tooltip);

        this.renderer.listen(selector, 'click', (): void => {
          this.clipboard.copy(input.name);
          this.snackBarService.openSnackBarWithMessage(
            'Input ID copied to clipboard',
            'Close'
          );
        });
      }
    });
  }

Each input is a form field; you can read its name via input.name.
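
If all you need is the list of names, the handler can be reduced to a small helper. This is only a minimal sketch, assuming each child of the annotation layer wraps an input as in the handler above; collectFieldNames and the NamedField interface are hypothetical names, not part of ngx-extended-pdf-viewer:

```typescript
// Hypothetical minimal shape: anything carrying a `name`, e.g. an HTMLInputElement.
interface NamedField {
  name?: string | null;
}

// Returns the distinct, non-empty field names in first-seen order.
function collectFieldNames(
  fields: ReadonlyArray<NamedField | null | undefined>
): string[] {
  const result: string[] = [];
  for (const field of fields) {
    const fieldName = field?.name?.trim();
    // Skip missing inputs, unnamed inputs, and names already collected
    if (fieldName && result.indexOf(fieldName) === -1) {
      result.push(fieldName);
    }
  }
  return result;
}
```

Inside the handler above it could be fed with selectors.map(s => s.firstChild as HTMLInputElement).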

SubratTest commented 1 year ago

Hi, thanks for the information. Could you tell me which service or library you used on the backend? Our goal is to get the fields from the XFA form and set their values in code. We use Angular on the front end and C# on the backend.

korydondzila commented 1 year ago

@SubratTest Sorry about that, we used PdfSharpCore on our backend. I don't know if it works for XFA, but it might: <PackageReference Include="PdfSharpCore" Version="1.3.41"/>

stephanrauh commented 1 year ago

XFA is a recent addition to pdf.js, so I've neglected XFA support. Actually, I don't have an XFA document, so I can't run a test. Can you give me a PDF file using an XFA form? Preferably a file with a license that allows me to publish it? If you only have a file without such a liberal license, you can give it to me, too, but please don't forget to tell me to consider it confidential.

SubratTest commented 1 year ago

Hi Stephan, I have attached the document: CAN-IMM1295E-1222_KBGFJG109670-9_1_1675893199355 (1) (2).pdf

SubratTest commented 1 year ago

Hi @stephanrauh, hope you are doing well. Did you get time to look into this?

korydondzila commented 1 year ago

@SubratTest Stephan has been busy preparing a talk https://github.com/stephanrauh/ngx-extended-pdf-viewer/issues/1714#issuecomment-1481778903

stephanrauh commented 1 year ago

... and the next talk, this time in English and in cyberspace, so feel free to join if you get the opportunity! :)

https://entwickler.de/graalvm-day/

stephanrauh commented 1 year ago

Getting a list of the fields and their values turns out to be easy. Just register (formDataChange):

<ngx-extended-pdf-viewer
      [src]="'/assets/pdfs/under-copyright/XFA-Canada-Immigration.pdf'"
      (formDataChange)="setFormData($event)">
</ngx-extended-pdf-viewer>

public setFormData(data: { [fieldName: string]: string | string[] | number | boolean } | any): void {
    console.log(data);
}

Every time the user fills in a field, you get a JSON object containing both the fields and their values. I only filled in a few fields, but the result looks nice, and the field identifiers are even useful names (as opposed to standard AcroForms, where the field identifiers are usually cryptic gibberish):

[screenshot]

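
Since the event delivers the whole object every time, it can help to work out which entries actually changed. A minimal sketch, assuming the payload has the shape used in setFormData above; changedFields and the type names are hypothetical helpers, not part of the library:

```typescript
type FieldValue = string | string[] | number | boolean;

interface FieldValues {
  [fieldName: string]: FieldValue;
}

// Compares two snapshots and returns only the entries that are new or changed.
function changedFields(previous: FieldValues, current: FieldValues): FieldValues {
  const changes: FieldValues = {};
  for (const key of Object.keys(current)) {
    // JSON.stringify makes string[] values comparable by content
    if (JSON.stringify(previous[key]) !== JSON.stringify(current[key])) {
      changes[key] = current[key];
    }
  }
  return changes;
}
```

In setFormData, one could log changedFields(this.lastSnapshot, data) and then store a copy of data in this.lastSnapshot (a hypothetical component property).
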
stephanrauh commented 1 year ago

However, the other direction seems to be broken. It's meant to be [formData]:

    <ngx-extended-pdf-viewer
      [src]="'/assets/pdfs/under-copyright/XFA-Canada-Immigration.pdf'"
      (pageRendered)="delayedUpdateFormData()"
      [formData]="formData">
    </ngx-extended-pdf-viewer>

  public delayedUpdateFormData(): void {
    setTimeout(() => {
      this.initialized = true;
      this.updateFormData();
    });
  }

  public updateFormData(): void {
      this.formData['FamilyName']="Du Bois";
      this.formData['GivenName']="Michelle";
  }

The JSON object reported by two-way binding [(formData)] is updated as intended, but the new data doesn't show in the PDF file.
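
Independent of the XFA problem, one general Angular detail is worth ruling out: an input binding such as [formData] is only reported as changed when the object reference changes, not when a property of the same object is mutated. Whether that plays any role in the behaviour described here is an assumption, not something confirmed in this thread. A minimal sketch of assigning a fresh object instead of mutating in place (withFieldValues is a hypothetical helper):

```typescript
type FieldValue = string | string[] | number | boolean;

interface FieldValues {
  [fieldName: string]: FieldValue;
}

// Returns a new object instead of mutating `current`, so the bound reference changes.
function withFieldValues(current: FieldValues, patch: FieldValues): FieldValues {
  return { ...current, ...patch };
}
```

In updateFormData this would read: this.formData = withFieldValues(this.formData, { FamilyName: 'Du Bois', GivenName: 'Michelle' });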

stephanrauh commented 1 year ago

I'm afraid it's not so easy to fix that. In the meantime, you can set the values from Angular using good old DOM manipulation:

document.querySelector("[xfaName='GivenName'] textarea").value='Michelle';
document.querySelector("[xfaName='FamilyName'] textarea").value='Du Bois';

That doesn't trigger a (formDataChange) event, but I suppose that event would be confusing anyway.

Unfortunately, when the event fires, the values you've set from outside are ignored. Maybe it's a good idea to ignore [(formData)] altogether until I've found the bug (which may take a while) and use document.querySelector for both setting and reading the values of the input fields.
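
The two querySelector lines can be wrapped in a helper that also copes with a field that isn't in the DOM (yet). This is a sketch under assumptions: it uses the same [xfaName] plus textarea structure as the lines above, which has only been observed in this one document, and the names xfaTextSelector, setXfaText, ValueHolder and SelectorRoot are made up for illustration:

```typescript
// Hypothetical minimal shapes so the helper can be exercised without a browser;
// in the application, pass `document` as the root.
interface ValueHolder {
  value: string;
}

interface SelectorRoot {
  querySelector(selector: string): object | null;
}

// Builds the selector used above, escaping characters that would break the quoted value.
function xfaTextSelector(fieldName: string): string {
  const escaped = fieldName.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  return `[xfaName='${escaped}'] textarea`;
}

// Sets the value and reports whether the field was found.
function setXfaText(root: SelectorRoot, fieldName: string, value: string): boolean {
  const field = root.querySelector(xfaTextSelector(fieldName)) as ValueHolder | null;
  if (!field) {
    return false; // page not rendered yet, or no such field
  }
  field.value = value;
  return true;
}
```

Usage would be setXfaText(document, 'GivenName', 'Michelle'); the boolean result tells you whether the field was found at all.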

SubratTest commented 1 year ago

Hi Stephan,

Thanks a lot for this information.

We tried this approach and are able to get and set values, but there is an issue when the field is a radio button or a checkbox.

These fields are wrapped in multiple layers, so each field has several XFA names.

For all these yes/no fields, the XFA names of the inputs are "yes" and "no". They are wrapped in elements with different XFA names, and those names are not in the field list we are getting.

Could you check and let us know whether there is a way to set these "yes"/"no" fields?

[screenshot]

stephanrauh commented 1 year ago

I think you just need a slightly more advanced selector. The radio buttons belong to a group, and the individual radio buttons each have a value. However, the xfaon attribute seems to be what you're looking for:

<input class="xfaRadio" 
 type="radio" 
 checked="true" 
 xfaon="N" xfaoff="off" 
 value="N">

For example, to get the value of the radio button called SameAsMailingIndicator you can use this expression:

Array.from(document.querySelectorAll("[xfaName='SameAsMailingIndicator'] input")).find(radio => radio.checked)?.attributes['xfaon'].value

Setting a value requires multiple lines (but face it, if you're into maintainability, you'll want to split my one-liner above into multiple lines, too):

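setRadioButton(fieldname: string, value: string): void {
  // Type the query so that .checked is available without casts
  const radios = Array.from(document.querySelectorAll<HTMLInputElement>(`[xfaName='${fieldname}'] input`));
  // Check the button whose xfaon attribute matches; uncheck all others
  radios.forEach(radio => radio.checked = radio.getAttribute('xfaon') === value);
}
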
stephanrauh commented 1 year ago

Please note that my hint is based on a lot of guesswork. It's an educated guess because I can see the source code of pdf.js, but even so, my research is based on a very small number of PDF files (exactly one). I'm positive I've guessed right, but there's nothing in the way of guarantees.
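
The one-liner for reading the value can be split up in the same way. A sketch under the same guesswork: it relies on the xfaon attribute seen in the markup above, and getRadioValue and RadioLike are hypothetical names:

```typescript
// Hypothetical minimal shape of a radio input; HTMLInputElement satisfies it.
interface RadioLike {
  checked: boolean;
  getAttribute(name: string): string | null;
}

// Returns the xfaon value of the checked radio button,
// or undefined if no button of the group is checked.
function getRadioValue(radios: ReadonlyArray<RadioLike>): string | undefined {
  for (const radio of radios) {
    if (radio.checked) {
      return radio.getAttribute("xfaon") ?? undefined;
    }
  }
  return undefined;
}
```

In the browser, the argument would be Array.from(document.querySelectorAll<HTMLInputElement>("[xfaName='SameAsMailingIndicator'] input")).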

SubratTest commented 1 year ago

Hi Stephan,

Array.from(document.querySelectorAll("[xfaName='SameAsMailingIndicator'] input")).find(radio => radio.checked)?.attributes['xfaon'].value

This works fine, but the issue is that you can't obtain this XFA name, i.e. SameAsMailingIndicator, through code.

The formData variable captures 96 XFA fields, but this particular field, "SameAsMailingIndicator", is not among them. The same is true for the other yes/no fields: their XFA field names are missing from the form data.

 async pageRendered(event:any)
  {
    console.log("testing end");
    console.log(this.formData);

    Object.keys(this.formData).length;
    for (const [key, value] of Object.entries(this.fieldDetails)) {
      console.log(`${key}: ${value}`);
      console.log("page loaded end");
    //  console.log(this.formData);
      if(key=='FamilyName')
      {
        var data=document.querySelector("[xfaname="+key+"]") as HTMLElement;
       var datanew:any= document.querySelector("[fieldid="+data.dataset.elementId+"]");
       // console.log(datanew);
       datanew.value="subratnew";

       datanew.dispatchEvent(new Event('input'));
       datanew.dispatchEvent(new Event('change'));
      //  this.formData[key]="subrat";

      }

      if(key=='Yes')
      {
       var data=document.querySelector("[xfaname="+key+"]") as HTMLElement;
       var datanew:any= document.querySelector("[fieldid="+data.dataset.elementId+"]");

        datanew.checked = true; // set the property; the attribute only controls the default state
        datanew.dispatchEvent(new Event('input'));
        datanew.dispatchEvent(new Event('change'));
      }
      if(key=='Citizenship')
      {
       var alldata:any=document.querySelectorAll("[xfaname="+key+"]");
       // alldata[1] is read below, so at least two matches are required
       if(alldata.length>1)
       {
       var datanew:any= document.querySelector("[fieldid="+alldata[1].dataset.elementId+"]");
       for (const option of datanew.children) {
         if (option.innerText == "India") {
           option.selected = true;
         }
       }
       // datanew.value="India";
       // dispatch only when the select element was actually looked up
       datanew.dispatchEvent(new Event('input'));
       datanew.dispatchEvent(new Event('change'));
       }
      }
      if(key=='DOBYear')
      {
        var data=document.querySelector("[xfaname="+key+"]") as HTMLElement;
        var datanew:any= document.querySelector("[fieldid="+data.dataset.elementId+"]");
        datanew.value="2023";

      }
      if(key=='PrevMarriedIndicator')
      {
        var data=document.querySelector("[xfaname="+key+"]") as HTMLElement;
       var datanew:any= document.querySelector("[fieldid="+data.dataset.elementId+"]");
      }
      if(key=='usCardIndicator')
     {
       var alldata:any=document.querySelectorAll("[xfaname="+key+"]");
       console.log(alldata);
     }

      if(key=='PlaceBirthCountry')
      {
       var alldata:any=document.querySelectorAll("[xfaname="+key+"]");
       console.log(alldata);
      }

     }
    // this.formData=this.fieldDetails;
    const PDFViewerApplication: IPDFViewerApplication = (window as any).PDFViewerApplication;
   await PDFViewerApplication.pdfDocument.saveDocument();
   console.log(this.fieldDetails);

   // const pdf = await (pdfjsLib:any).getDocument()
   // console.log(this.fieldDetails);

  }
  saveData(event:any)
  {
   // console.log("test");
  }
  XfaData(event:any)
  {
    console.log("xfa layered rendered");
    const selectors = Array.from(
      (event.source as any).xfaLayer.div.children
    ).map((element: any): HTMLElement => element as HTMLElement);
    selectors.forEach((selector): void => {
      const input = selector.firstChild as HTMLInputElement;
    });
  }
 async onPagesLoaded(event:any)
  {
    for (const [key, value] of Object.entries(this.fieldDetails)) {
       console.log(`${key}: ${value}`);
       console.log("page loaded end");
     //  console.log(this.formData);
       if(key=='FamilyName')
       {
         var data=document.querySelector("[xfaname="+key+"]") as HTMLElement;
        var datanew:any= document.querySelector("[fieldid="+data.dataset.elementId+"]");
        // console.log(datanew);
        datanew.value="subratnew";

        datanew.dispatchEvent(new Event('input'));
        datanew.dispatchEvent(new Event('change'));
       //  this.formData[key]="subrat";

       }

       if(key=='Yes')
       {
        var data=document.querySelector("[xfaname="+key+"]") as HTMLElement;
        var datanew:any= document.querySelector("[fieldid="+data.dataset.elementId+"]");

         datanew.checked = true; // set the property; the attribute only controls the default state
         datanew.dispatchEvent(new Event('input'));
         datanew.dispatchEvent(new Event('change'));
       }
       if(key=='Citizenship')
       {
        var alldata:any=document.querySelectorAll("[xfaname="+key+"]");
        // alldata[1] is read below, so at least two matches are required
        if(alldata.length>1)
        {
        var datanew:any= document.querySelector("[fieldid="+alldata[1].dataset.elementId+"]");
        for (const option of datanew.children) {
          if (option.innerText == "India") {
            option.selected = true;
          }
        }
        // datanew.value="India";
        // dispatch only when the select element was actually looked up
        datanew.dispatchEvent(new Event('input'));
        datanew.dispatchEvent(new Event('change'));
        }
       }
       if(key=='usCardIndicator')
      {
        var alldata:any=document.querySelectorAll("[xfaname="+key+"]");
        console.log(alldata);
      }

       if(key=='PlaceBirthCountry')
       {
        var alldata:any=document.querySelectorAll("[xfaname="+key+"]");
        console.log(alldata);
       }

      }
  }

stephanrauh commented 1 year ago

Yes, I've observed this, too. I just hoped it wouldn't be an issue for you.

At the moment, I'm stuck because the bleeding-edge branch of my fork of pdf.js is broken. It doesn't cope with your XFA form. I'm afraid I have to solve this first before I can address your issue. I've learned the hard way it's a bad idea to postpone badly solved merge conflicts with pdf.js.

stephanrauh commented 1 year ago

BTW, may I give you a few suggestions on how to improve your code?

stephanrauh commented 1 year ago

@SubratTest I've managed to fix radio buttons. Version 17.0.0 is going to send the radio buttons correctly.

SubratTest commented 1 year ago

Hi Stephan, thanks a lot for your help. Could you tell me which event I can use to get all the fields from all the pages once the PDF has loaded in the web viewer?

stephanrauh commented 1 year ago

I haven't uploaded version 17.0.0-alpha.0 yet, so you won't see a difference yet. Once I've done that, you'll get the fields from the (formDataChange) event. However, I can't promise you'll see every field of every page immediately: pdf.js uses lazy loading to save memory and to deliver good performance. If that's an issue for you, you need to tweak the pre-rendering algorithm as documented here: https://pdfviewer.net/extended-pdf-viewer/prerendering
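
Because of the lazy loading, fields may show up over several (formDataChange) events rather than all at once. The thread doesn't say whether each event carries every field seen so far or only those of the pages rendered up to that point, so the following sketch makes no assumption about that and simply remembers every field name it has ever been shown. SeenFields is a hypothetical helper class:

```typescript
// Tracks which field names have been reported so far (hypothetical helper).
class SeenFields {
  private readonly names: string[] = [];

  // Records the keys of a snapshot and returns the ones not seen before.
  public record(snapshot: { [fieldName: string]: unknown }): string[] {
    const added: string[] = [];
    for (const key of Object.keys(snapshot)) {
      if (this.names.indexOf(key) === -1) {
        this.names.push(key);
        added.push(key);
      }
    }
    return added;
  }

  // Returns a copy of all field names recorded so far, in first-seen order.
  public all(): string[] {
    return this.names.slice();
  }
}
```

Calling record(data) from the event handler returns the names that appeared for the first time, and all() gives the running total.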

stephanrauh commented 1 year ago

A short update: currently, I'm implementing two-way binding. It's starting to work, but it's slow going. XFA forms are rendered remarkably differently from AcroForm documents. Plus, my available time is still limited. Nonetheless, there's steady progress.

SubratTest commented 1 year ago

Hi Stephan, that's great. Thanks a lot for your help with this. @stephanrauh Currently, if we want to fetch a field by name, e.g. "AliasFamilyName", we get it directly from the (formDataChange) event. But if a page contains two fields with the same name, how can we fetch the correct one?

Can we get this kind of structure for a particular field?

form1[0].PersonalDetails[0].AliasName[0].AliasFamilyName[0]

stephanrauh commented 1 year ago

I ran into the same problem this evening. There seem to be two fields called "FamilyName". I suspected the error to be on my side - but maybe it's really a problem with the document. I'm pretty sure field names are expected to be unique, with the exception of radio buttons.

If the document really has non-unique field names, two-way binding won't work. I hope I've missed a clue somewhere. Stay tuned!

stephanrauh commented 1 year ago

I've decided to re-write the entire form support from scratch. As far as I can see, it works now, but it'll take a couple of days to land. I'm busy next week.

As for the non-unique field names: I'm following your idea, just without the array indexes. The fully qualified name is even a bit longer because it adds the name of the page, which is also an XFA attribute of your example form.
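
To illustrate what "without the array indexes" means for a path like the one quoted above, here is a small sketch. It only shows the string transformation; the exact format the library produces (including the page name) isn't shown in this thread, and stripArrayIndexes and lastSegment are hypothetical helpers:

```typescript
// Removes XFA-style array indexes such as "[0]" from every segment of a dotted path.
function stripArrayIndexes(path: string): string {
  return path
    .split(".")
    .map(segment => segment.replace(/\[\d+\]$/, ""))
    .join(".");
}

// Returns the short field name, i.e. the last segment of a dotted path.
function lastSegment(path: string): string {
  const segments = path.split(".");
  return segments[segments.length - 1];
}
```

Applied to form1[0].PersonalDetails[0].AliasName[0].AliasFamilyName[0], stripArrayIndexes keeps the four names joined by dots, and lastSegment of that result is the short name AliasFamilyName.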

SubratTest commented 1 year ago

Hi Stephan, thanks a lot for this. We are really looking forward to this solution, as we have an urgent requirement for it. Could you tell me when you will push the changes?

stephanrauh commented 1 year ago

@SubratTest I feel you. I don't want to let you down. Nonetheless, this is a leisure-time project, so most of the time I can only dedicate one or two hours after work to it. I'd like to do more. If you're willing to spend some money, you can always talk to my boss. My hourly rate isn't exactly cheap, but until now, I can proudly say every customer considers the money well spent.

By the way, I'm slowed down by not having an XFA file I'm allowed to publish on the showcase. One day or another, I'll write a program generating such a file, but as you can imagine, that takes some time.

stephanrauh commented 1 year ago

[(formData)] now works properly with XFAs and AcroForms. Your bugfix has landed with version 17.0.0-alpha.2.