Open rageshS opened 4 years ago
Any update on this?
Hi rageshS, have you solved this problem? I also need to extract a multi-line address, but it seems the only option is custom fields that use a regex at a fixed location to extract the address.
@Jane-Ding I didn't get any helpful answer and am still facing the issue. I don't think this Git repo is actively maintained.
Hi @rageshS @Jane-Ding, I have discovered one way to do it: capture the text by area, since pdftotext supports x,y coordinates. YAML template for the area plugin: area:
You can have a look at this guide; it might help you out: https://www.youtube.com/watch?v=JOdLRe4MTmo&list=PLhPDb5zFmGR1CCSX_oxLGyPuHPbWEToSf
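For anyone who wants to try the area idea outside a template, here is a minimal sketch that calls the poppler `pdftotext` CLI directly with its crop options (`-x`, `-y`, `-W`, `-H`, `-r`). The function names, the file name, and the coordinate values are made up for illustration and are not part of invoice2data; the crop box has to be measured on your own invoice, and the pixel coordinates depend on the resolution you pass.

```python
import subprocess
from typing import List


def build_pdftotext_area_cmd(pdf_path: str, x: int, y: int, width: int,
                             height: int, page: int = 1,
                             resolution: int = 100) -> List[str]:
    """Build a pdftotext command that extracts only one rectangular area.

    x, y are the top-left corner of the crop box and width, height its size,
    all in pixels at the given resolution (hypothetical values, measure your own).
    """
    return [
        "pdftotext",
        "-f", str(page), "-l", str(page),   # restrict to a single page
        "-r", str(resolution),              # resolution the coordinates refer to
        "-x", str(x), "-y", str(y),         # top-left corner of the crop area
        "-W", str(width), "-H", str(height),  # size of the crop area
        "-layout",                          # keep the physical line layout
        pdf_path,
        "-",                                # write the text to stdout
    ]


def extract_area_text(pdf_path: str, x: int, y: int, width: int,
                      height: int, page: int = 1,
                      resolution: int = 100) -> str:
    """Run pdftotext on the crop box and return the extracted text."""
    cmd = build_pdftotext_area_cmd(pdf_path, x, y, width, height,
                                   page, resolution)
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8")


# Example call (requires pdftotext to be installed and a real PDF):
# address = extract_area_text("invoice.pdf", x=50, y=120, width=300, height=90)
```

Because the crop box only covers the left-hand address block, the text of a neighbouring block on the same lines should not end up in the result, as long as the layout is the same on every invoice from that supplier.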
Hi All,
I am new to the invoice2data Python library and am really satisfied with it; it has been very useful for me. But I have found a limitation with multi-line data in invoice PDFs, such as the address field. Here is one problem I faced while using invoice2data to extract address data: if the desired field spans multiple lines and another field lies at the same horizontal position (to its left or right, for example 'Invoice Address' and 'Trading Address'), the two fields can be concatenated during extraction.
I think that if I write a regular expression to capture the 'Invoice Address' field, it will capture the 'Trading Address' text too. I have already checked the templates provided in this Git repository, but I can't find any example that captures an address field from an invoice PDF.
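To make the problem concrete, here is a small sketch of one possible workaround on the extracted text itself. It assumes the text keeps its column alignment (as layout-preserving extraction usually does) and splits every line at the character offset where the right-hand header starts. The function name and the sample address data are invented for illustration; this is not something invoice2data provides.

```python
from typing import List, Tuple


def split_columns_at_header(layout_text: str,
                            right_header: str) -> Tuple[List[str], List[str]]:
    """Split side-by-side blocks using the column where right_header starts.

    Returns (left_lines, right_lines) for the lines below the header line.
    """
    lines = layout_text.splitlines()
    header_idx = None
    for i, line in enumerate(lines):
        if right_header in line:
            header_idx = i
            break
    if header_idx is None:
        raise ValueError(f"header {right_header!r} not found")

    # Character offset of the right-hand column in the header line.
    offset = lines[header_idx].index(right_header)

    left, right = [], []
    for line in lines[header_idx + 1:]:
        left_part = line[:offset].strip()
        right_part = line[offset:].strip()
        if left_part:
            left.append(left_part)
        if right_part:
            right.append(right_part)
    return left, right


# Made-up sample: two address blocks side by side, right column at offset 24.
sample = "\n".join([
    f"{'Invoice Address':<24}Trading Address",
    f"{'Acme Ltd':<24}Acme Trading",
    f"{'1 High Street':<24}22 Dock Road",
    f"{'London':<24}Leeds",
])

invoice_address, trading_address = split_columns_at_header(sample, "Trading Address")
print(invoice_address)
print(trading_address)
```

A single regex over the whole line would match both blocks at once, which is exactly the concatenation described above; slicing by column offset avoids that, but it breaks if a left-hand line is long enough to run into the right column.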