Closed Mandeep258 closed 3 years ago
Hi there,
Please attach the pdf you're using. Then I can debug the issue.
Kind regards, Joris Schellekens
I cannot upload the documents because they are classified, so I tried to find a document that might show a similar issue.
Also, is there a way to get the number of pages in a document, so that I can iterate over them with the get_text_from_page method?
I'll have a look at the document.
Are you using my library in a commercial setting?
You can easily get the number of pages from the DocumentInfo object.
Kind regards, Joris Schellekens
I came across the article https://stackabuse.com/automating-processing-pdf-invoices-in-python-with-borb/ and used only the part that extracts all text, so I thought I would try it. I haven't looked into commercial settings and am not aware of them yet. We use Elasticsearch for keyword search, so our task is to extract data from documents and index it. Initially we went with Apache Tika, but I'm also exploring other libraries that might help. One more request: is it possible to speed up loading a document? It takes a lot of time, and because loading is CPU-bound, using multiple threads is difficult.
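For CPU-bound loading like the kind described above, a common general-purpose workaround in Python is to spread the files over several processes instead of threads. The sketch below uses only the standard library and is not a borb feature. The names `extract_text` and `extract_all` are hypothetical, and the body of `extract_text` is a placeholder that stands in for whatever extraction call is actually used.

```python
import concurrent.futures as cf
from pathlib import Path


def extract_text(path):
    """Hypothetical stand-in for the real, CPU-bound extraction call."""
    # Assumption: replace this body with the actual PDF library call.
    return Path(path).read_bytes().decode("latin-1")


def extract_all(paths, extract=extract_text, max_workers=None,
                executor_cls=cf.ProcessPoolExecutor):
    """Run `extract` over `paths` in a worker pool and return {path: result}."""
    paths = [str(p) for p in paths]
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs each path
        # with the result produced for it.
        return dict(zip(paths, pool.map(extract, paths)))
```

With the default process pool, `extract` must be a picklable top-level function, and on platforms that spawn worker processes the call to `extract_all` should sit under an `if __name__ == "__main__":` guard. Each document is still loaded at the same speed; only the overall throughput across many files improves.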
I was just wondering why you keep using "we" (plural). It sounds as if you're talking about a group of people (a development team, or company) rather than just yourself.
Hehe... A development team. We are building a pipeline that goes through crawl, convert, and index. So, "we". :D
Then you are using borb in a commercial setting. Please make sure you comply with the AGPL3.
Sure. If it's not compatible with our use, then we will not use it.
I find it a bit worrying that you were perfectly happy to use my library without checking the license, in a commercial setting.
I didn't realize that. I apologize; most Python libraries are open source, so I didn't check. I have uninstalled it and will not be using it anymore.
There is a difference between being open-source and being "free of charge".
This is also clearly mentioned in the README.
You should think of free (in the context of open source at least) as "free speech" rather than "free beer".
The AGPL3 license allows you to use my product only if you yourself are open source to all of your users.
If you prefer not to be open source, or you can't (due to some NDA or confidentiality agreement), you can purchase a commercial license.
But please, do not confuse "open source" with "I am not supposed to support the developer(s) of this product".
Kind regards, Joris Schellekens
Hi team,
We are trying to use the library to extract data from PDF files, but the extracted text has no spaces between words, so we cannot use it. Is there a way to fix this?
Regards Mandeep