Open jazzido opened 9 years ago
This is awesome and I'm excited about getting it built. I think we were talking with @floodfish during one of our meetings a while ago about how we're at a stage where this much-requested feature is doable. We could even make it possible to "save" a "template"...
That said, I want to get this feature/newUI version shipped ASAP. There are, I think, a lot of improvements that I have personally been using for MONTHS that our users deserve to have.
Absolutely, no need to wait for this to release `newUI`. If anything, this can progress in parallel and have it as a command line thing for those who want to use `tabula-java`.
👍👍
On Mon, Mar 9, 2015 at 10:50 PM, Manuel Aristarán notifications@github.com wrote:
Absolutely, no need to wait for this to release newUI. If anything, this can progress in parallel and have it as a command line thing for those who want to use tabula-java.
Yeah, I think we even discussed saving that feature for later. Definitely shouldn't hold up launch of the good stuff we have.
Hello, can anyone tell me whether there has been any progress on this tabular structure feature? http://dump.jazzido.com/tabula-table-editor/ How can I contribute to moving it forward? Thanks.
Hi @paulohpcardoso,
It would be absolutely fantastic if you could help us finish and integrate the table editor. The link that you mentioned contains the code from the `table_editor` branch of the `tabula_table_editor` repo. I spent quite a bit of time working on that more than a year ago, but we never had the time to fully integrate it with Tabula.
The purpose of that tool is to generate a set of ruling lines that would be passed as an argument to `public List<? extends Table> extract(Page page, List<Ruling> rulings)` in the `SpreadsheetExtractionAlgorithm` class.
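To make the idea of "generating a set of ruling lines" concrete, here is a minimal sketch of turning a table template (a bounding box plus column and row separator positions) into a list of line segments. It does not use tabula-java: `TemplateRulings`, `Segment`, and `fromTemplate` are hypothetical names invented for this illustration, and `Segment` is only a stand-in for the real `Ruling` class, whose constructor may differ.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateRulings {

    /** Hypothetical stand-in for a ruling: a straight segment from (x1, y1) to (x2, y2). */
    static final class Segment {
        final double x1, y1, x2, y2;

        Segment(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        boolean isVertical() { return x1 == x2; }
        boolean isHorizontal() { return y1 == y2; }
    }

    /**
     * Builds the segments of a grid: the four borders of the table area,
     * one vertical segment per column separator, and one horizontal
     * segment per row separator.
     */
    static List<Segment> fromTemplate(double left, double top, double right, double bottom,
                                      double[] columnXs, double[] rowYs) {
        List<Segment> rulings = new ArrayList<>();
        // Outer border of the table area.
        rulings.add(new Segment(left, top, right, top));
        rulings.add(new Segment(left, bottom, right, bottom));
        rulings.add(new Segment(left, top, left, bottom));
        rulings.add(new Segment(right, top, right, bottom));
        // Interior separators span the full height or width of the area.
        for (double x : columnXs) {
            rulings.add(new Segment(x, top, x, bottom));
        }
        for (double y : rowYs) {
            rulings.add(new Segment(left, y, right, y));
        }
        return rulings;
    }

    public static void main(String[] args) {
        List<Segment> rulings = fromTemplate(50, 100, 550, 400,
                new double[] {200, 350}, new double[] {150});
        System.out.println(rulings.size() + " rulings");
    }
}
```

In a real integration, each segment would presumably be converted to a `Ruling` and the resulting list handed to the `extract(Page, List<Ruling>)` method mentioned above.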
If you want to attempt that, I would be more than happy to help you navigate the code of both the `table_editor` tool and `tabula-java`, and then we could work on integrating it with the Tabula tool.
Thanks for your interest in this!
Extracting tables with a predefined template or stencil is a frequently requested feature for Tabula. Some use cases:
I've implemented (4439b57e8e19b92792b577747ad1551144ad8ec7) a new method in `SpreadsheetExtractor` that, instead of building sets of cells ("spreadsheets") from the ruling lines contained in a `Page`, takes a `List<Ruling>` as a parameter. That would allow us to expose a feature in the command line tool and on an HTTP API that takes a structure such as:
In the context of Tabula, we would be adding a `rulings` key to the extraction parameters that it sends to the server to include data about the separators. In the context of the command line application, it could accept a JSON file with the specification.
Having a GUI for this feature would be awesome. There's a `tabula-table-editor` that I started last year, that could be integrated into Tabula for adjusting the detected tabular structures: http://dump.jazzido.com/tabula-table-editor/
Additionally, the `BasicExtractionAlgorithm` can use this new method as its extraction backend, by building a `List<Ruling>` from the detected lines and columns.
@jeremybmerrill, @mtigas: Would love to hear your thoughts about this.
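As a rough illustration of why a caller-supplied list of rulings is enough to define the table structure, the sketch below derives cell rectangles from a set of segments. It is a deliberate simplification and not the algorithm used in tabula-java: it assumes every ruling spans the whole table, so cells are simply bounded by consecutive distinct ruling positions. `GridCells`, `Cell`, and `cellsFrom` are hypothetical names, and each ruling is represented as a plain `{x1, y1, x2, y2}` array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class GridCells {

    /** Hypothetical cell rectangle; not a tabula-java type. */
    static final class Cell {
        final double left, top, right, bottom;

        Cell(double left, double top, double right, double bottom) {
            this.left = left; this.top = top; this.right = right; this.bottom = bottom;
        }
    }

    /**
     * Derives grid cells from rulings given as {x1, y1, x2, y2} arrays.
     * Assumption: every ruling spans the full table, so only its x position
     * (vertical rulings) or y position (horizontal rulings) matters.
     */
    static List<Cell> cellsFrom(double[][] rulings) {
        TreeSet<Double> xs = new TreeSet<>();
        TreeSet<Double> ys = new TreeSet<>();
        for (double[] r : rulings) {
            if (r[0] == r[2]) {
                xs.add(r[0]); // vertical ruling: remember its x position
            } else if (r[1] == r[3]) {
                ys.add(r[1]); // horizontal ruling: remember its y position
            }
        }
        List<Double> xList = new ArrayList<>(xs);
        List<Double> yList = new ArrayList<>(ys);
        List<Cell> cells = new ArrayList<>();
        // Row-major order: top to bottom, then left to right.
        for (int row = 0; row + 1 < yList.size(); row++) {
            for (int col = 0; col + 1 < xList.size(); col++) {
                cells.add(new Cell(xList.get(col), yList.get(row),
                        xList.get(col + 1), yList.get(row + 1)));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        double[][] rulings = {
            {0, 0, 0, 8}, {5, 0, 5, 8}, {10, 0, 10, 8},   // three vertical rulings
            {0, 0, 10, 0}, {0, 4, 10, 4}, {0, 8, 10, 8}   // three horizontal rulings
        };
        System.out.println(cellsFrom(rulings).size() + " cells");
    }
}
```

Handling rulings that only partially span the table would require finding actual intersection points, which this sketch does not attempt.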
Cheers!