ispras / dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)
Apache License 2.0
108 stars 14 forks

Tables cells colors #447

Open Scoutink opened 1 month ago

Scoutink commented 1 month ago

Hi again

Is there a way to get the color code or name of table cells? Sometimes the color conveys information (as in the last column of the attached table*).

NastyBoget commented 1 month ago

Hello! What is the type of the file (DOCX, PDF, image, etc.)? At the moment we don't have this functionality, but it can be added for some formats.

Scoutink commented 1 month ago

Mainly PDF. I will email you a sample that explains the context.

NastyBoget commented 1 month ago

I consulted with my colleagues and we decided to try to implement it. We need to do some research, so it may take a while to solve the task.

Scoutink commented 1 month ago

You all are the best. Good luck. I'll keep following the updates.