ispras / dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
Apache License 2.0
194 stars 21 forks

Tables cells colors #447

Open Scoutink opened 6 months ago

Scoutink commented 6 months ago

Hi again

Is there a way to get a table cell's color code or name? Sometimes the color conveys information (as in the last column of the attached table*).

NastyBoget commented 6 months ago

Hello! What is the type of the file (DOCX, PDF, image, etc.)? At the moment we don't have this functionality, but it can be added for some formats.

Scoutink commented 6 months ago

Mainly PDF. I will send you a sample by email that explains the context.

NastyBoget commented 6 months ago

I consulted with my colleagues and we decided to try to implement it. We need to do some research, so it may take a while to solve the task.

Scoutink commented 6 months ago

You all are the best. Good luck. I'll keep following the updates.