Open abirami005 opened 4 years ago
We cannot open source the code at the moment as it is related to our IP protection.
Then how about publishing the alignment data themselves in some form?
Hm, I had not thought of that before. Let me check with our legal approval chain.
I assume this means that providing only the code for extracting annotations from XML representation is also not possible at the moment?
@pollyMath Unfortunately that is what our IP lawyer told us.
@zhxgj Did your lawyers reach a verdict regarding the publication of PDF/XML alignment data?
Note: This is relevant to a number of potential applications of this corpus, for which some choices made in the COCO format would be incompatible or suboptimal, e.g.
- definition/granularity of region classes
- not annotating headers and footers
- not including reading order of regions
- not including text lines (contours / baselines)
- not including text content (plain) and text style (formatting)
Unfortunately not yet. I understand the benefits, but we cannot release it yet. Thanks for your understanding.
Do you provide the scripts/code that you developed to match the PDFMiner outputs on the documents to the XML representation of the PDF page itself? Thanks