mawanda-jun / TableTrainNet

Table recognition inside documents using neural networks
MIT License

ICDAR 2017 POD Competition link broken #3

Closed: harirajeev closed this issue 4 years ago

harirajeev commented 5 years ago

The ICDAR 2017 POD Competition link seems to be broken. Do you know where to get a copy of this dataset?

mawanda-jun commented 5 years ago

Unfortunately not. I think they have deleted it... If you want I can send it to you (I have an offline copy).

flauted commented 4 years ago

@mawanda-jun Could you send me the data?

mawanda-jun commented 4 years ago

Hi, of course! Let me upload it to MEGA, then I'll share the link with you.


mawanda-jun commented 4 years ago

Hi, this is the link: https://mega.nz/#!6QlwGaAb!BKf962iBlfeL7oEqaVnDC4K3F47zrqtaU12OCJlcbTw. Happy coding! 😊 Giovanni


flauted commented 4 years ago

@mawanda-jun thanks again for the data. I noticed the Images and Annotations folders contain 1600 files each, but the paper (https://ieeexplore.ieee.org/document/8270162) says there are 2000 images. Do you have any insights?

mawanda-jun commented 4 years ago

I think the other 400 images are the test ones (so no labels), but I'm not sure. Since I did this work some time ago, I don't really remember what I put into the .zip. However, I'm pretty sure I packed everything that was available and meaningful for the training/validation part; even at that time those files were hard to obtain.

If you are trying to do some document parts recognition, I think you will find this dataset useful: https://github.com/doc-analysis/TableBank

Have a nice day, Giovanni


flauted commented 4 years ago

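Oh, thank you. You're right. There's an "other" folder with 817 images. That must be the test set.

This paper (https://www.semanticscholar.org/paper/Page-Object-Detection-from-PDF-Document-Images-by-Li-Yin/15e415ff92aaf2e550a7443f158843bbad780c2c) says there are 1600 images in the training set and 817 in the test set, so that agrees with you.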

So I assume you do not have the labels for the test set. Is that correct?

Also: Thanks for the note about TableBank. That is indeed useful.
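For anyone who wants to double-check these counts on their own copy of the archive, here is a minimal Python sketch (not part of TableTrainNet) that compares the folders. It assumes, without confirmation from this thread, that the `Images`, `Annotations` and `other` folders sit directly under the extracted root and that each annotation file shares its image's file name stem; `summarize_pod_dataset` is a hypothetical helper name.

```python
from pathlib import Path


def _stems(folder: Path) -> set:
    """Return the file names (without extension) found directly in `folder`."""
    if not folder.is_dir():
        return set()
    return {p.stem for p in folder.iterdir() if p.is_file()}


def summarize_pod_dataset(root: str) -> dict:
    """Count files per split and check that every training image has an annotation.

    Assumed (hypothetical) layout: Images/, Annotations/ and other/ directly under
    `root`, with an image and its annotation sharing the same file stem.
    """
    base = Path(root)
    images = _stems(base / "Images")
    annotations = _stems(base / "Annotations")
    test_images = _stems(base / "other")  # unlabeled test split, per this thread
    return {
        "train_images": len(images),
        "train_annotations": len(annotations),
        "test_images": len(test_images),
        "images_without_annotation": sorted(images - annotations),
        "annotations_without_image": sorted(annotations - images),
    }


if __name__ == "__main__":
    import sys

    root_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in summarize_pod_dataset(root_dir).items():
        print(key, value)
```

Run it with the path of the extracted archive as the only argument. Because it counts unique stems, two files that differ only by extension are counted once; with the layout assumed above you would expect 1600 / 1600 / 817 and two empty mismatch lists.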

mawanda-jun commented 4 years ago

You're welcome 😊 And yes, unfortunately there are no labels for the test set. This dataset is not very big. :/



hjkim811 commented 2 years ago

Hi, 3 years have passed and I'm looking for the same ICDAR-POD 2017 dataset :( May I have the chance to get the same data @flauted got? I have really been looking for it.

mawanda-jun commented 2 years ago

Hi, thank you for the appreciation. Unfortunately, AFAIK no official links have been left on the Internet. I managed to find the original ICDAR POD 2017 competition dataset in my own data, which I'm happy to share (https://1drv.ms/u/s!AksWleeYa3Qjhu9sG1AclNPyMasz-A). No copyright infringement is intended; if there are any complaints, please write to me ASAP.


hjkim811 commented 2 years ago

Thank you very much! I really appreciate it :)

igodogi commented 1 year ago

Could you make the data available for download again? Thank you very much. (https://1drv.ms/u/s!AksWleeYa3Qjhu9sG1AclNPyMasz-A)