microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

IIT-CDIP Test Collection is unavailable? #250

Open marton-avrios opened 3 years ago

marton-avrios commented 3 years ago

I want to reproduce the pre-training results, but the pre-training dataset site (https://ir.nist.gov/cdip/) seems to have been unavailable for some time. Does anyone know anything about this, or where else I can find the dataset?

paulpaul91 commented 3 years ago

Hi friends, can you download from https://ir.nist.gov/cdip/ now?

marton-avrios commented 3 years ago

Nope, still "Forbidden"

pushpendradahiya commented 3 years ago

Still forbidden for me as well. Is this a geographic restriction? Or is there a mirror for this dataset? Any help would be really appreciated.

hahaplus commented 3 years ago

Hi, can you download https://ir.nist.gov/cdip/ now?

pushpendradahiya commented 3 years ago

No. If you find it, please post it here as well.

RamanHacks commented 3 years ago

@hahaplus can you please upload a copy of this dataset somewhere?

linan142857 commented 3 years ago

Does anyone have a copy of the 'IIT-CDIP Test Collection 1.0' dataset? Please make a mirror available for academic research.

Cogdof commented 2 years ago

You may find it helpful to refer to this GitHub issue: https://github.com/cneud/ocr-gt/issues/12

They are still working on hosting the dataset on AWS.