openai / supervised-reptile

Code for the paper "On First-Order Meta-Learning Algorithms"
https://arxiv.org/abs/1803.02999
MIT License

fetch_data might not be working correctly for mini-Imagenet? #1

Open GhassenJ opened 6 years ago

GhassenJ commented 6 years ago

Hi, I followed the instructions, but the mini-imagenet code keeps exiting with: OSError: cannot identify image file <_io.BufferedReader name='data/miniimagenet/train/n04515003/n04515003_15351.JPEG'>. Most of the files in this folder (n04515003) are empty images, yet fetch_data seemed to exit normally, without any error messages.
I counted how many empty files there were under mini-imagenet's subfolders, and it came to 53186/59981. Omniglot seems fine, though. Any idea what could be wrong with the script, or what I might be doing wrong?

unixpickle commented 6 years ago

Thanks for reporting this. Are some of the ImageNet images valid? If so, is there a general pattern as to which ones are empty?

I'm not sure what could cause this. Perhaps the ImageNet server is doing some kind of rate-limiting. If so, it may be possible to modify the script to detect this and print an error.

I'd expect omniglot to be fine, since the omniglot download process is much simpler than the one for Mini-ImageNet.
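
Something along these lines (an untested sketch, assuming the data/miniimagenet/{train,val,test}/<wnid>/*.JPEG layout that fetch_data.sh produces) could be run after the download to count empty or unreadable files and surface the problem early:

```python
# Sketch: scan the downloaded Mini-ImageNet folders and report files that are
# empty or that PIL cannot identify (the symptom described in this issue).
import os
from PIL import Image

DATA_ROOT = "data/miniimagenet"  # assumed location created by fetch_data.sh

bad = []
for split in ("train", "val", "test"):
    split_dir = os.path.join(DATA_ROOT, split)
    if not os.path.isdir(split_dir):
        continue
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        for fname in os.listdir(class_dir):
            path = os.path.join(class_dir, fname)
            try:
                if os.path.getsize(path) == 0:
                    bad.append(path)
                    continue
                with Image.open(path) as img:
                    img.verify()  # cheap integrity check, no full decode
            except OSError:
                bad.append(path)

print("%d empty or unreadable files" % len(bad))
for path in bad[:20]:
    print("  " + path)
```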

nattari commented 6 years ago

While trying to download the images from the list of ImageNet URLs, I found that some of the image IDs do not exist. Any particular reason for that? For example, in the test data, n01930112_10035 is not in the list. I used the "List of all image URLs of Fall 2011 Release".

TIA

unixpickle commented 6 years ago

@nattari some of the images are not in the 2011 release, since the dataset is from the 2012 release. That's why the download script extracts files from the 2012 tar file. If there is a better API for getting 2012 images, let me know.

nattari commented 6 years ago

Hmm, I am using the images from the 2011 release at the moment. Since I am mainly interested in understanding the algorithm, I guess that will work too. If I find a better source for the 2012 images, I will definitely share it.

Could you please tell me what GPU configuration you used for training mini-imagenet, and how long training took?

unixpickle commented 6 years ago

I used a single 1080 Ti for most of the experiments. For all the benchmarks, training takes less than a day. The exact time depends on the hyper-parameters and dataset you use.

nattari commented 6 years ago

I started training on some other ImageNet data. Everything works fine, but I get this warning: "Possibly corrupt exif file". Training gets stuck after some iterations. Do you have any clue what the problem could be here? The only thing I changed is the data. Is it due to the warning?

unixpickle commented 6 years ago

Is it possible that some class directories are empty or don't contain enough samples? I think it's possible to hang the training loop if there aren't enough samples to create a mini-batch, since it keeps looping over the data forever hoping to create a whole mini-batch.
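
As a quick pre-training sanity check (a sketch, not repo code; MIN_SAMPLES is a placeholder you would set from your own shot and query counts), you could verify every class directory first:

```python
# Sketch: flag class directories that are too small to fill an episode.
import os

DATA_ROOT = "data/miniimagenet/train"  # assumed layout: one subfolder per class
MIN_SAMPLES = 6  # placeholder, e.g. 5 support + 1 query; match your settings

too_small = []
for class_name in sorted(os.listdir(DATA_ROOT)):
    class_dir = os.path.join(DATA_ROOT, class_name)
    num_images = len(os.listdir(class_dir))
    if num_images < MIN_SAMPLES:
        too_small.append((class_name, num_images))

if too_small:
    for class_name, num_images in too_small:
        print("%s: only %d samples" % (class_name, num_images))
else:
    print("all classes have at least %d samples" % MIN_SAMPLES)
```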

nattari commented 6 years ago

I thought about that and made sure that all the class directories contain enough samples, so that doesn't seem to be the problem. What I suspect at the moment is the warning, but I'm not sure.

unixpickle commented 6 years ago

Huh, interesting. It would be nice to know where the program is stuck. When you kill the process, does Python print out a stack trace? If not (e.g. if the hang is inside the TF graph), maybe it will be helpful to attach a debugger to the process and look at a backtrace that way.
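
One low-effort way to see where it is stuck, sketched under the assumption that the hang is reachable from Python: register the standard-library faulthandler in the training script so a signal dumps every thread's stack without killing the process. (If the hang is entirely inside native TensorFlow code, a native debugger such as gdb or py-spy would be needed instead.)

```python
# Sketch: add near the top of the training script.
import faulthandler
import signal

# Dump the Python stack of all threads when the process receives SIGUSR1.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# ... run training as usual ...
# From another shell:  kill -USR1 <pid-of-the-training-process>
```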

nattari commented 6 years ago

I believe it is stuck in the TF graph; yes, I am trying to debug it now. But here is a screenshot in case you can spot something fishy. [screenshot from 2018-06-12 15-25-01]

nattari commented 6 years ago

I am observing very low GPU utilization for both Omniglot and MiniImageNet (~2-3% or even less). This shouldn't be the case, I believe? Also, I am using a 1080i, and for MiniImageNet it only uses ~500MB of memory, which doesn't change even if I change the batch size. Can you provide insights into this behaviour? (I am using Python 3.6, TensorFlow 1.8 and CUDA 9.0.)

TIA.

unixpickle commented 6 years ago

@nattari at first, things will be slow because the training pipeline is still loading the images into memory and resizing them on the fly. After training has run for a little while, the images will all be cached in memory, and you should start to see higher GPU utilization.

unixpickle commented 6 years ago

As for memory, I'm not entirely sure. If you're referring to GPU memory, I think TensorFlow allocates blocks of memory at once, so you might not see subtle changes. If you mean Python memory, this is expected, since Python's memory usage will be dominated by loading and caching images.
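
To illustrate the caching behaviour described above (a toy sketch, not the repo's actual loader): each image is decoded and resized the first time it is sampled and then served from an in-memory cache, which is why early iterations are slow and why Python, rather than the GPU, ends up holding most of the memory.

```python
import os
import numpy as np
from PIL import Image

class CachedImageClass:
    """Toy illustration of lazy load-and-cache; not the repo's actual code."""

    def __init__(self, dir_path, size=(84, 84)):
        self.dir_path = dir_path
        self.size = size
        self._cache = {}  # filename -> float32 array kept in Python memory

    def load(self, filename):
        if filename not in self._cache:
            # First access is slow: disk read + JPEG decode + resize.
            with Image.open(os.path.join(self.dir_path, filename)) as img:
                arr = np.asarray(img.convert("RGB").resize(self.size), dtype=np.float32)
            self._cache[filename] = arr / 255.0
        # Later accesses are fast, so GPU utilization climbs over time.
        return self._cache[filename]
```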

lampardwk commented 5 years ago

@unixpickle I had the same problem with incomplete miniimagenet data downloaded via fetch_data.sh; most of the folders contain empty images. Could you send me a complete dataset? My email address is lampardwk518@163.com, thanks.

Liuyubao commented 5 years ago

@unixpickle Sorry to bother you, but I had the same problem with incomplete mini-imagenet data downloaded via fetch_data.sh; most of the folders contain empty images. Could you also send me a complete dataset? My email address is liuyubao96@gmail.com. Thanks a lot for your time and patience.

eghouti commented 4 years ago

Hello @unixpickle,

First, I would like to thank you for your excellent work, which helps me a lot in my research. I would like to ask if I could have the mini-imagenet dataset you used to run these experiments. My email address is ghouthi.bouklihacene@imt-atlantique.fr

Best regards,

Ghouthi

ligeng0197 commented 3 years ago

Hi @unixpickle .

I find that the MiniImageNet source URL in the fetch script is no longer valid, so I looked for another source on the ImageNet website and found one (http://www.image-net.org/challenges/LSVRC/2012/dd31405981ef5f776aa17412e1f0c112/ILSVRC2012_img_train.tar). However, after replacing the URL in the fetch script and downloading the images, I ran into the empty-images problem mentioned by others. I got 13 empty images across the train and val sets and decided to replace them manually with images of the same objects. Unfortunately, after replacing them I still get stuck on (OSError: image file is truncated (26 bytes not processed)) during training. I believe it's caused by some incomplete images in the train set, but I am kind of tired of fixing it by hand. So would you mind sharing the miniimagenet dataset on Google Drive or somewhere else we can download directly? Thanks in advance.

P.S. While replacing the empty images, I found that the MiniImageNet data used here is a little different from what is used in pytorch-MAML (https://github.com/dragen1860/MAML-Pytorch).
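
For anyone hitting the same truncated-file error, one workaround (a sketch, assuming the dataset lives under data/miniimagenet) is to scan for files that cannot be fully decoded and replace or delete them before training:

```python
# Sketch: find images that raise "image file is truncated" on a full decode.
import os
from PIL import Image

DATA_ROOT = "data/miniimagenet"  # adjust to wherever the dataset lives

for root, _, files in os.walk(DATA_ROOT):
    for fname in files:
        path = os.path.join(root, fname)
        try:
            with Image.open(path) as img:
                img.load()  # force a full decode; truncated files raise OSError here
        except OSError as exc:
            print("bad file: %s (%s)" % (path, exc))
```

Alternatively, setting PIL.ImageFile.LOAD_TRUNCATED_IMAGES = True makes Pillow tolerate truncated files, at the cost of training on partially blank images.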

asd81310 commented 3 years ago

Did anyone get the correct mini-imagenet for these experiments? If you did, could you share the dataset with me? Thanks very much for your help. My email address is asd81310@gapp.nthu.edu.tw.

XA23i commented 2 years ago

You can follow the instructions at https://github.com/dragen1860/MAML-Pytorch to get the dataset, then modify miniimagenet.py line 53 to: names = [f for f in os.listdir(self.dir_path) if f.endswith('.jpg')]