thunder-project / thunder

scalable analysis of images and time series
http://thunder-project.org
Apache License 2.0

OME-TIFF not loading #318

Open bainzo opened 8 years ago

bainzo commented 8 years ago

I'm having trouble loading OME-TIFF files into Thunder:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-9-eed51476f58a> in <module>()
----> 1 data = td.images.fromtif('./test-tif/')

/home/ec2-user/anaconda2/lib/python2.7/site-packages/thunder/images/readers.pyc in fromtif(path, ext, start, stop, recursive, nplanes, npartitions, labels, engine, credentials)
    371     return frompath(path, accessor=getarray, ext=ext, start=start, stop=stop,
    372                     recursive=recursive, npartitions=npartitions, recount=recount,
--> 373                     labels=labels, engine=engine, credentials=credentials)
    374 
    375 def frompng(path, ext='png', start=None, stop=None, recursive=False, npartitions=None, labels=None, engine=None, credentials=None):

/home/ec2-user/anaconda2/lib/python2.7/site-packages/thunder/images/readers.pyc in frompath(path, accessor, ext, start, stop, recursive, npartitions, dims, dtype, labels, recount, engine, credentials)
    205     else:
    206         if accessor:
--> 207             data = [accessor(d) for d in data]
    208         flattened = list(itertools.chain(*data))
    209         values = [kv[1] for kv in flattened]

/home/ec2-user/anaconda2/lib/python2.7/site-packages/thunder/images/readers.pyc in getarray(idxAndBuf)
    351         fbuf = BytesIO(buf)
    352         tfh = tifffile.TiffFile(fbuf)
--> 353         ary = tfh.asarray()
    354         pageCount = ary.shape[0]
    355         if nplanes is not None:

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in asarray(self, key, series, memmap)
    998             series = 0
    999         if series is not None:
-> 1000             pages = self.series[series].pages
   1001         else:
   1002             pages = self.pages

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in __get__(self, instance, owner)
    712         if instance is None:
    713             return self
--> 714         value = self.func(instance)
    715         if value is NotImplemented:
    716             return getattr(super(owner, instance), self.func.__name__)

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in series(self)
    881 
    882         if self.is_ome:
--> 883             series = self._omeseries()
    884         elif self.is_fluoview:
    885             dims = {b'X': 'X', b'Y': 'Y', b'Z': 'Z', b'T': 'T',

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in _omeseries(self)
   1165                                 tif = TiffFile(os.path.join(dirname, fname))
   1166                             except (IOError, ValueError):
-> 1167                                 tif.close()
   1168                                 warnings.warn(
   1169                                     "ome-xml: failed to read '%s'" % fname)

UnboundLocalError: local variable 'tif' referenced before assignment

Has anyone seen this before? I get it when I try to load locally or from S3, using both the default engine and Spark.
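
For reference, the traceback ends in an `except` block that calls `tif.close()` even though the failing `TiffFile(...)` call never assigned `tif`. Below is a minimal sketch of that pattern and one possible guard. The names `open_resource`, `load_buggy`, and `load_guarded` are hypothetical and only illustrate the control flow; they are not part of tifffile.

```python
import warnings


def open_resource(path):
    """Hypothetical opener that always fails, standing in for TiffFile(...)."""
    raise IOError("cannot open %r" % path)


def load_buggy(path):
    # Mirrors the pattern in the traceback: if open_resource raises,
    # `tif` was never bound, so tif.close() raises UnboundLocalError.
    try:
        tif = open_resource(path)
    except (IOError, ValueError):
        tif.close()
        warnings.warn("failed to read %r" % path)
        return None
    return tif


def load_guarded(path):
    # When the opener fails, nothing was opened, so there is nothing to close.
    try:
        tif = open_resource(path)
    except (IOError, ValueError):
        warnings.warn("failed to read %r" % path)
        return None
    return tif
```

With the guard in place the original `IOError` or `ValueError` is reduced to a warning, so the underlying read failure still needs to be tracked down separately.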

If I comment out the tif.close() statement in tifffile.py, I get hundreds of warnings of the form:

anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.py:1169: UserWarning: ome-xml: failed to read '1_2_T{index}.ome.tif'
  "ome-xml: failed to read '%s'" % fname)

and if I manually set stop=1 with engine=sc in the call to .fromtif([...]) I get this:

Traceback (most recent call last):
  File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/home/ec2-user/spark/python/pyspark/rdd.py", line 1293, in takeUpToNumLeft
    yield next(iterator)
  File "/home/ec2-user/anaconda2/lib/python2.7/site-packages/thunder/images/readers.py", line 353, in getarray
    ary = tfh.asarray()
  File "/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.py", line 1041, in asarray
    result = numpy.empty(s.shape, s.dtype).reshape(-1)
MemoryError

d-v-b commented 8 years ago

@bainzo is this issue specific to thunder? tifffile.py is also used by scikit-image, so do you get the same kind of error if you try to load your .tif files into scikit-image?

e.g.

import skimage.external.tifffile as tif
im = tif.imread('image.tif')

bainzo commented 8 years ago

@d-v-b I will try that as soon as I have the opportunity.

The files were recorded in the OME-TIFF format and then converted into stacks of time points using the Bio-Formats plugin for ImageJ.

I retested after saving the file as a regular TIF using ImageJ and it worked with the default engine, but it caused Spark to crash. I'll have another look at both later to see if I can work it out...

bainzo commented 8 years ago

I've had more luck batch saving the images as plain old tifs in ImageJ; loading into Thunder using Spark as the engine appears to work fine.

Using tifffile directly, I get the following stack trace when trying to load a single image with PySpark in a Jupyter/IPython notebook:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-3-77d359098e03> in <module>()
----> 1 im = tif.imread('test-data-2016-06-14/2016-02-10/ome/T0.ome.tif')

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in imread(files, **kwargs)
    696     if isinstance(files, basestring):
    697         with TiffFile(files, **kwargs_file) as tif:
--> 698             return tif.asarray(**kwargs)
    699     else:
    700         with TiffSequence(files, **kwargs_seq) as imseq:

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in asarray(self, key, series, memmap)
    998             series = 0
    999         if series is not None:
-> 1000             pages = self.series[series].pages
   1001         else:
   1002             pages = self.pages

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in __get__(self, instance, owner)
    712         if instance is None:
    713             return self
--> 714         value = self.func(instance)
    715         if value is NotImplemented:
    716             return getattr(super(owner, instance), self.func.__name__)

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in series(self)
    881 
    882         if self.is_ome:
--> 883             series = self._omeseries()
    884         elif self.is_fluoview:
    885             dims = {b'X': 'X', b'Y': 'Y', b'Z': 'Z', b'T': 'T',

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in _omeseries(self)
   1163                             fname = uuid.attrib['FileName']
   1164                             try:
-> 1165                                 tif = TiffFile(os.path.join(dirname, fname))
   1166                             except (IOError, ValueError):
   1167                                 tif.close()

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in __init__(self, arg, name, offset, size, multifile, multifile_close)
    777         self._files = {self._fh.name: self}  # cache of TiffFiles
    778         try:
--> 779             self._fromfile()
    780         except Exception:
    781             self._fh.close()

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in _fromfile(self)
    818         while True:
    819             try:
--> 820                 page = TiffPage(self)
    821                 self.pages.append(page)
    822             except StopIteration:

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in __init__(self, parent)
   1355         self.tags = TiffTags()
   1356 
-> 1357         self._fromfile()
   1358         self._process_tags()
   1359 

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in _fromfile(self)
   1389         for _ in range(numtags):
   1390             try:
-> 1391                 tag = TiffTag(self.parent)
   1392                 # print(tag)
   1393             except TiffTag.Error as e:

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in __init__(self, arg, **kwargs)
   2097         self._offset = None
   2098         if hasattr(arg, '_fh'):
-> 2099             self._fromfile(arg, **kwargs)
   2100         else:
   2101             self._fromdata(arg, **kwargs)

/home/ec2-user/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.pyc in _fromfile(self, parent)
   2154                     value = Record(value)
   2155             elif code in TIFF_TAGS or dtype[-1] == 's':
-> 2156                 value = struct.unpack(fmt, fh.read(size))
   2157             else:
   2158                 value = read_numpy(fh, byteorder, dtype, count)

error: unpack requires a string argument of length 1597178

Is it possible that the large OME-XML headers generated by Micro-Manager are overloading something?
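
One reading of the `struct` error is that a tag declares about 1.5 MB of data but the read returned fewer bytes, for example because the file ends before the declared extent. A rough way to check this without tifffile is to walk the first IFD and compare each tag's declared size against the file length. This is only a sketch, assuming a classic (non-BigTIFF) file; `check_first_ifd` is a hypothetical helper, not a tifffile function.

```python
import struct

# Byte sizes of the classic TIFF field types (type code -> size in bytes).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def check_first_ifd(data):
    """Return (tag, nbytes, fits) for each entry in the first IFD.

    `fits` is False when the declared value would extend past the end of `data`.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    results = []
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        tag, ftype, n, value = struct.unpack(order + "HHII", data[start:start + 12])
        nbytes = TYPE_SIZES.get(ftype, 1) * n
        # Values of 4 bytes or fewer are stored inline in the entry itself;
        # larger values live at the offset given in the last field.
        fits = nbytes <= 4 or value + nbytes <= len(data)
        results.append((tag, nbytes, fits))
    return results
```

Reading a file with `open(path, 'rb').read()` and passing the bytes in would show whether tag 270 (ImageDescription, where OME-TIFF stores its OME-XML) reports a size that the file can actually hold.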