jauharshaikh / metadata-extractor

Automatically exported from code.google.com/p/metadata-extractor

Does not work on Google App Engine #33

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Problem occurs in the JpegSegmentReader

Specifically private boolean checkForBytesOnStream

(bytesNeeded <= stream.available()) always returns false.

The code I am using is:

    BlobstoreInputStream inputStream = new BlobstoreInputStream(new BlobKey(image.blobKey));
    Object metadata = ImageMetadataReader.readMetadata(new BufferedInputStream(inputStream), true);

Is it possible to provide an alternative method readMetadata that takes a 
regular InputStream?

I have no problem getting Apache Sanselan to work:

    BlobstoreInputStream inputStream = new BlobstoreInputStream(new BlobKey(image.blobKey));
    IImageMetadata metadata = Sanselan.getMetadata(inputStream, null);
    JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
    if (jpegMetadata != null) {
        // Renamed from "metadata" to avoid redeclaring the variable above
        Object exif = jpegMetadata.getExif();
    }

Original issue reported on code.google.com by sc...@pixoto.com on 30 Jul 2011 at 10:25

GoogleCodeExporter commented 8 years ago
In this line:

ImageMetadataReader.readMetadata(new BufferedInputStream(inputStream), true);

Try passing 'false' instead of 'true'.  That should mean it no longer tries to 
wait for bytes on the stream, which will avoid the exception you're seeing.  I 
don't believe that changing the stream type to InputStream directly will have 
any effect here.

Let me know if that doesn't fix the problem for you.

Original comment by drewnoakes on 30 Jul 2011 at 11:02

GoogleCodeExporter commented 8 years ago
No, that does not fix the problem because 
JpegSegmentReader.checkForBytesOnStream will still return false anyhow:

        if (!waitForBytes)
            return bytesNeeded <= stream.available();

Nonetheless, I tried it and got the same exception:

com.drew.imaging.jpeg.JpegProcessingException: segment size would extend beyond file stream length
    at com.drew.imaging.jpeg.JpegSegmentReader.readSegments(Unknown Source)
    at com.drew.imaging.jpeg.JpegSegmentReader.<init>(Unknown Source)
    at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(Unknown Source)
    at com.drew.imaging.ImageMetadataReader.readMetadata(Unknown Source)
    at com.drew.imaging.ImageMetadataReader.readMetadata(Unknown Source)
    at com.drew.imaging.ImageMetadataReader$readMetadata.call(Unknown Source)
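
For context, InputStream.available() only estimates how many bytes can be read without blocking, so a stream backed by a remote store may report fewer bytes than actually remain. The following is a minimal sketch of that failure mode, not the library's code: NoEstimateStream and AvailableDemo are hypothetical names, and returning 0 from available() is an assumption about how a remote-backed stream might behave.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a remote-backed stream; not part of any library. */
class NoEstimateStream extends InputStream {
    private final InputStream source;

    NoEstimateStream(byte[] data) {
        this.source = new ByteArrayInputStream(data);
    }

    @Override
    public int read() throws IOException {
        return source.read();
    }

    @Override
    public int available() {
        // Assumption: the stream gives no estimate. The InputStream contract
        // allows this, because available() is only a non-blocking estimate.
        return 0;
    }
}

class AvailableDemo {
    /** Mirrors the non-waiting branch of the check quoted in this thread. */
    static boolean checkForBytesOnStream(InputStream stream, int bytesNeeded) throws IOException {
        return bytesNeeded <= stream.available();
    }

    public static void main(String[] args) throws IOException {
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x10};
        InputStream stream = new NoEstimateStream(jpegStart);
        // The check reports that the bytes are missing...
        System.out.println("check passed: " + checkForBytesOnStream(stream, 2));
        // ...even though a plain read() still returns data.
        System.out.println("first byte: " + stream.read());
    }
}
```

If the real stream behaves like this, the check fails regardless of the waitForBytes flag, which would match the exception above.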

Original comment by sc...@pixoto.com on 30 Jul 2011 at 11:52

GoogleCodeExporter commented 8 years ago
Thanks for trying that. Do you see that exception if you process the image from 
the filesystem directly? 

I haven't used Google App Engine myself, but I believe animaps (google it) uses 
this library successfully in such an environment.

Are you able to preload the stream into memory?  If you can find a patch that 
fixes this issue, it would be ideal, as I can't reproduce it with the 
information provided. 

Original comment by drewnoakes on 31 Jul 2011 at 12:11

GoogleCodeExporter commented 8 years ago
So if I load everything into memory by calling inputStream.getBytes() and then 
creating a ByteArrayInputStream like the following:

    ImageMetadataReader.readMetadata(new BufferedInputStream(new ByteArrayInputStream(inputStream.getBytes())), false);

it works, but this is probably not a good long-term solution?

Correct me if I am wrong, but you don't need to read the entire file into 
memory in order to just extract the metadata?
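
For readers whose stream type has no getBytes() method, the preload workaround can be sketched in plain Java as below. StreamPreloader is a hypothetical helper, not part of metadata-extractor, and it has the same drawback noted above: the whole file ends up in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper illustrating the preload workaround. */
class StreamPreloader {
    /** Drains the stream into a byte array using a fixed-size buffer. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns an in-memory stream whose available() equals the true remaining length. */
    static InputStream preload(InputStream in) throws IOException {
        return new ByteArrayInputStream(readAll(in));
    }
}
```

The result of preload(...) can then be wrapped in a BufferedInputStream and passed to readMetadata as in the call above.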

Original comment by sc...@pixoto.com on 31 Jul 2011 at 12:56

GoogleCodeExporter commented 8 years ago
So I noticed that you don't even use any features of BufferedInputStream in 
your code.  Why is it that you pass a BufferedInputStream around?

After making the following changes everything works fine:

Change ImageMetadataReader.java to use InputStream instead of 
BufferedInputStream:
    public static Metadata readMetadata(@NotNull InputStream inputStream, boolean waitForBytes) throws ImageProcessingException, IOException
    private static Metadata readMetadata(@Nullable InputStream inputStream, @Nullable File file, int magicNumber, boolean waitForBytes) throws ImageProcessingException, IOException

Change JpegSegmentReader.java to use InputStream instead of BufferedInputStream:
    public JpegSegmentReader(@NotNull InputStream inputStream, boolean waitForBytes) throws JpegProcessingException
    private JpegSegmentData readSegments(@NotNull final InputStream jpegInputStream, boolean waitForBytes) throws JpegProcessingException

Lastly, if waitForBytes is false, I force checkForBytesOnStream to return true.

Perhaps this was a bug anyhow?
        if (!waitForBytes)
            return bytesNeeded <= stream.available();

Here, bytesNeeded <= stream.available() will return false every time if you 
are not waiting for the bytes, and the calling code will then throw an exception.

So I changed it to:
        if (!waitForBytes)
            return true;
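
An alternative to checking available() up front is to simply read the required number of bytes and treat a short read as the error. This is only a sketch of that general technique, not what the library does: SegmentBytes and readExactly are hypothetical names. The JDK's DataInputStream.readFully offers the same behaviour.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; shows a blocking read that does not rely on available(). */
class SegmentBytes {
    /**
     * Reads exactly `length` bytes, blocking as needed.
     * Throws EOFException if the stream ends before `length` bytes arrive.
     */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] bytes = new byte[length];
        int offset = 0;
        while (offset < length) {
            // read() may return fewer bytes than requested, so keep looping
            int n = in.read(bytes, offset, length - offset);
            if (n == -1) {
                throw new EOFException("stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n;
        }
        return bytes;
    }
}
```

With this approach a truncated file still produces an error, but a slow or remote stream does not.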

Original comment by sc...@pixoto.com on 31 Jul 2011 at 1:23

GoogleCodeExporter commented 8 years ago
You might as well change your code to use an InputStream instead of a 
BufferedInputStream. It will still be backward compatible, as a 
BufferedInputStream is an InputStream and you don't do anything that requires 
one.

That way you can leave it up to the user what type of stream they want to use.

Original comment by sc...@pixoto.com on 31 Jul 2011 at 1:25

GoogleCodeExporter commented 8 years ago
No, this isn't a great idea, but it does rule out any kind of error in the image 
itself. 

I'm on my phone right now but should be back at my dev machine next week to try 
what you're seeing. If you learn anything before then, please let me know.

Original comment by drewnoakes on 31 Jul 2011 at 1:25

GoogleCodeExporter commented 8 years ago
My previous comment was a reply to your #4.

Your changes look very sensible. I'll test them out once I'm at my dev machine. 

Thanks for your efforts and feedback.

Original comment by drewnoakes on 31 Jul 2011 at 1:30

GoogleCodeExporter commented 8 years ago
Looking at this now, it seems that ImageMetadataReader does in fact require the 
mark/reset functionality of the BufferedInputStream.  However I can probably 
work around this.
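
One possible shape for such a workaround, sketched here under assumptions rather than taken from the library: accept any InputStream, wrap it in a BufferedInputStream only when it cannot mark/reset itself, then peek at the leading bytes. MagicNumberPeek and its methods are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; illustrates peeking at a file's magic number. */
class MagicNumberPeek {
    /** Wraps the stream only when it cannot mark/reset on its own. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /**
     * Reads the first two bytes as a big-endian value, then rewinds.
     * Returns -1 if fewer than two bytes are available before end of stream.
     * The argument must support mark/reset (see ensureMarkSupported).
     */
    static int peekMagicNumber(InputStream markable) throws IOException {
        markable.mark(2);
        int b1 = markable.read();
        int b2 = markable.read();
        markable.reset();
        return (b1 == -1 || b2 == -1) ? -1 : (b1 << 8) | b2;
    }
}
```

A JPEG stream would yield 0xFFD8 here, and the caller can then hand the rewound stream to the format-specific reader.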

Original comment by drewnoakes on 22 May 2012 at 10:43

GoogleCodeExporter commented 8 years ago
Just a note that this also seems to happen with files hosted in S3, with the 
app hosted on Heroku.

Original comment by ibex...@gmail.com on 25 May 2012 at 12:03

GoogleCodeExporter commented 8 years ago
Relates to http://code.google.com/p/metadata-extractor/issues/detail?id=27

Original comment by drewnoakes on 16 Oct 2012 at 4:23

GoogleCodeExporter commented 8 years ago
A fix for this is in place and is undergoing testing. It's involved some 
significant API changes, and so will be made available in the next major 
release, 2.7.0.

For now it's available on a feature branch.

Original comment by drewnoakes on 29 Oct 2012 at 2:05

GoogleCodeExporter commented 8 years ago
Any news on this? I'd love to use your library but am reading from S3, where I 
hit this problem. Since there has been no news, I assume nobody has found 
problems with the feature branch?

Original comment by wagner.d...@gmail.com on 28 Dec 2012 at 1:16

GoogleCodeExporter commented 8 years ago
Hi Daniel,

I haven't done any testing specifically against Google App Engine or S3, but 
the problem as described above has been addressed on the feature branch.

I've been working on some more changes to be included in 2.7.0 on the 
feature/plugins branch. Yesterday I completed the first version of this which 
you're welcome to try out. I've attached a zip containing the JAR. This version 
allows you to specify particular metadata processors (currently only for JPEG 
files). In this way, if you're only interested in, say, Exif data, then you can 
skip loading and parsing of all other JPEG segments, reducing IO, memory and 
CPU usage.

Original comment by drewnoakes on 28 Dec 2012 at 3:14

GoogleCodeExporter commented 8 years ago
This is looking great from my end. Works flawlessly on S3!
I didn't find how to specify the metadata processors you mentioned, but only 
gave it a quick look. They'd be interesting for my use case though, as I don't 
need to process the entire picture and would be happy to close the stream as 
soon as I have EXIF data. If you have a pointer for how to use them then I'd be 
curious to try them out.

Original comment by wagner.d...@gmail.com on 29 Dec 2012 at 3:03

GoogleCodeExporter commented 8 years ago
Glad to hear this issue doesn't occur for you on S3. I'd certainly hoped that 
was the case, given the problem description and changes made since.

I'll document the new API once I've finalised it. Your feedback would be 
helpful. In the meantime, you can use this code. It may change a little before 
the release of 2.7.0, but not by much and I can help out if needed.

    //
    // Build an Iterable of JPEG segment readers
    //
    // In this example, only ExifReader is used
    //
    // Others include JpegReader, JpegCommentReader, JfifReader, XmpReader,
    // IccReader, PhotoshopReader, IptcReader, AdobeJpegReader
    //
    Iterable<JpegSegmentMetadataReader> readers = Arrays.asList(new ExifReader());

    //
    // Extract metadata pretty much as before, but specify your readers
    //
    Metadata metadata = JpegMetadataReader.readMetadata(inputStream, readers);

Original comment by drewnoakes on 29 Dec 2012 at 4:37

GoogleCodeExporter commented 8 years ago
One more thing to note -- I will probably change the names of these readers. 
Currently the noun 'reader' is used in too many cases. I'm giving the new API 
some thought, and your feedback is very welcome. The names should help separate 
the main conceptual elements in the library, which they currently don't do very 
well.

Original comment by drewnoakes on 29 Dec 2012 at 4:39

GoogleCodeExporter commented 8 years ago
Hi, I'm new to this library and ran into the same issue. Where can I get the 
patch? I checked out the git repository but found only the master branch. There, 
I do see that ImageMetadataReader.readMetadata() takes an InputStream. Is that 
the right one to use?

Thanks!

Original comment by winson.q...@gmail.com on 8 Apr 2013 at 5:00