interfasys / mediametadata

A cloud application which provides CRUD access to the metadata stored in images
GNU Affero General Public License v3.0

List all possible fields which can be extracted #5

Open oparoz opened 8 years ago

oparoz commented 8 years ago

We need to have a full list and prioritise it.

There should be a list for EXIF, IPTC and XMP

imjalpreet commented 8 years ago

@oparoz, I have looked at the data fields in all three types of metadata and decided to go with the following fields:

IPTC Data:

EXIF Data:

XMP Data:

We can add/remove based on our requirements as we go ahead in the project.

imjalpreet commented 8 years ago

In my opinion, we can extract the following fields from the EXIF data (I am taking them from the reference here):

@oparoz What are your views?

oparoz commented 8 years ago

Thank you for the link. Overall, it's not easy to pick the essential ones as this app is supposed to be generic and not linked to Gallery, but I did ask you to cross-reference each field to see if they exist in all 3 sets since this would help pick the ones we have to have. So maybe start with that and then we can look at the details?

Exif.Image.ImageLength (or Exif.Iop.RelatedImageLength) Though, I have already written the code for this, it is indirectly taken from exif/iptc data itself.

I'm not sure as it's GD which is extracting the info, so it's probably using a different methodology since very few formats include metadata

Regarding image dimensions, I think it's safer to stick with the current way of doing things as it's universal. Otherwise we would have to do it twice or pick a method of extraction based on the format which adds complexity. Maybe something for later?

Exif.Image.HostComputer (Maybe we can get the owner info from this, will have to check)

I think this would be considered too invasive, let's put it on the "maybe" pile and once we have a list we could ask people for their opinion. Same with Exif.Image.HostComputer

Exif.Image.ImageID (I think it may be useful, according to what is given in the description in the link I have provided)

On the maybe pile?

Exif.Photo.SubSecTime, Exif.Photo.SubSecTimeOriginal (To get the fraction of seconds for the respective times)

Probably unnecessary.

Looking at the list, it sounds like date and location are the 2 main areas and those are definitely needed.

But it seems some pretty obvious ones are missing

imjalpreet commented 8 years ago

I'm not sure as it's GD which is extracting the info, so it's probably using a different methodology since very few formats include metadata

Regarding image dimensions, I think it's safer to stick with the current way of doing things as it's universal. Otherwise we would have to do it twice or pick a method of extraction based on the format which adds complexity. Maybe something for later?

Actually, I had checked it out: I looked at the function I was using, getimagesize, and found that it extracts the dimensions from the EXIF/IPTC data itself. (I will let you know the file.) But I too don't think there is any need to change the current method.

I agree with the other points you have made.

But it seems some pretty obvious ones are missing
• Exif.Image.ImageDescription
• Exif.Photo.UserComment
• Exif.Image.Make and Exif.Image.Model
• Exif.Image.Orientation

Actually, I forgot to mention these. I thought first I would finish with size, location and time and then come to these fields. But yes, we 100% need to extract the above-mentioned fields.

Also, the aim of matching the fields in the three sets is to not have any repetitions, right?

imjalpreet commented 8 years ago

Also, coming to the generic part of the App, we can always add any additional fields which we may want at any time in the future.

oparoz commented 8 years ago

Actually, I had checked it out: I looked at the function I was using, getimagesize, and found that it extracts the dimensions from the EXIF/IPTC data itself.

Looking at your code, it's probably because you don't do any checks on what you send to getimagesize (and the reason tests from Gallery fail when your app is activated). So if you only test with JPEG, then you're going to have EXIF metadata.

I thought first I would finish with size, location and time and then come to these fields

Yes, it's not a problem, we just need to have maybe 3 priorities for the implementation. It should take very little time to add ways to extract new fields if the methods are properly designed.

Also, the aim of matching the fields in the three sets is to not have any repetitions, right?

No, it would be to be able to extract the same information from all 3 formats.

So if we want size, date and location, that would be 3-4 DB fields matching specific implementation by each format. Does that make sense?
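The "one DB field, one implementation per format" idea can be sketched as follows. This is a minimal illustration in Python, not the app's own code. `FIELD_MAP` and `extract_field` are hypothetical names, and the tag names (in Exiv2-style notation) are examples only, not the list agreed in this issue.

```python
from typing import Optional

# Hypothetical mapping: one DB field -> the tag that carries it in each format.
# The tag names are illustrative examples, not the project's final selection.
FIELD_MAP = {
    "creation_date": {
        "exif": "Exif.Photo.DateTimeOriginal",
        "iptc": "Iptc.Application2.DateCreated",
        "xmp": "Xmp.photoshop.DateCreated",
    },
    "description": {
        "exif": "Exif.Image.ImageDescription",
        "iptc": "Iptc.Application2.Caption",
        "xmp": "Xmp.dc.description",
    },
}


def extract_field(field: str, tags: dict) -> Optional[str]:
    """Return the first non-empty value for `field`, trying EXIF, IPTC, then XMP.

    `tags` is assumed to be a flat dict of tag name -> value, as produced by
    whatever metadata reader is in use.
    """
    for fmt in ("exif", "iptc", "xmp"):
        key = FIELD_MAP[field].get(fmt)
        value = tags.get(key)
        if value:
            return value
    return None
```

With this shape, adding a new DB field is one more entry in the mapping, which matches the point above that new fields should be cheap to add if the methods are properly designed.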

imjalpreet commented 8 years ago

Looking at your code, it's probably because you don't do any checks on what you send to getimagesize (and the reason tests from Gallery fail when your app is activated). So if you only test with JPEG, then you're going to have EXIF metadata.

I didn't get you.

Since getimagesize is a GD function, I was just trying to point out that, where this function is defined, it is mentioned that the dimensions are taken from EXIF/IPTC data. (I think I saw this when I was trying to find out how it is done, though I will check again.)

Also, regarding the tests, as far as I remember, I had checked running the tests by disabling this app but that still gave an error, I think I had told you about this. I am not sure whether there was some problem on my local machine or somewhere else.

So if we want size, date and location, that would be 3-4 DB fields matching specific implementation by each format. Does that make sense?

Yes.

oparoz commented 8 years ago

Since getimagesize is a GD function, I was just trying to point out that, where this function is defined, it is mentioned that the dimensions are taken from EXIF/IPTC data. (I think I saw this when I was trying to find out how it is done, though I will check again.)

OK, so

  1. GD can only be used on image formats it supports. If you send a PDF, your app will crash
  2. A PNG doesn't contain any EXIF or IPTC metadata, so GD can't use that to determine the size of the image. Instead it relies on libraries designed to handle that format

That's the reason you can't use those tags to retrieve the size of an image as we need to be able to get the size of images in many different formats.

Is it clearer?

Also, regarding the tests, as far as I remember, I had checked running the tests by disabling this app but that still gave an error

This is what you sent me when setting up Codeception

[PHPUnit_Framework_Exception]  
 getimagesize(): Read error!

That's the problem I'm talking about.

You can write the 1st test with a JPG and once it passes, you'll need to create another one which involves processing a txt file and then update your code to make it work. But that's all for another issue as this one focuses on identifying the tags.
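The two points above (check the format before reading it, and read the size from the format's own header rather than from EXIF/IPTC tags) can be sketched as follows. This is an illustrative Python sketch, not how GD or PHP's getimagesize is implemented. `image_size` is a hypothetical helper that only handles PNG and GIF; JPEG is left out to keep it short.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) for PNG or GIF data, or None for anything else."""
    # PNG: 8-byte signature, then the IHDR chunk (4-byte length, 4-byte type),
    # followed by big-endian 32-bit width and height.
    if len(data) >= 24 and data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR":
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", data[6:10])
    # Unsupported input (a txt or PDF file, say): report "no size" instead of failing.
    return None


# A minimal PNG header, built by hand for illustration.
png_header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
print(image_size(png_header))
print(image_size(b"plain text file contents"))
```

The second call is the equivalent of the txt-file test described above: unsupported input yields `None` rather than a read error.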

imjalpreet commented 8 years ago

OK, so

  1. GD can only be used on image formats it supports. If you send a PDF, your app will crash
  2. A PNG doesn't contain any EXIF or IPTC metadata, so GD can't use that to determine the size of the image. Instead it relies on libraries designed to handle that format

That's the reason you can't use those tags to retrieve the size of an image as we need to be able to get the size of images in many different formats.

Is it clearer?

Yeah, I had read about that. Thanks for the reminder.

Thanks, I understood the tests problem.

imjalpreet commented 8 years ago

IPTC Tags that represent similar data to EXIF: (Only for Location and Time)

@oparoz Please give your suggestions.

imjalpreet commented 8 years ago

Reference Link for IPTC.

imjalpreet commented 8 years ago

Similar XMP Tags:


Reference:

The following table lists the XMP properties defined solely by Exif.

The following table lists additional XMP properties defined solely by Exif.

This schema specifies the IPTC Core XMP properties.

This schema specifies the IPTC Extension XMP properties.

oparoz commented 8 years ago

OK, but now you need to prioritise these groups and then put them in a table.

Priority | DB            | EXIF | IPTC | XMP
1        | creation_date |      |      |

imjalpreet commented 8 years ago

@oparoz I will put my table together today and post it here, and then we can discuss the changes.

imjalpreet commented 8 years ago

@oparoz Can you also explain how exactly I should select my priority order? I mean, what factors should I keep in mind?

oparoz commented 8 years ago

Since your question was answered via email, I'm looking forward to your prioritised table.

imjalpreet commented 8 years ago

Another resource for XMP data: http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html

imjalpreet commented 8 years ago

@oparoz I have made a table of data fields with priority values. You can find it over here.

oparoz commented 8 years ago

Good job :)

Could you order it, so that people looking at it get the proper order right away?

Looks like Caption and Description are the same thing, no?

I don't think we need the EXIF GPS info as priority 3. We have the coordinates as priority 1 and that should be enough.

Overall, it seems like IPTC is going to be the one holding us back.

Could you please add all the references you've added to this issue to the Wiki?

imjalpreet commented 8 years ago

@oparoz,

Could you order it, so that people looking at it get the proper order right away?

Yeah I have done it.

Looks like Caption and Description are the same thing, no?

Yeah, I also had that doubt. I will merge them while coding.

I don't think we need the EXIF GPS info as priority 3. We have the coordinates as priority 1 and that should be enough.

Done.

Could you please add all the references you've added to this issue to the Wiki?

Okay, I will add them soon.

oparoz commented 8 years ago

Looks like "location" is going to be a problem. City is too vague to be able to put a picture on a map.

imjalpreet commented 8 years ago

@oparoz I think we will be able to put the picture on the map using the coordinates alone. I have previously used the Google Maps API for GIS, which lets us mark a position on the map from its coordinates. In addition, we can use the city as a tag for the pictures, so that searching for a city returns all the pictures taken there. What do you think about this?

oparoz commented 8 years ago

Yes, no problem with coordinates, but with IPTC tags, there is not enough granularity, which means that if you're visiting a city, all the pictures will be piled up in one location instead of being spread out, near their real locations.

imjalpreet commented 8 years ago

Okay, I will try to find a way out and get back to you.

oparoz commented 8 years ago

Maybe we'll have 2 DB fields. gps_coordinates and location. Apps can use the latter as a fallback when the former is empty.
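A minimal sketch of that fallback, in Python for illustration only. The field names `gps_coordinates` and `location` come from the comment above; `resolve_location` and the shape of the record are assumptions.

```python
def resolve_location(record: dict):
    """Prefer precise coordinates; fall back to the coarser location text."""
    coords = record.get("gps_coordinates")
    if coords:
        return ("coordinates", coords)
    place = record.get("location")
    if place:
        return ("place", place)
    # Neither field is filled in: the picture cannot be placed at all.
    return None
```

Returning the kind alongside the value lets a client app decide how to render it, for example an exact pin for coordinates and a city-level marker for a place name.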

oparoz commented 8 years ago

Which reminds me that you need to consolidate the DB fields. We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.

imjalpreet commented 8 years ago

Regarding IPTC location fields, I found one more IPTC tag, PostalCode, and as far as I know we can also get an approximate location from the postal code using the Google Maps API. What do you think about it?

This is the reference link

oparoz commented 8 years ago

I still think it's not as precise as GPS coordinates, but a good fallback

imjalpreet commented 8 years ago

Yeah, not as good as GPS coordinates but better than just city.

oparoz commented 8 years ago

Indeed :)

imjalpreet commented 8 years ago

Also, there is one more field named sublocation; we can look at that too when I am implementing this.

oparoz commented 8 years ago

We'll have to test on a large sample of images to figure out what the best approach is. A good idea would be to get a list of IPTC data generators to see what gets inserted and how.

imjalpreet commented 8 years ago

@oparoz I am planning to update the database.xml file with the final fields along with finishing that test. So can you finalize the fields in that Google sheet? As soon as I complete the database.xml, I will be able to start with the extraction part, and I am planning to complete at least the EXIF extraction by Friday (or Saturday).

oparoz commented 8 years ago

@imjalpreet - Did you complete this task?

We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.

I'd like to avoid having 4 fields for GPS data if not necessary
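For context, EXIF stores each coordinate as degrees, minutes and seconds plus a separate hemisphere reference (GPSLatitude, GPSLatitudeRef, GPSLongitude, GPSLongitudeRef), which is presumably where the 4 fields come from. A small Python sketch, with the hypothetical helper `dms_to_decimal`, shows how each pair collapses into one signed decimal value, so two DB fields are enough:

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are expressed as negative values in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


latitude = dms_to_decimal(18, 58, 30, "N")   # roughly 18.975
longitude = dms_to_decimal(122, 30, 0, "W")  # a western longitude becomes negative
```

The actual rational-number parsing of the EXIF values is left out here; the sketch assumes the three components have already been read as numbers.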

oparoz commented 8 years ago

Some other things.

oparoz commented 8 years ago

The goal is to be able to present this to users and get some feedback to see if that's what people want

imjalpreet commented 8 years ago

@oparoz I have done all the changes you asked.

@imjalpreet - Did you complete this task?

We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.

What exactly do you want me to do here? Should I find out what practices are used in Google and Twitter for GPS location?

oparoz commented 8 years ago

Should I find out what practices are used in Google and Twitter for GPS location?

Exactly. Look at their APIs and see what they return to clients asking for a piece of information, like a tweet or Facebook update, etc.

imjalpreet commented 8 years ago

Exactly. Look at their APIs and see what they return to clients asking for a piece of information, like a tweet or Facebook update, etc.

Okay, I will have a look at it.

oparoz commented 8 years ago

Thanks!

imjalpreet commented 8 years ago

@oparoz This is the response I get from the Facebook Graph API if, for example, I request the current location of one of my friends:

"current_location": {
  "city": "Mumbai",
  "state": "Maharashtra",
  "country": "India",
  "zip": "",
  "latitude": 18.975,
  "longitude": 72.8258,
  "id": "114759761873412",
  "name": "Mumbai, India"
}

What are your views on this?

oparoz commented 8 years ago

Try to get a few more, like Twitter, but it seems like latitude and longitude is all we would need.

imjalpreet commented 8 years ago

@oparoz I looked into it and found that Twitter also uses only latitude and longitude for the location. The JSON object returned by the Twitter API is of this form: "geo": { "type":"Point", "coordinates":[37.78029, -122.39697] }

Resource: Link

So, I think we can go forward with latitude and longitude only.
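Both responses quoted in this thread reduce to the same pair of numbers. A short Python sketch of that normalisation follows. The function names are hypothetical, and the response shapes are taken only from the samples quoted above; the live APIs may differ.

```python
from typing import Tuple


def from_facebook(current_location: dict) -> Tuple[float, float]:
    """Pick (latitude, longitude) out of a Facebook-style current_location object."""
    return (current_location["latitude"], current_location["longitude"])


def from_twitter(geo: dict) -> Tuple[float, float]:
    """Pick (latitude, longitude) out of a Twitter-style geo object.

    Assumes the coordinates are ordered latitude first, as in the quoted sample.
    """
    lat, lon = geo["coordinates"]
    return (lat, lon)


facebook_sample = {"city": "Mumbai", "latitude": 18.975, "longitude": 72.8258}
twitter_sample = {"type": "Point", "coordinates": [37.78029, -122.39697]}
print(from_facebook(facebook_sample))
print(from_twitter(twitter_sample))
```

Storing the two values separately keeps the DB schema independent of any one provider's ordering or nesting.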

oparoz commented 8 years ago

OK, let's go with that

imjalpreet commented 8 years ago

@oparoz Okay, so should I update the database.xml file with the finalized fields?

imjalpreet commented 8 years ago

@oparoz Do you want any more changes?

oparoz commented 8 years ago

There is still nothing in the OP or the wiki. We need to be able to quickly find the information. Please do that ASAP, then we can ask people for their opinion, then we can update the fields.

imjalpreet commented 8 years ago

@oparoz Can you briefly tell me what I should add to the Wiki?

oparoz commented 8 years ago

Sure. Right now we need one page with the links to all the description of the fields we can pick from. In this OP, you can add a link to the wiki and a link to your spreadsheet.

As a general rule, when you find something useful, just add it to the wiki. It can be the code in core that you used as a reference, for example, making it easy for someone else to quickly find some sort of reference document.

oparoz commented 8 years ago

The spreadsheet needs to be fixed to reflect your latest findings. We only need 2 fields for the GPS coordinates.