Open oparoz opened 8 years ago
@oparoz, I have looked at the data fields in all the three types of data and decided to go with the following fields:
We can add/remove based on our requirements as we go ahead in the project.
According to me, we can extract the following fields from EXIF Data: (I am taking reference from here)
@oparoz What are your views?
Thank you for the link. Overall, it's not easy to pick the essential ones as this app is supposed to be generic and not linked to Gallery, but I did ask you to cross-reference each field to see if they exist in all 3 sets since this would help pick the ones we have to have. So maybe start with that and then we can look at the details?
Exif.Image.ImageLength (or Exif.Iop.RelatedImageLength) Though, I have already written the code for this, it is indirectly taken from exif/iptc data itself.
I'm not sure as it's GD which is extracting the info, so it's probably using a different methodology since very few formats include metadata
Regarding image dimensions, I think it's safer to stick with the current way of doing things as it's universal. Otherwise we would have to do it twice or pick a method of extraction based on the format which adds complexity. Maybe something for later?
Exif.Image.HostComputer (Maybe we can get the owner info from this, will have to check)
I think this would be considered too invasive, let's put it on the "maybe" pile and once we have a list we could ask people for their opinion.
Same with Exif.Image.HostComputer
Exif.Image.ImageID (I think it may be useful, according to what is given in the description in the link I have provided)
On the maybe pile?
Exif.Photo.SubSecTime, Exif.Photo.SubSecTimeOriginal (To get the fraction of seconds for the respective times)
Probably unnecessary.
Looking at the list, it sounds like date and location are the 2 main areas and those are definitely needed.
But it seems some pretty obvious ones are missing
I'm not sure as it's GD which is extracting the info, so it's probably using a different methodology since very few formats include metadata
Regarding image dimensions, I think it's safer to stick with the current way of doing things as it's universal. Otherwise we would have to do it twice or pick a method of extraction based on the format which adds complexity. Maybe something for later?
Actually, I had checked it out and I saw the function that I was using, getimagesize, and found out that they are extracting it from EXIF/IPTC data itself. (I will let you know the file.) But I too don't think there is any need to change the current method.
I agree with the other points you have made.
But it seems some pretty obvious ones are missing
- Exif.Image.ImageDescription
- Exif.Photo.UserComment
- Exif.Image.Make and Exif.Image.Model
- Exif.Image.Orientation
Actually, I forgot to mention these. I thought first I would finish with size, location and time and then come to these fields. But yes, we 100% need to extract the above-mentioned fields.
Also, the aim of matching the fields in the three sets is to not have any repetitions, right?
Also, coming to the generic part of the App, we can always add any additional fields which we may want at any time in the future.
Actually, I had checked it out and I saw the function that I was using getimagesize and found out that they are extracting it from EXIF/IPTC data itself.
Looking at your code, it's probably because you don't do any checks on what you send to getimagesize (and the reason tests from Gallery fail when your app is activated). So if you only test with JPEG, then you're going to have EXIF metadata.
I thought first I would finish with size, location and time and then come to these fields
Yes, it's not a problem, we just need to have maybe 3 priorities for the implementation. It should take very little time to add ways to extract new fields if the methods are properly designed.
Also, the aim of matching the fields in the three sets is to not have any repetitions, right?
No, it would be to be able to extract the same information from all 3 formats.
So if we want size, date and location, that would be 3-4 DB fields matching specific implementation by each format. Does that make sense?
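A minimal sketch of that idea, in Python for brevity: each DB field maps to one tag per format, and a single lookup walks the formats in order. The tag names follow exiv2-style naming and are illustrative picks, not the final list; `FIELD_MAP`, `FORMAT_ORDER` and `extract_field` are hypothetical names, and the metadata is assumed to be already parsed into dictionaries.

```python
# Hypothetical mapping: one DB field, one tag per metadata format.
# Tag names are illustrative; the agreed list lives in the spreadsheet.
FIELD_MAP = {
    "creation_date": {
        "exif": "Exif.Photo.DateTimeOriginal",
        "iptc": "Iptc.Application2.DateCreated",
        "xmp": "Xmp.photoshop.DateCreated",
    },
    "gps_latitude": {
        "exif": "Exif.GPSInfo.GPSLatitude",
        # Left empty here: the thread only found coarse IPTC location tags.
        "iptc": None,
        "xmp": "Xmp.exif.GPSLatitude",
    },
}

# Assumed lookup order; any order works as long as it is applied consistently.
FORMAT_ORDER = ("exif", "iptc", "xmp")


def extract_field(db_field, metadata):
    """Return the first non-empty value for db_field across the formats.

    metadata is assumed to look like {"exif": {tag: value}, "iptc": {...}, "xmp": {...}}.
    """
    for fmt in FORMAT_ORDER:
        tag = FIELD_MAP[db_field].get(fmt)
        if tag is None:
            continue  # this format has no equivalent tag
        value = metadata.get(fmt, {}).get(tag)
        if value:
            return value
    return None
```

Adding a new DB field then only means adding one more entry to the mapping, which is why the per-format cross-reference matters.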
Looking at your code, it's probably because you don't do any checks on what you send to getimagesize (and the reason tests from Gallery fail when your app is activated). So if you only test with JPEG, then you're going to have EXIF metadata.
I didn't get you.
As the getimagesize function is a GD function, I was just trying to point out that, where this function is defined, it is mentioned that the dimensions are taken from EXIF/IPTC data. (I think I saw this when I was trying to find out how it is done, though I will check it again.)
Also, regarding the tests, as far as I remember, I had checked running the tests by disabling this app but that still gave an error. I think I had told you about this. I am not sure whether there was some problem on my local machine or somewhere else.
So if we want size, date and location, that would be 3-4 DB fields matching specific implementation by each format. Does that make sense?
Yes.
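As the getimagesize function is a GD function, I was just trying to point out that, where this function is defined, it is mentioned that the dimensions are taken from EXIF/IPTC data. (I think I saw this when I was trying to find out how it is done, though I will check it again.)
OK, so
- GD can only be used on image formats it supports. If you send a PDF, your app will crash.
- A PNG doesn't contain any EXIF or IPTC metadata, so GD can't use that to determine the size of the image. Instead it relies on libraries designed to handle that format.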
That's the reason you can't use those tags to retrieve the size of an image as we need to be able to get the size of images in many different formats.
Is it clearer?
Also, regarding the tests, as far as I remember, I had checked running the tests by disabling this app but that still gave an error
This is what you sent me when setting up Codeception:
[PHPUnit_Framework_Exception]
getimagesize(): Read error!
That's the problem I'm talking about.
You can write the 1st test with a JPG and once it passes, you'll need to create another one which involves processing a txt file and then update your code to make it work. But that's all for another issue as this one focuses on identifying the tags.
OK, so
- GD can only be used on image formats it supports. If you send a PDF, your app will crash.
- A PNG doesn't contain any EXIF or IPTC metadata, so GD can't use that to determine the size of the image. Instead it relies on libraries designed to handle that format.
That's the reason you can't use those tags to retrieve the size of an image as we need to be able to get the size of images in many different formats.
Is it clearer?
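To illustrate the two points above, here is a rough sketch in Python rather than the app's PHP: check what kind of file you were given before asking for its size, and read the dimensions from the format's own header instead of from EXIF/IPTC tags. The names `sniff_format` and `safe_dimensions` are hypothetical, and only PNG and GIF are handled to keep the example short; a real implementation would delegate to the image library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_format(data):
    """Guess the format from the leading magic bytes; None if unrecognised."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def safe_dimensions(data):
    """Return (width, height), or None instead of failing on unsupported input."""
    fmt = sniff_format(data)
    if fmt == "png":
        # After the 8-byte signature: 4-byte chunk length, b"IHDR",
        # then width and height as big-endian 32-bit integers.
        if len(data) < 24 or data[12:16] != b"IHDR":
            return None
        return struct.unpack(">II", data[16:24])
    if fmt == "gif":
        # Logical screen width and height follow the 6-byte header,
        # stored as little-endian 16-bit integers.
        if len(data) < 10:
            return None
        return struct.unpack("<HH", data[6:10])
    return None  # text files, PDFs, truncated input, etc.
```

A test pair in the same spirit as the one suggested earlier would feed this one valid image and one plain text file, and expect `None` for the latter rather than a read error.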
Yeah, I had read about that. Thanks for reminding.
Thanks, I understood the tests problem.
IPTC Tags that represent similar data to EXIF: (Only for Location and Time)
@oparoz Please give your suggestions.
Reference Link for IPTC.
Similar XMP Tags:
Reference:
The following table lists the XMP properties defined solely by Exif.
The following table lists additional XMP properties defined solely by Exif.
This schema specifies the IPTC Core XMP properties.
This schema specifies the IPTC Extension XMP properties.
OK, but now you need to prioritise these groups and then put them in a table.
Priority | DB | EXIF | IPTC | XMP |
---|---|---|---|---|
1 | creation_date |
@oparoz I will come up with my table by today and put it here, and then we can discuss the changes.
@oparoz Can you also explain to me how exactly I should select my priority order? I mean, what factors should I keep in mind?
Since your question was answered via email, I'm looking forward to your prioritised table.
Another resource for XMP data: http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html
@oparoz I have made a table of data fields with priority values. You can find it over here.
Good job :)
Could you order it, so that people looking at it get the proper order right away?
Looks like Caption and Description are the same thing, no?
I don't think we need the EXIF GPS info as priority 3. We have the coordinates as priority 1 and that should be enough.
Overall, it seems like IPTC is going to be the one holding us back.
Could you please add all the references you've added to this issue to the Wiki?
@oparoz,
Could you order it, so that people looking at it get the proper order right away?
Yeah I have done it.
Looks like Caption and Description are the same thing, no?
Yeah, I also had that doubt. I will merge them while coding.
I don't think we need the EXIF GPS info as priority 3. We have the coordinates as priority 1 and that should be enough.
Done.
Could you please add all the references you've added to this issue to the Wiki?
Okay, I will add them soon.
Looks like "location" is going to be a problem. City is too vague to be able to put a picture on a map.
@oparoz I think we will be able to put the picture on the map with the help of the coordinates themselves. I have used the Google Maps API for GIS before, and with it we can mark a position on the map from its coordinates. On the other hand, we can use the city as a tag for the pictures, so that if we search for a city, we get all the pictures taken there. What do you think about this?
Yes, no problem with coordinates, but with IPTC tags, there is not enough granularity, which means that if you're visiting a city, all the pictures will be piled up in one location instead of being spread out, near their real locations.
Okay, I will try to find a way out and get back to you.
Maybe we'll have 2 DB fields: gps_coordinates and location. Apps can use the latter as a fallback when the former is empty.
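As a rough illustration of that fallback, sketched in Python: the two field names come from the comment above, while `PhotoLocation` and `describe_position` are hypothetical helpers showing how a client app might choose between them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PhotoLocation:
    # (latitude, longitude) in decimal degrees, when GPS data was found
    gps_coordinates: Optional[Tuple[float, float]] = None
    # Coarser textual place (city, postal code, ...) used as a fallback
    location: Optional[str] = None


def describe_position(photo):
    """Prefer precise coordinates; fall back to the textual location."""
    if photo.gps_coordinates is not None:
        return ("coordinates", photo.gps_coordinates)
    if photo.location:
        return ("place_name", photo.location)
    return ("unknown", None)
```

An app that receives `("place_name", ...)` knows it can only tag or group the picture, not pin it precisely on a map.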
Which reminds me that you need to consolidate the DB fields. We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.
Regarding IPTC GPS fields, I found one more tag in IPTC, PostalCode, and as far as I know, we can get the approximate location from the postal code using the Google Maps API too. What do you think about it?
This is the reference link
I still think it's not as precise as GPS coordinates, but a good fallback
Yeah, not as good as GPS coordinates but better than just city.
Indeed :)
Also, there is one more field named sublocation as well, we can look at that too, when I am implementing it.
We'll have to test on a large sample of images to figure out what the best approach is. A good idea would be to get a list of IPTC data generators to see what gets inserted and how.
@oparoz I am planning to update the database.xml file with the final fields along with finishing that test. So can you finalize the fields in that Google sheet? As soon as I complete the database.xml, I will be able to start with the extraction part, and I am planning to complete at least the EXIF extraction by Friday (or Saturday).
@imjalpreet - Did you complete this task?
We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.
I'd like to avoid having 4 fields for GPS data if not necessary
Some other things.
This is still not fixed
Looks like Caption and Description are the same thing, no?
The goal is to be able to present this to users and get some feedback to see if that's what people want
@oparoz I have done all the changes you asked.
@imjalpreet - Did you complete this task?
We need one field for the GPS location or maybe 2 if we go with latitude,longitude. The best thing to do is to use best practices (Google, Twitter, etc.) since people are already used to those APIs.
What exactly do you want me to do for this? Should I find out what practices are used in Google and Twitter for GPS location?
Should I find out what practices are used in Google and Twitter for GPS location?
Exactly. Look at their APIs and see what they return to clients asking for a piece of information, like a tweet or Facebook update, etc.
Exactly. Look at their APIs and see what they return to clients asking for a piece of information, like a tweet or Facebook update, etc.
Okay, I will have a look at it.
Thanks!
@oparoz This is the response I get from the Facebook Graph API if, for example, I request the current location of one of my friends:
"current_location": {
"city": "Mumbai",
"state": "Maharashtra",
"country": "India",
"zip": "",
"latitude": 18.975,
"longitude": 72.8258,
"id": "114759761873412",
"name": "Mumbai, India"
}
What are your views on this?
Try to get a few more, like Twitter, but it seems like latitude and longitude is all we would need.
@oparoz I looked into it and found that Twitter also uses only latitude and longitude for the location.
The JSON object returned by the Twitter API is of this form:
"geo": { "type":"Point", "coordinates":[37.78029, -122.39697] }
Resource: Link
So, I think we can go forward with latitude and longitude only.
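To make the comparison concrete, here is a small Python sketch that reduces both response shapes quoted above to the same (latitude, longitude) pair. The sample payloads are trimmed copies of the ones in this thread; `from_facebook` and `from_twitter` are hypothetical helper names, not part of either API.

```python
import json

# Trimmed copies of the two responses quoted in this thread.
FACEBOOK_SAMPLE = json.loads(
    '{"current_location": {"city": "Mumbai", "latitude": 18.975, "longitude": 72.8258}}'
)
TWITTER_SAMPLE = json.loads(
    '{"geo": {"type": "Point", "coordinates": [37.78029, -122.39697]}}'
)


def from_facebook(payload):
    """Facebook exposes latitude and longitude as separate named keys."""
    loc = payload["current_location"]
    return float(loc["latitude"]), float(loc["longitude"])


def from_twitter(payload):
    """The quoted Twitter object carries a [latitude, longitude] pair."""
    lat, lon = payload["geo"]["coordinates"]
    return float(lat), float(lon)
```

Both helpers end up with two decimal numbers, which supports storing the position as two plain numeric fields.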
OK, let's go with that
@oparoz Okay, so should I update the database.xml file with the finalized fields?
@oparoz Do you want any more changes?
There is still nothing in the OP or the wiki. We need to be able to quickly find the information. Please do that ASAP, then we can ask people for their opinion, then we can update the fields.
@oparoz Can you give some brief info on what I should add to the Wiki?
Sure. Right now we need one page with the links to all the description of the fields we can pick from. In this OP, you can add a link to the wiki and a link to your spreadsheet.
As a general rule, when you find something useful, just add it to the wiki. It can be the code in core that you used as a reference, for example, making it easy for someone else to quickly find some sort of reference document.
The spreadsheet needs to be fixed to reflect your latest findings. We only need 2 fields for the GPS coordinates.
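For reference, EXIF stores each coordinate as three rationals (degrees, minutes, seconds) plus a separate hemisphere reference, so collapsing it into two decimal fields needs a small conversion. A minimal Python sketch, assuming the rationals have already been read as (numerator, denominator) pairs; `dms_to_decimal` is a hypothetical helper name.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees.

    dms: three (numerator, denominator) pairs, as stored in GPSLatitude/GPSLongitude.
    ref: "N", "S", "E" or "W", as stored in GPSLatitudeRef/GPSLongitudeRef.
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value  # southern and western hemispheres are negative
    return float(value)
```

For instance, 18° 58' 30" N comes out as 18.975, which is the latitude in the Facebook sample earlier in this thread.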
We need to have a full list and prioritise it.
There should be a list for EXIF, IPTC and XMP