Open fbuchinger opened 8 years ago
Great idea. I would love a pull request! The size of the test images has always bugged me. I tried to separate them from the code repo, but the checkout was still huge unless you used the --depth 1 flag when checking out, because they are forever in the history. I recently moved them to a submodule, but that means you get all the files anyway. I also never got the nodejs dev-dependencies thing working the way I wanted it. Similarly, the test suite should be much faster to be useful.
In test/sampleImages/_Other are a bunch of images that I have encountered that have caused problems for this library. They serve as a very hacky regression test suite. Most of them aren't about handling new tags, but about handling strange corner cases such as incorrect encoding or weird corruptions caused by rogue software. I would initially like this set of images to stay. The correct thing would have been for me to write a single test case for each class of bug, as I'm sure they are not all needed. Something like this:
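describe('For regression suite images', function () {
    it('should parse corrupted serialNumber', function (done) {
        exiftoolJS.getExifFromLocalFileUsingNodeFs(fs, 'test/sampleImages/_Other/x.jpg',
            function (err, exif) {
                // Fail the test on a read or parse error instead of throwing on `exif`.
                if (err) { return done(err); }
                // assert.equal takes (actual, expected).
                assert.equal(exif.SerialNumber, '012345');
                done();
            }
        );
    });
});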
Just coded a little node script that implements the "one sample image per decodable tag" approach described above.
My results are promising: if you want just one sample image per tag, you would only have to include the following 212 samples, which contain 3695 different tags in total. Assuming 50 KB per image, this would mean ~10 MB for all sample images, instead of the current 6245 samples consuming 168 MB:
Agfa/Agfa_ePhotoCL30.jpg
Apple/Apple_iPhone5.jpg
Apple/Apple_iPhone5s.jpg
Canon/CanonEOS-1D.jpg
Canon/CanonEOS-1DS.jpg
Canon/CanonEOS-1D_MarkIII.jpg
Canon/CanonEOS-1D_MarkIV.jpg
Canon/CanonEOS-1D_X.jpg
Canon/CanonEOS-1DmkII.jpg
Canon/CanonEOS-1DmkIIN.jpg
Canon/CanonEOS10D.jpg
Canon/CanonEOS300D.jpg
Canon/CanonEOS5D.jpg
Canon/CanonEOS5D_MarkII.jpg
Canon/CanonEOS5D_MarkIII.jpg
Canon/CanonEOS60D.jpg
Canon/CanonEOS70D.jpg
Canon/CanonEOS_D2000.jpg
Canon/CanonEOS_D60.jpg
Canon/CanonEOS_KissX4.jpg
Canon/CanonEOS_KissX5.jpg
Canon/CanonEOS_KissX50.jpg
Canon/CanonIXUS125HS.jpg
Canon/CanonPowerShot600.jpg
Canon/CanonPowerShotA5.jpg
Casio/CasioEX-H20G.jpg
Casio/CasioEX-TR150.jpg
Casio/CasioEX-Z5.jpg
Casio/CasioEX-ZR1000.jpg
Casio/CasioQV-2400UX.jpg
Casio/CasioQV-5700.jpg
Casio/CasioQV-7000SX.jpg
DoCoMo/DoCoMoSH902i.jpg
Epson/EpsonL-300.jpg
Epson/EpsonPerfection4990.jpg
FLIR/FLIR_E30bx.jpg
FLIR/FLIR_E4.jpg
FLIR/FLIR_E40.jpg
FLIR/FLIR_E60.jpg
FLIR/FLIR_P60NTSC.jpg
FujiFilm/FujiFilmFinePixAX550.jpg
FujiFilm/FujiFilmFinePixHS33EXR.jpg
FujiFilm/FujiFilmFinePixS200EXR.jpg
FujiFilm/FujiFilmFinePixS5700S700.jpg
FujiFilm/FujiFilmX-A1.jpg
FujiFilm/FujiFilmXQ1.jpg
FujiFilm/FujiSLP1000SE.jpg
GE/GE_A835.jpg
GE/GE_E1250TW.jpg
HP/HP_PhotoSmart210.jpg
HP/HP_PhotoSmart43xseries.jpg
HP/HP_PhotoSmart618.jpg
HP/HP_PhotoSmartR725.jpg
HP/HP_PhotosmartMz67.jpg
HP/HP_PhotosmartR967.jpg
HP/HP_iPAQ_VoiceMessenger.jpg
Kodak/Kodak1640FilmScanner.jpg
Kodak/KodakDC3400.jpg
Kodak/KodakEASYSHARE_C140.jpg
Kodak/KodakEASYSHARE_CX6230Zoom.jpg
Kodak/KodakEASYSHARE_Wireless.jpg
Kodak/KodakEasyShare-One.jpg
Kodak/KodakPictureKioskG4.jpg
Kodak/KodakProBack.jpg
Kodak/KodakProDCS14n.jpg
Kodak/KodakProSLRn.jpg
Leica/LeicaM8.2.jpg
Leica/LeicaM_Monochrom.jpg
Leica/LeicaR8-DigitalBackDMR.jpg
Leica/LeicaX_VARIO.jpg
Minolta/KonicaMinoltaDYNAX5D.jpg
Minolta/KonicaMinoltaDYNAX7D.jpg
Minolta/KonicaMinoltaDiMAGE_X21.jpg
Minolta/KonicaMinoltaDiMAGE_X60.jpg
Minolta/KonicaMinoltaRevioKD-420Z.jpg
Minolta/MinoltaDiMAGE7Hi.jpg
Motorola/MotorolaDROID2GLOBAL.jpg
Nikon/Nikon1V2.jpg
Nikon/NikonCOOLSCAN_VED.jpg
Nikon/NikonCoolpix800.jpg
Nikon/NikonCoolpixAW100.jpg
Nikon/NikonCoolpixP510.jpg
Nikon/NikonCoolpixP5100.jpg
Nikon/NikonCoolpixP520.jpg
Nikon/NikonCoolpixP6000.jpg
Nikon/NikonCoolpixS1200pj.jpg
Nikon/NikonD100.jpg
Nikon/NikonD2Xs.jpg
Nikon/NikonD3.jpg
Nikon/NikonD3000.jpg
Nikon/NikonD300S.jpg
Nikon/NikonD4.jpg
Nikon/NikonD4S.jpg
Nikon/NikonD5200.jpg
Nikon/NikonD5300.jpg
Nikon/NikonD70.jpg
Nikon/NikonD7000.jpg
Nikon/NikonD80.jpg
Nikon/NikonD800.jpg
Nintendo/Nintendo3DS.jpg
Nokia/NokiaLumia1020.jpg
Nokia/NokiaN9.jpg
Panasonic/PanasonicDMC-F3.jpg
Panasonic/PanasonicDMC-GH3.jpg
Panasonic/PanasonicDMC-LX7.jpg
Panasonic/PanasonicDMC-TZ22.jpg
Panasonic/PanasonicDMC-TZ60.jpg
Panasonic/PanasonicDMC-XS1.jpg
Pentacon/PentaconDCZ81.jpg
Pentax/PentaxK-01.jpg
Pentax/PentaxK-5.jpg
Pentax/PentaxK-5IIs.jpg
Pentax/PentaxK100D.jpg
Pentax/PentaxOptio330.jpg
Pentax/PentaxOptioWG-1GPS.jpg
Pentax/PentaxQ7.jpg
Pentax/Pentax_istDL.jpg
Polaroid/PolaroidPDC-2300.jpg
Reconyx/ReconyxPC900.jpg
Ricoh/RicohCaplio500SE.jpg
Ricoh/RicohG700SE.jpg
Ricoh/RicohGR.jpg
Ricoh/RicohGR_DIGITAL4.jpg
Ricoh/RicohRDC-5300.jpg
Ricoh/RicohRDC4300.jpg
Ricoh/RicohTHETA.jpg
Samsung/SamsungGT-i8910.jpg
Samsung/SamsungL73.jpg
Samsung/SamsungNX11.jpg
Samsung/SamsungNX30.jpg
Samsung/SamsungST50.jpg
Samsung/SamsungST65.jpg
Sanyo/SanyoCA65.jpg
Sanyo/SanyoDSC-MZ3.jpg
Sanyo/SanyoS650.jpg
Sigma/SigmaDP3Merrill.jpg
Sigma/SigmaSD14.jpg
Sony/SonyDCR-IP220.jpg
Sony/SonyDSC-RX100M2.jpg
Sony/SonyDSC-W370.jpg
Sony/SonyDSC-W510.jpg
Sony/SonyDSC-W650.jpg
Sony/SonyDSLR-A100.jpg
Sony/SonyDSLR-A580.jpg
Sony/SonyDSLR-A850.jpg
Sony/SonyILCE-7.jpg
Sony/SonySLT-A55V.jpg
Sony/SonySLT-A77V.jpg
Sony/SonySLT-A99V.jpg
SonyEricsson/SonyEricssonW595.jpg
Toshiba/ToshibaPDR-M60.jpg
UMAX/UMAX_MagicScan.jpg
Yakumo/Yakumo1210.jpg
_Other/003.JPG
_Other/009.JPG
_Other/016.JPG
_Other/036.jpg
_Other/070131-DG-0957.jpg
_Other/100_5414.JPG
_Other/106_8726.JPG
_Other/110624 578.jpeg
_Other/187.jpg
_Other/200808290066.jpg
_Other/20100602-P6028407.JPG
_Other/20111225-18388_date-should-be-xmas.jpg
_Other/395.jpg
_Other/Africa 53.jpg
_Other/Australia_02.JPG
_Other/Barcelona 007.JPG
_Other/Berlin_Sept_08_032.JPG
_Other/Chris-18ps.jpg
_Other/Copy of success - my7d
_Other/DSC_3895.jpg
_Other/DSC_6634.jpeg
_Other/HPIM0771.jpg
_Other/IMGP0423x.jpg
_Other/IMGP1215.JPG
_Other/IMG_0013.JPG
_Other/IMG_0102.JPG
_Other/IMG_0125.JPG
_Other/IMG_0290.jpg
_Other/IMG_0745.jpg
_Other/IMG_0926.jpg
_Other/IMG_1080.jpg
_Other/IMG_1413-.jpg
_Other/IMG_2705.JPG
_Other/IMG_3058.JPG
_Other/IMG_3428.JPG
_Other/IMG_6714.jpg
_Other/IMG_8373.jpg
_Other/Idaho 2007 002.jpg
_Other/L1010705.JPG
_Other/MLM 4500.jpg
_Other/P1000029.jpg
_Other/P1010002.JPG
_Other/P1010006.JPG
_Other/P1030876-rotated.JPG
_Other/P5270680.JPG
_Other/P9101543.JPG
_Other/R0015481.JPG
_Other/SSC Tennis Lee Tourn 148 009.JPG
_Other/WBOX_LOGO.JPG
_Other/XMP-sn in xmp element text.jpg
_Other/YN560C.JPG
_Other/_DSC0253.jpg
_Other/_DSC0457.jpg
_Other/_DSC9736.jpg
_Other/_IGP9748.JPG
_Other/am2.jpg
_Other/exif.jpg
_Other/images_entries_display_Traditional_Henna.jpg
_Other/sydney 019.JPG
(Will polish the script a bit before I prepare the pull request)
Update: total filesize of the 212 samples is 8856892 bytes or approx. 8.45 MB -> this would eliminate the need for a separate test subrepo
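For readers following along, here is a minimal sketch of what a "one sample image per tag" pass could look like. The script itself is not shown in this thread, so this is only one plausible reading: the function name `selectFirstSeen` and the input shape (an array of `{ image, tags }` objects) are assumptions made for illustration.

```javascript
// Hypothetical input: one entry per sample image, listing the tag names
// decoded from it, e.g. { image: 'Canon/x.jpg', tags: ['Make', 'Model'] }.
function selectFirstSeen(samples) {
  const covered = new Set(); // tags already represented by a kept image
  const selected = [];       // images kept so far, in input order

  for (const sample of samples) {
    // Keep the image only if it contributes at least one unseen tag.
    const newTags = sample.tags.filter((tag) => !covered.has(tag));
    if (newTags.length > 0) {
      selected.push(sample.image);
      newTags.forEach((tag) => covered.add(tag));
    }
  }
  return { selected, covered };
}

// Usage with made-up data: the second image adds nothing new and is skipped.
const firstPass = selectFirstSeen([
  { image: 'a.jpg', tags: ['Make', 'Model'] },
  { image: 'b.jpg', tags: ['Model'] },
  { image: 'c.jpg', tags: ['LensType'] },
]);
console.log(firstPass.selected); // [ 'a.jpg', 'c.jpg' ]
```

A single pass like this depends on the order in which images are visited, so it can keep images that a later, richer image would have made redundant.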
Excellent work, thanks for that. That is a massive improvement.
At the risk of complicating what you've already done, there is a slight tweak you could make to your algorithm that may reduce the number of images further (although it will be slower because it re-sorts on each iteration). Currently, imagine you have 3 images with the following tags: image 1: [A, B, C, D], image 2: [C, D, E], image 3: [E, F].
Your algorithm would add all 3 images, when really only images 1 and 3 are needed to represent the full set of tags. I would suggest the following:
We need a mechanism for adding new photos from new cameras in the future, which should be straightforward with this script.
Cheers, Matt
Hi Matt,
thanks for your suggestion! I actually had the same thought - we should sort the images by "difference" to the current set of tags and pick the image that brings the most new tags to the set. After that the set is updated with the new tags and the difference sort starts again until no more images are left.
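The "maximum difference" selection described here is essentially a greedy set-cover heuristic. Below is a minimal sketch of it, not the actual script: the function name `selectGreedy` and the `{ image, tags }` input shape are assumptions, and the loop stops as soon as no remaining image adds a new tag.

```javascript
// Hypothetical input: array of { image: string, tags: string[] }.
function selectGreedy(samples) {
  const covered = new Set();
  const selected = [];
  let remaining = samples.slice();

  while (remaining.length > 0) {
    // Find the image that contributes the most tags not yet covered.
    let best = null;
    let bestNew = [];
    for (const sample of remaining) {
      const newTags = [...new Set(sample.tags)].filter((t) => !covered.has(t));
      if (newTags.length > bestNew.length) {
        best = sample;
        bestNew = newTags;
      }
    }
    if (best === null) break; // nothing left adds a new tag

    selected.push(best.image);
    bestNew.forEach((t) => covered.add(t));
    remaining = remaining.filter((s) => s !== best);
  }
  return { selected, covered };
}

// The three-image example from the previous comment:
const greedy = selectGreedy([
  { image: '1.jpg', tags: ['A', 'B', 'C', 'D'] },
  { image: '2.jpg', tags: ['C', 'D', 'E'] },
  { image: '3.jpg', tags: ['E', 'F'] },
]);
console.log(greedy.selected); // [ '1.jpg', '3.jpg' ]
```

Each round rescans every remaining image, which is why this variant is slower than a single pass. Greedy selection usually gives a small set, but it is not guaranteed to be the smallest possible one.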
I will update my script and check the results.
Yeah, I'm not sure it'll make a massive difference. It's just my pedantry getting the better of me!
step 3 should really be something like:
This will support easy adding of photos at a later date.
Another word of warning though: currently the tag names used in the exiftool.js test reports are not vendor-prefixed. I discovered some tag clashes like the "LensType" tag which exists in Canon and Nikon Makernotes. Technically, these are two different tags, but currently they would count as one tag:
http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html#LensType
http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Nikon.html, Tag 0x2000083
I will also introduce a namespace prefix to avoid such clashes.
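As a rough illustration of the namespacing idea, the sketch below prefixes each tag name with the group it was found in, so that the Canon and Nikon "LensType" tags count separately. The report shape (`{ group: { tag: value } }`), the `:` separator, the sample values, and the function name `namespacedTagNames` are all assumptions for illustration, not the format actually used by the test reports.

```javascript
// Hypothetical report shape: { groupName: { tagName: value, ... }, ... }
function namespacedTagNames(report) {
  const names = [];
  for (const [group, tags] of Object.entries(report)) {
    for (const tag of Object.keys(tags)) {
      names.push(group + ':' + tag); // e.g. 'Canon:LensType'
    }
  }
  return names;
}

// Made-up reports for two images from different vendors.
const canonReport = { EXIF: { Make: 'Canon' }, Canon: { LensType: 'lens A' } };
const nikonReport = { EXIF: { Make: 'NIKON' }, Nikon: { LensType: 'lens B' } };

const allTags = new Set([
  ...namespacedTagNames(canonReport),
  ...namespacedTagNames(nikonReport),
]);
console.log(allTags.size); // 3: EXIF:Make, Canon:LensType, Nikon:LensType
```

Without the prefix, the two LensType entries would collapse into one and the unique-tag count would be understated.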
Good point. Similarly, I flatten tags found in EXIF, makernotes, XMP, etc., which occasionally causes clashes.
After incorporating the "maximum difference" algorithm and adding namespacing to the tags, I managed to reduce the sample image count from 212 to 176. Note that the number of unique detected tags has increased from 3695 to 4044. This time I used the brand-new ExifTool 10.0 to regenerate the JSON metadata reports with tag namespacing, whereas my earlier test used the pre-generated reports from the repo, which were created with an older ExifTool version. The old unique tag count might also have been too low due to tag name clashes, which namespacing now prevents.
The only downside is that my deduplication script now takes five times longer to run due to the frequent array diffing.
Here is the new output:
Acer/AcerCE6430.jpg.json
Agfa/Agfa_ePhotoCL30.jpg.json
Apple/Apple_iPhone5.jpg.json
Apple/Apple_iPhone5s.jpg.json
Camera/Camera8MP-9Q6.jpg.json
Canon/CanonEOS-1D.jpg.json
Canon/CanonEOS-1DS.jpg.json
Canon/CanonEOS-1D_MarkIII.jpg.json
Canon/CanonEOS-1D_MarkIV.jpg.json
Canon/CanonEOS-1D_X.jpg.json
Canon/CanonEOS-1DmkIIN.jpg.json
Canon/CanonEOS10D.jpg.json
Canon/CanonEOS40D.jpg.json
Canon/CanonEOS50D.jpg.json
Canon/CanonEOS5D.jpg.json
Canon/CanonEOS5D_MarkIII.jpg.json
Canon/CanonEOS60D.jpg.json
Canon/CanonEOS70D.jpg.json
Canon/CanonEOS7D.jpg.json
Canon/CanonEOS_D2000.jpg.json
Canon/CanonEOS_D30.jpg.json
Canon/CanonEOS_DIGITAL_REBEL.jpg.json
Canon/CanonEOS_KissDigitalX.jpg.json
Canon/CanonEOS_KissX5.jpg.json
Canon/CanonEOS_M.jpg.json
Canon/CanonEOS_REBEL_T3.jpg.json
Canon/CanonPowerShot600.jpg.json
Canon/CanonPowerShotA5.jpg.json
Casio/CasioEX-100.jpg.json
Casio/CasioEX-H20G.jpg.json
Casio/CasioEX-Z40.jpg.json
Casio/CasioEX-ZR15.jpg.json
Casio/CasioQV-4000.jpg.json
Casio/CasioQV-7000SX.jpg.json
DoCoMo/DoCoMoSH902i.jpg.json
Epson/EpsonCP-850Z.jpg.json
Epson/EpsonPerfection4990.jpg.json
FLIR/FLIR_B335.jpg.json
FLIR/FLIR_E30bx.jpg.json
FLIR/FLIR_E40.jpg.json
FLIR/FLIR_E60.jpg.json
FLIR/FLIR_P60NTSC.jpg.json
FLIR/FLIR_T640.jpg.json
FujiFilm/FujiFilmFinePixAX550.jpg.json
FujiFilm/FujiFilmFinePixF550EXR.jpg.json
FujiFilm/FujiFilmFinePixXP200.jpg.json
FujiFilm/FujiFilmSP-2000.jpg.json
FujiFilm/FujiFilmXQ1.jpg.json
GE/GE_A835.jpg.json
GE/GE_E1250TW.jpg.json
HP/HP_PhotoSmart210.jpg.json
HP/HP_PhotoSmartR817.jpg.json
HP/HP_iPAQ_VoiceMessenger.jpg.json
HP/HP_oj7300.jpg.json
KDDI/KDDI_W51P.jpg.json
Kodak/Kodak1640FilmScanner.jpg.json
Kodak/KodakCX6200.jpg.json
Kodak/KodakDC260.jpg.json
Kodak/KodakDC5000.jpg.json
Kodak/KodakEASYSHARE_Wireless.jpg.json
Kodak/KodakLS420.jpg.json
Kodak/KodakPictureKioskG4.jpg.json
Kodak/KodakProBack.jpg.json
Kodak/KodakProDCS14n.jpg.json
Kodak/KodakProSLRn.jpg.json
Leica/LeicaM8.2.jpg.json
Leica/LeicaM_Monochrom.jpg.json
Leica/LeicaR8-DigitalBackDMR.jpg.json
Leica/LeicaX1.jpg.json
Medion/MedionMD85830.jpg.json
Minolta/KonicaMinoltaDYNAX7D.jpg.json
Minolta/KonicaMinoltaDiMAGE_X60.jpg.json
Minolta/KonicaMinoltaMAXXUM5D.jpg.json
Minolta/MinoltaDiMAGE7Hi.jpg.json
Minolta/MinoltaDiMAGE_E201.jpg.json
Nikon/NikonCOOLSCAN_VED.jpg.json
Nikon/NikonCoolpix8800.jpg.json
Nikon/NikonCoolpix950.jpg.json
Nikon/NikonCoolpixP6000.jpg.json
Nikon/NikonCoolpixS4150.jpg.json
Nikon/NikonCoolpixS6000.jpg.json
Nikon/NikonCoolpixS9300.jpg.json
Nikon/NikonD2H.jpg.json
Nikon/NikonD300.jpg.json
Nikon/NikonD3000.jpg.json
Nikon/NikonD4.jpg.json
Nikon/NikonD4S.jpg.json
Nikon/NikonD5200.jpg.json
Nikon/NikonD5300.jpg.json
Nikon/NikonD7000.jpg.json
Nikon/NikonD80.jpg.json
Nikon/NikonD800E.jpg.json
Nintendo/Nintendo3DS.jpg.json
Nokia/NokiaLumia1020.jpg.json
Nokia/NokiaN9.jpg.json
Panasonic/PanasonicDMC-FT5.jpg.json
Panasonic/PanasonicDMC-G6.jpg.json
Panasonic/PanasonicDMC-TZ22.jpg.json
Panasonic/PanasonicPV-DV702.jpg.json
Pentax/PentaxK-01.jpg.json
Pentax/PentaxK-5.jpg.json
Pentax/PentaxK-x.jpg.json
Pentax/PentaxOptio430.jpg.json
Pentax/PentaxOptioWG-1GPS.jpg.json
Pentax/PentaxQ7.jpg.json
Pentax/Pentax_istD.jpg.json
Polaroid/PolaroidPDC-2300.jpg.json
Reconyx/ReconyxPC900.jpg.json
Ricoh/RicohCX5.jpg.json
Ricoh/RicohCaplio500SE.jpg.json
Ricoh/RicohG700SE.jpg.json
Ricoh/RicohGR.jpg.json
Ricoh/RicohRDC-4300.jpg.json
Ricoh/RicohRDC-5300.jpg.json
Ricoh/RicohTHETA.jpg.json
Samsung/SamsungAnycallSCH-W270.jpg.json
Samsung/SamsungGT-i8910.jpg.json
Samsung/SamsungL73.jpg.json
Samsung/SamsungNX30.jpg.json
Samsung/SamsungST50.jpg.json
Samsung/SamsungST65.jpg.json
Samsung/SamsungWB5000.jpg.json
Sanyo/SanyoCG65.jpg.json
Sanyo/SanyoDSC-MZ3.jpg.json
Sigma/SigmaDP1X.jpg.json
Sigma/SigmaDP2.jpg.json
Sigma/SigmaDP2Merrill.jpg.json
Sony/SonyDCR-IP220.jpg.json
Sony/SonyDSC-W370.jpg.json
Sony/SonyDSC-W650.jpg.json
Sony/SonyDSLR-A100.jpg.json
Sony/SonyDSLR-A560.jpg.json
Sony/SonyDSLR-A900.jpg.json
Sony/SonyILCE-7.jpg.json
Sony/SonyNEX-VG30E.jpg.json
Sony/SonySLT-A55.jpg.json
Sony/SonySLT-A77V.jpg.json
SonyEricsson/SonyC902.jpg.json
Toshiba/ToshibaPDR-M60.jpg.json
UMAX/UMAX_MagicScan.jpg.json
_Other/070131-DG-0957.jpg.json
_Other/100_5414.jpg.json
_Other/102_2740.jpg.json
_Other/106_8726.jpg.json
_Other/187.jpg.json
_Other/20100602-P6028407.jpg.json
_Other/20111225-18388_date-should-be-xmas.j
_Other/395.jpg.json
_Other/Australia_242.jpg.json
_Other/Chris-18ps.jpg.json
_Other/DSC_0052.jpg.json
_Other/DSC_3895.jpg.json
_Other/Flynns Corfu 003.jpg.json
_Other/HPIM0771.jpg.json
_Other/IMGP0423x.jpg.json
_Other/IMG_0290.jpg.json
_Other/IMG_0735.jpg.json
_Other/IMG_0745.jpg.json
_Other/IMG_3195.jpg.json
_Other/IMG_6414.jpg.json
_Other/L1010705.jpg.json
_Other/MLM 4500.jpg.json
_Other/P1010002.jpg.json
_Other/P1040692.jpg.json
_Other/P6290593.jpg.json
_Other/P9101543.jpg.json
_Other/SSC Tennis Lee Tourn 148 009.jpg.jso
_Other/WBOX_LOGO.jpg.json
_Other/XMP-sn in xmp element text.jpg.json
_Other/YN560C.jpg.json
_Other/_DSC0012.jpg.json
_Other/_DSC0253.jpg.json
_Other/_DSC0457.jpg.json
_Other/_IGP9748.jpg.json
_Other/exif.jpg.json
_Other/images_entries_display_Traditional_Henna.jpg.json
total number of included files: 176
total number of different tags: 4044
total filesize of sample images: 8777737
Excellent. That's an even better test suite reduction than I was expecting
Currently the exiftool.js coverage testsuite consists of about 7000 sample images from different camera models. This impressive number also brings some issues to the coverage suite:
IMHO, exiftool.js should switch to tag-based coverage, where there is exactly one sample image per decodable tag (e.g. Exif Make). Further sample images should only be added in case of known regressions (e.g. some Nikon model writing a wrong datetime tag).
Advantages:
Approach:
Please tell me if you are interested in a pull request.