experiment with object recognition trained neural net and OpenCV video frame extraction to match videos by content

makingglitches / GooglePhotoDownload

Connects to Google Photos and downloads all content. It keeps track of the original data on disk, moves files that are only on the computer and files already on the server to their respective directories, and tries to download the entire collection, storing size info for quicker startup as well as the original sizes of the files on disk. Files still on disk are downloaded first so they can be freed up. Supports multiple user accounts. It's just a better mousetrap. Google Takeout prepares whole archives of photos; this lets you download them separately and also keep track of some statistics on space usage.


experiment with object recognition trained neural net and OpenCV video frame extraction to match videos by content #45

Open makingglitches opened 2 years ago

makingglitches commented 2 years ago

Create a method that uses object recognition neural nets like MobileNet to match videos, even on incorrect detections, given a uniform frame size, and see how close the results are between originals and videos that have been re-encoded by Google or reduced for archival.
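A rough sketch of the frame-extraction side, assuming OpenCV's Python bindings (`cv2`) are installed. The function names, the every-Nth-frame sampling, and the default step of 30 are assumptions for illustration; nothing in this issue fixes them. The 300x300 default follows the tile size mentioned further down in this thread.

```python
def should_sample(index, step):
    """True for every `step`-th frame, starting with frame 0."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return index % step == 0


def extract_frames(path, step=30, size=(300, 300)):
    """Yield (frame_index, resized_frame) pairs from the video at `path`.

    Frames are read sequentially rather than by seeking, which is slower
    but avoids depending on accurate seeking in the container.
    """
    import cv2  # imported lazily so should_sample works without OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"could not open video: {path}")
    try:
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or decode failure
                break
            if should_sample(index, step):
                # size is (width, height); this stretches non-square frames
                yield index, cv2.resize(frame, size)
            index += 1
    finally:
        cap.release()
```

Resizing straight to a square stretches non-square frames, but two encodes with the same aspect ratio get stretched the same way, so detections should still be comparable between them.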

Match the returned coordinates and object class frame by frame.
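One way the per-frame matching could look, as a sketch only. It assumes each detection is a hypothetical `(class_label, box)` tuple with the box as `(x1, y1, x2, y2)` in normalized coordinates. The intersection-over-union threshold and the greedy pairing are illustrative choices, not something tested here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def frame_similarity(dets_a, dets_b, min_iou=0.5):
    """Fraction of detections that pair up by class label and box overlap.

    Pairing is greedy: each detection in dets_a takes the best remaining
    detection in dets_b with the same label and IoU >= min_iou.
    """
    if not dets_a and not dets_b:
        return 1.0  # two empty frames count as identical
    unused = list(dets_b)
    matched = 0
    for label, box in dets_a:
        best, best_iou = None, min_iou
        for cand in unused:
            if cand[0] != label:
                continue
            score = iou(box, cand[1])
            if score >= best_iou:
                best, best_iou = cand, score
        if best is not None:
            unused.remove(best)
            matched += 1
    return matched / max(len(dets_a), len(dets_b))
```

Note that the class label only has to agree between the two videos, not be correct, which is the point of matching "even incorrect detection".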

makingglitches commented 2 years ago

In videos this would likely work very well, given the precision expectations of 'sameness', where being a pixel off (adjusting for scale) would be way wrong, even if there will be a little fuzzy calculation between differing resolutions. Aspect ratios need to be stored in the metadata.
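To make "a pixel off (adjusting for scale)" concrete, here is a small sketch. The function names and the one-pixel default are assumptions for illustration: boxes are normalized by their own frame size, and the tolerance is one pixel of the smaller frame expressed in the same normalized units.

```python
def normalize_box(box, width, height):
    """Convert a pixel box (x1, y1, x2, y2) to fractions of the frame size."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def same_aspect_ratio(size_a, size_b, rel_tol=0.01):
    """True if two (width, height) sizes have aspect ratios within rel_tol."""
    ra = size_a[0] / size_a[1]
    rb = size_b[0] / size_b[1]
    return abs(ra - rb) <= rel_tol * max(ra, rb)


def boxes_agree(box_a, size_a, box_b, size_b, pixels=1.0):
    """Compare two pixel boxes taken from differently sized frames.

    The allowed error is `pixels` pixels of the smaller frame on each axis.
    """
    na = normalize_box(box_a, *size_a)
    nb = normalize_box(box_b, *size_b)
    tol_x = pixels / min(size_a[0], size_b[0])
    tol_y = pixels / min(size_a[1], size_b[1])
    tols = (tol_x, tol_y, tol_x, tol_y)
    return all(abs(p - q) <= t for p, q, t in zip(na, nb, tols))


# The same region in a 1920x1080 original and a 640x360 reduced copy.
hd, sd = (1920, 1080), (640, 360)
print(same_aspect_ratio(hd, sd))
print(boxes_agree((192, 108, 960, 540), hd, (64, 36, 320, 180), sd))
```

Checking `same_aspect_ratio` first is why the aspect ratio has to be kept in the metadata: without it the normalized coordinates of two files are not directly comparable.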

This is kind of a fuzzy logic idea, as there would be room for failure and false positives/negatives; however, they'd be less likely. Also, there would only be qualitative categories of objects, even misrecognized objects, which MobileNet, for example, would give the position of.

I suppose if you didn't ADD stretching or distortion to the image, which is not what this is for anyway, the resize to the standard tile size of 300x300 would be corrected for... sorta.
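If plain stretching to 300x300 turns out to be a problem, letterboxing is one alternative (not something this issue settled on): scale the frame to fit the tile, pad the rest, and map detections back afterwards. A sketch of just the arithmetic, with hypothetical function names:

```python
def letterbox_params(width, height, tile=300):
    """Scale and padding needed to fit a width x height frame into a
    tile x tile square without changing its aspect ratio."""
    scale = min(tile / width, tile / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = (tile - new_w) // 2, (tile - new_h) // 2
    return scale, pad_x, pad_y


def tile_box_to_source(box, width, height, tile=300):
    """Map a box in tile pixels (x1, y1, x2, y2) back to source pixels."""
    scale, pad_x, pad_y = letterbox_params(width, height, tile)
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# A 1920x1080 frame shrinks to 300x169 with 65 rows of padding on top.
print(letterbox_params(1920, 1080))
```

The rounding of the scaled size means the mapping back is only accurate to about a source pixel divided by the scale, so some fuzziness remains either way.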

if there is SIGNIFICANT pixel distortion this won't work.

makingglitches commented 2 years ago

Seem to remember this didn't yield the best results. A reliable enough confidence index couldn't be built, as distorting the test data by changing color space, etc. greatly altered the confidence values, even after resizing to restore aspect ratio equality. I also can't remember if the Single Shot Detection neural net even got the same coordinates, detection index, and classification, so this needs a little more consideration.

The YolorV4 detection neural network was slow when it was released around this time, so my hardware needs to be upgraded before it can process anything at a speed that would allow me to run some tests.

Backburner this idea and let the tech catch up, or generate some data to see if a comparison can be derived from relatively similar positions.
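Generating some data could start as small as this. It is a synthetic sketch only: the random boxes, the jitter amount, and the mean center distance metric are all assumptions, and the jitter models coordinate drift, not what a real re-encode does to a detector.

```python
import math
import random


def random_boxes(n, rng):
    """n normalized boxes (x1, y1, x2, y2) of fixed 0.3 x 0.3 size."""
    boxes = []
    for _ in range(n):
        x1, y1 = rng.uniform(0, 0.7), rng.uniform(0, 0.7)
        boxes.append((x1, y1, x1 + 0.3, y1 + 0.3))
    return boxes


def jitter(boxes, amount, rng):
    """Shift every box by up to `amount` along each axis."""
    out = []
    for x1, y1, x2, y2 in boxes:
        dx = rng.uniform(-amount, amount)
        dy = rng.uniform(-amount, amount)
        out.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return out


def mean_center_distance(boxes_a, boxes_b):
    """Average distance between box centers, pairing boxes by list position."""
    if len(boxes_a) != len(boxes_b) or not boxes_a:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for a, b in zip(boxes_a, boxes_b):
        ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
        bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        total += math.hypot(ax - bx, ay - by)
    return total / len(boxes_a)


rng = random.Random(0)                   # fixed seed so runs are repeatable
original = random_boxes(50, rng)
recoded = jitter(original, 0.01, rng)    # stand-in for a re-encoded copy
unrelated = random_boxes(50, rng)        # stand-in for a different video
print(mean_center_distance(original, recoded))
print(mean_center_distance(original, unrelated))
```

If real detector output shows a gap between the "recoded" and "unrelated" numbers anything like the synthetic one, a threshold on position alone might be enough without trusting the confidence values.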

god i hate these fucking people.

makingglitches commented 2 years ago

Still relevant; however, noting that hardware requirements are HIGH and accuracy is not, imho, as high as reported for recognition.