Open makingglitches opened 2 years ago
in videos this would likely work very well, given the precision expected of 'sameness', where being a pixel off (after adjusting for scale) would be way wrong. there will still be a little fuzzy calculation between differing resolutions, and aspect ratios need to be stored in the metadata.
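a minimal sketch of the "adjusting for scale" part, assuming detections come back as pixel-space `(x1, y1, x2, y2)` boxes. the helper names `normalize_box` and `boxes_close` and the `tol` value are made up for illustration and are not part of any library:

```python
def normalize_box(box, frame_w, frame_h):
    """Scale a pixel-space (x1, y1, x2, y2) box into 0..1 frame-relative units."""
    x1, y1, x2, y2 = box
    return (x1 / frame_w, y1 / frame_h, x2 / frame_w, y2 / frame_h)


def boxes_close(a, b, tol=0.01):
    """True if every normalized coordinate differs by at most tol.

    tol is a guess: 0.01 means 1% of the frame width or height.
    """
    return all(abs(p - q) <= tol for p, q in zip(a, b))


# the same object in a 1920x1080 original and a 1280x720 re-encode
orig = normalize_box((480, 270, 960, 540), 1920, 1080)
small = normalize_box((320, 180, 640, 360), 1280, 720)
print(boxes_close(orig, small))
```

once both boxes are in frame-relative units the resolution drops out, so the only fuzziness left is whatever the detector itself introduces.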
this is kind of a fuzzy logic idea: there would be room for failure and false positives/negatives, but they'd be less likely. also there would only be qualitative categories of objects, even misrecognized objects, which mobilenet for example would give the position of.
i suppose as long as you didn't ADD a stretched or distorted image, which is not what this is for anyway, the resize to the standard tile size of 300x300 would be corrected for... sorta.
if there is SIGNIFICANT pixel distortion this won't work.
i seem to remember this didn't yield the best results: a reliable enough confidence index couldn't be built, since distorting the test data by changing color space, etc. greatly altered the confidence values, even after resizing to restore aspect ratio equality. i also can't remember whether the Single Shot Detection neural net even got the same coordinates along with the same detection index and classification, so this needs a little more consideration.
the YolorV4 detection neural network was slow when it was released during this time period, so my hardware needs to be upgraded before it can process anything at a speed that would let me run some tests.
backburner this idea, and let the tech catch up. or generate some data to see if a comparison can be derived from relatively similar positions.
god i hate these fucking people.
still relevant, however noting that hardware requirements are HIGH and recognition accuracy is not, imho, as high as reported.
create a method utilizing object recognition neural nets like mobilenet to match even incorrect detections given a uniform frame size, and see how close the results are between videos that have been recoded or reduced for archival and their originals.
match coordinate returns and object class by frame.
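a rough sketch of what that per-frame match could look like, assuming each frame's detections have already been reduced to `(class_id, box)` pairs with frame-relative boxes. the IoU overlap test, the greedy matching, the `min_iou` threshold and all function names here are my own assumptions, not something any detector provides:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def frame_match_score(dets_a, dets_b, min_iou=0.8):
    """Fraction of detections that agree on class AND position.

    dets_a / dets_b are lists of (class_id, box). Matching is greedy:
    each detection in A takes the best unused same-class box in B.
    A misrecognized object still counts as long as both videos
    misrecognize it the same way.
    """
    if not dets_a and not dets_b:
        return 1.0  # two empty frames agree trivially
    unused = list(dets_b)
    matched = 0
    for cls_a, box_a in dets_a:
        best_i, best = None, min_iou
        for i, (cls_b, box_b) in enumerate(unused):
            if cls_b != cls_a:
                continue
            score = iou(box_a, box_b)
            if score >= best:
                best_i, best = i, score
        if best_i is not None:
            unused.pop(best_i)
            matched += 1
    return matched / max(len(dets_a), len(dets_b))


def video_match_score(frames_a, frames_b):
    """Mean per-frame score over frames paired by index."""
    pairs = list(zip(frames_a, frames_b))
    if not pairs:
        return 0.0
    return sum(frame_match_score(a, b) for a, b in pairs) / len(pairs)


# one frame from each video: same box for class 1, class disagreement on the second box
frame_a = [(1, (0.1, 0.1, 0.5, 0.5)), (7, (0.6, 0.6, 0.9, 0.9))]
frame_b = [(1, (0.1, 0.1, 0.5, 0.5)), (3, (0.6, 0.6, 0.9, 0.9))]
print(frame_match_score(frame_a, frame_b))
```

pairing frames by index assumes both videos have the same frame count and rate, which a re-encode may not preserve, so that part would need more thought.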