google / uvq


Questions on UVQ #4

Open samujjwaldey opened 1 year ago

samujjwaldey commented 1 year ago

I have the following questions on UVQ.

  1. I am using the command "python3 uvq_main.py --input_files='Gaming_1080P-0ce6_orig,20,Gaming_1080P-0ce6_orig.mp4' --output_dir results --model_dir models" from "https://github.com/google/uvq" to generate the file "Gaming_1080P-0ce6_orig_label_distortion.csv" at the path "results\Gaming_1080P-0ce6_orig\features". The file contains 20 rows and 104 columns of distortion scores, where each of the 20 rows corresponds to one second of the roughly 20-second sample video "Gaming_1080P-0ce6_orig.mp4". Can someone please tell me how to map these distortion scores to the 25 different distortion types and their levels (each distortion type seems to have 5 levels) listed at the following link: http://database.mmsp-kn.de/kadid-10k-database.html

  2. What is the range of the distortion scores in the cells of the file "Gaming_1080P-0ce6_orig_label_distortion.csv"? Is 0 the minimum possible score (indicating minimum distortion) and 1 the maximum possible score (indicating maximum distortion)?

  3. Is there a limit to the length of a video that can be analyzed by UVQ?

  4. Is there a restriction on the codec or container format of a video that can be analyzed by UVQ, or are all codecs and container formats supported?

  5. There is a binary file related to distortion generated in the "results\Gaming_1080P-0ce6_orig\features" folder, "Gaming_1080P-0ce6_orig_feature_distortion.binary", which "https://github.com/google/uvq" says contains "UVQ raw features (25600 float numbers per 1s chunk)". Is this "Gaming_1080P-0ce6_orig_feature_distortion.binary" file just an intermediate output that UVQ generates while producing the final output file "Gaming_1080P-0ce6_orig_label_distortion.csv", or is there a way (or need) for the user to interpret the content of this binary file, perhaps with some tool?

  6. I understand UVQ is designed to work well on user-generated content, where there is no pristine reference. Do you think UVQ will also work well on videos produced by broadcasters, service providers, etc., where pristine references might be available?

  7. Are there any release notes or additional documentation on UVQ apart from what is available at "https://github.com/google/uvq"?

yilinwang01 commented 1 year ago

Thanks for using UVQ!

Currently all public documents are listed on our GitHub homepage. We will update and release more in the future.

We really appreciate users like you pointing out unclear parts of the UVQ documentation and helping us improve this open source work.

Questions 1 to 6 are answered below, one by one.

yilinwang01 commented 1 year ago

For question 1 (format of the label distortion file):

The UVQ DistortionNet divides an input frame into 2x2 patches, indexed as follows:

```
0 | 1
2 | 3
```

The 104 numbers in each row follow this layout: the first 26 elements are the distortion scores for patch 0, followed by the 26 for patch 1, then patches 2 and 3.

The distortion scores for the entire frame can be estimated by averaging corresponding patch scores.

Within the 26 numbers for each patch, the first one is "unknown": the probability of no distortion (or of a distortion not belonging to the 25 known types).

The other 25 elements follow the order defined at http://database.mmsp-kn.de/kadid-10k-database.html (the first is Gaussian blur and the last is contrast change).
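For anyone parsing this file programmatically, here is a minimal sketch, assuming the CSV has no header row and 104 comma-separated floats per row (4 patches x 26 values each), following the layout described above:

```python
import numpy as np

NUM_PATCHES = 4
VALUES_PER_PATCH = 26

# One row per 1s of video, 104 comma-separated values per row (assumed headerless).
rows = np.loadtxt(
    "results/Gaming_1080P-0ce6_orig/features/"
    "Gaming_1080P-0ce6_orig_label_distortion.csv",
    delimiter=",",
)
patches = rows.reshape(-1, NUM_PATCHES, VALUES_PER_PATCH)

# Frame-level estimate: average corresponding scores across the 4 patches.
frame_scores = patches.mean(axis=1)  # shape: (seconds, 26)

# Column 0 is "unknown" (no / unrecognized distortion); columns 1..25 follow
# the KADID-10k order (1 = Gaussian blur, ..., 25 = contrast change).
for second, scores in enumerate(frame_scores):
    top = scores[1:].argmax() + 1
    print(f"second {second}: strongest known distortion is type #{top} "
          f"(score {scores[top]:.3f})")
```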

yilinwang01 commented 1 year ago

For question 2 (range of distortion scores):

The range for distortion labels is [0, 1], where 0 means no distortion, and 1 means strong distortion.

yilinwang01 commented 1 year ago

For question 3 (input length):

UVQ can handle videos of arbitrary length (bounded by how many extracted chunk features can be held in memory). However, the current model was calibrated on the YouTube UGC dataset, whose videos are 20s long. So if the input length is far from that (e.g. 1 hour), a common approach is to split the video into chunks and then aggregate the chunk scores, as sketched below.
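A minimal sketch of that approach, assuming ffmpeg is available; the helper names and the simple mean aggregation are illustrative, not part of UVQ:

```python
import subprocess

def split_into_chunks(video, chunk_dir, chunk_seconds=20):
    """Split a long video into ~20s chunks without re-encoding."""
    subprocess.run([
        "ffmpeg", "-i", video,
        "-c", "copy",  # stream copy; chunk boundaries snap to keyframes
        "-f", "segment", "-segment_time", str(chunk_seconds),
        f"{chunk_dir}/chunk_%03d.mp4",
    ], check=True)

def aggregate(chunk_scores):
    """Aggregate per-chunk UVQ scores; a plain mean is the simplest choice."""
    return sum(chunk_scores) / len(chunk_scores)

# Run uvq_main.py on each chunk (as in the README command), collect each
# chunk's overall score, then call aggregate() on the resulting list.
```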

yilinwang01 commented 1 year ago

For question 4 (codec and container):

The inputs to UVQ are decoded raw frames, so there are no restrictions on codecs or containers.
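In other words, anything your decoder can turn into raw frames works; for example, with ffmpeg (file names here are placeholders):

```python
import subprocess

# Decode any container/codec that ffmpeg supports into raw RGB frames;
# UVQ operates on such decoded frames, not on the compressed bitstream.
subprocess.run([
    "ffmpeg", "-i", "input.mkv",             # any supported input
    "-f", "rawvideo", "-pix_fmt", "rgb24",   # raw decoded frames
    "frames.raw",
], check=True)
```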

yilinwang01 commented 1 year ago

For question 5 (binary file for raw features):

These binary files are raw features extracted from the last layer of the UVQ subnetworks. The output labels (.csv) can be treated as high-level interpretations of these raw features, so there is no need to interpret the raw features separately.

These raw features are the inputs to the AggregationNet, which generates the final overall quality score. We provide this intermediate data to make it easy for users to retrain the UVQ AggregationNet on their own domain-specific data, e.g. by loading the file as sketched below.
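A minimal sketch for loading the file, assuming the features are stored as little-endian float32 (the dtype is an assumption; verify it against the feature-extraction code):

```python
import numpy as np

FEATURES_PER_CHUNK = 25600  # per the README: 25600 floats per 1s chunk

feats = np.fromfile(
    "results/Gaming_1080P-0ce6_orig/features/"
    "Gaming_1080P-0ce6_orig_feature_distortion.binary",
    dtype="<f4",  # assumed little-endian float32
).reshape(-1, FEATURES_PER_CHUNK)  # one row per 1s chunk

print(feats.shape)  # e.g. (20, 25600) for a 20s video
```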

yilinwang01 commented 1 year ago

For question 6 (with pristine reference):

UVQ also works on pristine videos.

To measure the relative quality change between the reference and the target, just run UVQ on both videos and then take the difference of their scores.

For such relative quality use cases, besides the default score ("compression_content_distortion"), we also suggest trying other scores like "compression_content". In some cases they are more sensitive to compression distortions than the default.
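As a sketch of that score-difference workflow (the output file names and CSV layout below are assumptions; adapt them to what your UVQ version actually writes):

```python
import csv

def read_uvq_score(csv_path, field="compression_content_distortion"):
    # Assumes a CSV with a header row that includes the named score column;
    # adjust the parsing to the actual layout of your UVQ output files.
    with open(csv_path, newline="") as f:
        row = next(csv.DictReader(f))
    return float(row[field])

ref = read_uvq_score("results/reference/reference_uvq.csv")  # pristine source
tgt = read_uvq_score("results/target/target_uvq.csv")        # encoded version
print(f"quality delta (target - reference): {tgt - ref:+.4f}")
```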