qarmin / czkawka

Multi functional app to find duplicates, empty folders, similar images etc.

Additional Info Columns for Similar/Duplicate Videos | Resolution, Codec, Bitrate #1263

Open C-BoT-AU opened 2 months ago

C-BoT-AU commented 2 months ago

Feature Description

A "Resolution" value such as 1080p/720p mostly appears as a custom-entered field and is not always available, so I've found it easier to just pull the height and width and interpret the resolution from there.
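As a rough illustration of interpreting a label from height and width, here is a minimal Python sketch. The function name `resolution_label` and the tier thresholds are assumptions made for this example, not something Czkawka or MediaInfo defines. It compares the longer and shorter side against common frame sizes so that cropped widescreen and portrait videos still land in a sensible tier.

```python
def resolution_label(width: int, height: int) -> str:
    """Map pixel dimensions to a common label such as '1080p'.

    Hypothetical helper: the tiers below are conventional frame sizes,
    chosen for illustration only.
    """
    # Treat portrait and landscape the same by comparing long/short sides.
    long_side, short_side = max(width, height), min(width, height)

    # (minimum long side, minimum short side, label), largest first.
    tiers = [
        (3840, 2160, "2160p"),
        (2560, 1440, "1440p"),
        (1920, 1080, "1080p"),
        (1280, 720, "720p"),
        (854, 480, "480p"),
        (640, 360, "360p"),
    ]
    for min_long, min_short, label in tiers:
        # Matching either side lets cropped 1920x800 files count as 1080p.
        if long_side >= min_long or short_side >= min_short:
            return label

    # Below every tier: fall back to the raw short side.
    return f"{short_side}p"
```

For example, `resolution_label(1920, 800)` and `resolution_label(1080, 1920)` both give `"1080p"`, while a 320x240 file falls through to `"240p"`.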

From what I can tell, Czkawka uses at least parts of FFmpeg. I believe what I mentioned here can be accomplished with it; however, I've found it easier with the MediaInfo CLI, after some solid trial and error and research to get the template right. I don't speak Rust, but I do speak some Python and have a script that uses the MediaInfo CLI to extract the mentioned info for a directory into a CSV file. It isn't perfect, and files occasionally return blank values in fields like BitRate when they shouldn't, but it does work, so I'm happy to share it if it helps at all.

Cut-down version of my Python script:

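import os
import subprocess
from datetime import datetime

import pandas as pd


# get_report_name(), get_list_of_files() and parse_mediainfo_output() are
# helpers from the full script that are not shown in this cut-down version.
def mediainfo_extract(root):
    # root: directory that contains the files returned by get_list_of_files()
    report_name = get_report_name()
    list_of_files = get_list_of_files()

    # Clear the DataFrame before processing each directory
    df = pd.DataFrame(columns=['FilePath', 'FileName', 'FileSize', 'Duration', 'DurationString', 'FileExtension',
                            'Video-CodecID', 'Video-AspectRatio', 'Video-BitRate', 'Video-BitRateString', 'Video-FrameRate',
                            'Video-Height', 'Video-Width', 'Audio-Format', 'Audio-CodecID', 'Audio-BitRate'])

    for file in list_of_files:
        file_path = os.path.join(root, file)

        # Run the mediainfo command using subprocess
        mediainfo_command = ["mediainfo", "--Inform=file:///app/template.txt", file_path]

        # Raises CalledProcessError if mediainfo exits with a non-zero status
        mediainfo_output = subprocess.check_output(mediainfo_command, text=True).strip()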

        # Parse the mediainfo output and extract relevant information
        parsed_info = parse_mediainfo_output(mediainfo_output)

        parsed_info_df = pd.DataFrame.from_dict(parsed_info, orient='index').transpose()

        # Append the information to the DataFrame
        df = pd.concat([df, parsed_info_df], axis=0, ignore_index=True)

    # CSV File Path Declaration
    csv_file_name = f"{report_name}_report.csv"
    csv_path = os.path.join("/app/reports/", csv_file_name)

    # Check if the CSV file already exists
    if os.path.exists(csv_path):
        # If it does, rename and move to the archive folder
        date_created = datetime.now().strftime("%y%m%d-%H%M%S")
        new_csv_name = f"{date_created}_{csv_file_name}"
        new_csv_path = os.path.join("/app/archive/", new_csv_name)
        os.rename(csv_path, new_csv_path)

    # Write the DataFrame to a new CSV file
    df.to_csv(csv_path, index=False)

template.txt

General;FilePath: %CompleteName%\rDuration: %Duration%\rDurationString: %Duration/String%\r
Video;Video-CodecID: %CodecID%\rVideo-AspectRatio: %DisplayAspectRatio%\rVideo-BitRate: %BitRate%\rVideo-BitRateString: %BitRate/String%\rVideo-FrameRate: %FrameRate%\rVideo-Height: %Height%\rVideo-Width: %Width%\r\n
Audio;Audio-Format: %Format%\rAudio-CodecID: %CodecID%\rAudio-BitRate: %BitRate%\r\n
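The script calls a `parse_mediainfo_output` helper that is not shown. As a minimal sketch of how output in the `Key: value` shape produced by this template could be turned into a dict, the hypothetical `parse_template_output` below is a stand-in, not the original helper. It assumes each field ends up on its own line once the `\r` and `\r\n` separators are treated as line breaks, and it stores blank values (such as an empty BitRate) as `None`.

```python
def parse_template_output(output: str) -> dict:
    """Parse 'Key: value' fields from template-formatted MediaInfo output.

    Hypothetical stand-in for the parse_mediainfo_output() helper,
    which is not shown in the cut-down script.
    """
    info = {}
    # splitlines() treats \r, \n and \r\n as line breaks alike.
    for line in output.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        # Keep the first occurrence of a key (e.g. the first audio stream)
        # and record blank values as None.
        info.setdefault(key.strip(), value.strip() or None)
    return info
```

Values are kept as strings here; converting fields such as `Video-Height` to integers would be a separate step.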

I can try to get the same results with ffprobe if this is something you're interested in incorporating (and you don't already have a better way). I don't speak Rust, but I'm happy to try to help in some way to show my thanks for the app!
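For comparison, a rough Python sketch of the ffprobe route might look like the following. It asks ffprobe for the first video stream's codec, width, height and bit rate as JSON. The function names and the output keys (chosen to resemble the CSV columns above) are assumptions for illustration, and it has not been checked against Czkawka's own FFmpeg usage. ffprobe may omit the stream-level `bit_rate` for some files, so that field can come back as `None`.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list:
    """Build an ffprobe call that prints first-video-stream fields as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,bit_rate",
        "-of", "json",
        path,
    ]


def parse_ffprobe_json(raw: str) -> dict:
    """Extract codec, dimensions and bit rate from ffprobe's JSON output."""
    streams = json.loads(raw).get("streams", [])
    if not streams:
        return {}  # no video stream found
    stream = streams[0]
    # ffprobe reports bit_rate as a string; it may be missing entirely.
    bit_rate = stream.get("bit_rate")
    return {
        "Video-Codec": stream.get("codec_name"),
        "Video-Width": stream.get("width"),
        "Video-Height": stream.get("height"),
        "Video-BitRate": int(bit_rate)
        if isinstance(bit_rate, str) and bit_rate.isdigit() else None,
    }


def probe(path: str) -> dict:
    """Run ffprobe on one file (requires ffprobe on PATH)."""
    raw = subprocess.check_output(ffprobe_command(path), text=True)
    return parse_ffprobe_json(raw)
```

Keeping the command builder and the JSON parsing separate means the parsing can be tried on saved ffprobe output without the binary installed.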

boognish-rising commented 2 months ago

+1 for resolution