mipops / dvrescue

Archivist-made software that supports data migration from DV tapes into digital files suitable for long-term preservation. Snapshot daily builds are at https://mediaarea.net/download/snapshots/binary/dvrescue/.
BSD 3-Clause "New" or "Revised" License

Suggestion: better support for working with / repairing DV files that have timecode incoherency / partially missing subcode data #929

Open JohnstonJ opened 1 month ago

JohnstonJ commented 1 month ago

Summary of the idea

Repair DV files that have missing subcode DIF data, so that they don't cause problems later on when merging, or when using with other tools. For example, merging these files results in both dropped frames and newly-duplicated frames.

Problematic input data: partial subcode DIFs

I'm capturing Digital8 tapes using a Sony DCR-TRV460 camcorder. The tapes were originally recorded on an older consumer Sony camcorder.

I'm noticing that the DV data has incomplete / erroneous subcode DIF data. For example:

image

Note that there are a fair number of what appear to be dropouts: it looks like identical subcode data is supposed to be regularly repeated throughout the frame, but is often replaced with 0xFF bytes. Usually this happens at the end of the DIF block, but it can happen in the middle or at the beginning as well.

Curiously, there were no substantial/systematic errors in the video or audio data for the most part. The subcode data, by far, seems to be the most disproportionately affected. I don't know why this is, but the DV file is nevertheless damaged in this way.

How the incomplete subcode data will impact analysis

This issue with the subcodes becomes noticeable in the following ways:

How this breaks the Merge feature of DVRescue

Most concerningly, this timecode issue cascades into real problems when using the Merge feature in DVRescue. Remember, the vast majority of video data in my sample is intact. Unfortunately, DVRescue makes the problem worse in my situation. I suspect the merge feature is putting a lot of trust in the accuracy of the timecodes, and then mismatches the frames during the merge when the timecodes are incorrectly interpreted.

I have observed these problems when using Merge with these sorts of files:

The number of problems it incorrectly "fixed" due to these timecode issues greatly exceeded the actual number of video errors I had in one of my better capture passes. Therefore, using the merge feature simply didn't make sense.

Suggested fix: add a feature for totally rewriting subcodes

An idea/suggestion for dealing with this problem: what if DVRescue offered a way to systematically rewrite the timecodes in a single file? It doesn't take a genius to look at the DV Analyzer output and see what the correct timecodes are supposed to be, even if a frame or two gets interpreted with the wrong timecode. DVRescue could add a new feature / function that takes a single DV file as input, and writes a repaired DV file as output. Here's a rough idea of how I imagine it might be able to work:

  1. Each frame in the input file is carefully analyzed to obtain the correct timecode from the subcode DIFs. Since timecodes appear to be duplicated across many subcode DIFs within a frame, it should be possible to (a) merge all 20 subcode DIFs into a single pair of subcode DIFs, using a "most common byte" strategy or similar, and (b) read the timecode from this merged result.
  2. It's possible that a particularly damaged frame might still not give us a reasonable timecode. In that case, we can examine the neighboring frames and extrapolate what the timecode was supposed to be. For example, if the timecodes say we have frames 13, 14, 15, 3, 17, 18, 19..... we can assume that frame 3 was supposed to be frame 16. (Some tuning parameters might be needed to avoid impacting real/actual discontinuities in the timecode, which should appear over larger numbers of frames.)
  3. The output DV file can then be written with subcode DIF structures that have been completely regenerated from scratch, with no dropouts whatsoever. This will ensure consistent behavior in all software that reads DV data and utilizes its timecodes, and eliminate all incoherencies seen by both DV Analyzer and DVRescue. What you see in one tool will be what you get in any other tool!
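
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration of the idea, not DVRescue code: `merge_copies`, `repair_timecodes`, and the `max_run` tuning parameter are hypothetical names, and the sketch assumes the caller has already extracted the repeated timecode bytes from each subcode DIF as equally sized byte strings and converted each frame's timecode to an absolute frame count. It also assumes that 0xFF marks a dropout, which may be wrong wherever 0xFF is a legitimate value.

```python
from collections import Counter

DROPOUT = 0xFF  # assumption: dropouts show up as 0xFF bytes, as observed in the capture


def merge_copies(copies):
    """Byte-wise majority vote across equally sized copies of the same data."""
    merged = bytearray()
    for column in zip(*copies):
        # Ignore dropout bytes when voting, unless nothing else is available.
        votes = Counter(b for b in column if b != DROPOUT)
        merged.append(votes.most_common(1)[0][0] if votes else DROPOUT)
    return bytes(merged)


def repair_timecodes(frames, max_run=2):
    """Fix short runs of outlier frame counts by extrapolating from neighbors.

    `frames` holds one absolute frame count per frame, or None where no
    timecode could be read. A run of at most `max_run` frames is rewritten
    only if the sequence resumes exactly where it would have been, so a
    genuine discontinuity (a jump that persists) is left untouched.
    """
    repaired = list(frames)
    for i in range(1, len(repaired)):
        prev = repaired[i - 1]
        if prev is None or repaired[i] == prev + 1:
            continue
        for run in range(1, max_run + 1):
            j = i + run
            if j < len(repaired) and repaired[j] == prev + 1 + run:
                for k in range(i, j):
                    repaired[k] = prev + 1 + (k - i)
                break
    return repaired


print(repair_timecodes([13, 14, 15, 3, 17, 18, 19]))
```

With the example from step 2, the stray 3 is rewritten to 16, while a sequence such as 13, 14, 15, 100, 101, 102 is returned unchanged because the jump persists. A wrong timecode on the very first frame is not handled by this sketch.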

This strategy could then be used with DVRescue to merge multiple files as follows:

  1. Each input file will first have its timecodes repaired using the above feature.
  2. The files can then be merged using the normal DVRescue merge feature.

Conceivably, this could simply be (optionally) done as a preprocessing step of the Merge feature itself. But keep in mind that it might still be useful as a standalone tool.

Example test case

I've attached the first couple hundred frames as a test case. Due to GitHub file attachment limits, it is a multi-part ZIP file:

  1. IncoherentTimecodes.zip.001.zip
  2. IncoherentTimecodes.zip.002.zip
  3. IncoherentTimecodes.zip.003.zip
  4. IncoherentTimecodes.zip.004.zip
  5. IncoherentTimecodes.zip.005.zip

To extract this:

  1. Download all the files.
  2. Remove the final .zip extension, i.e. IncoherentTimecodes.zip.003.zip --> IncoherentTimecodes.zip.003
  3. Use a tool like 7-Zip to extract the ZIP archive as normal.
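
If renaming by hand is tedious, step 2 can be scripted. This is a minimal sketch; `strip_outer_zip` is a hypothetical helper, and it assumes all downloaded parts sit in one directory and follow the `name.zip.NNN.zip` pattern shown above. Extraction with 7-Zip (step 3) is still done separately.

```python
from pathlib import Path


def strip_outer_zip(directory):
    """Rename e.g. X.zip.003.zip to X.zip.003; returns the new paths in order."""
    renamed = []
    for part in sorted(Path(directory).glob("*.zip.[0-9][0-9][0-9].zip")):
        # with_suffix("") drops only the final ".zip" extension.
        renamed.append(part.rename(part.with_suffix("")))
    return renamed
```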

The same 200 frames were captured 5 different times. All of them show several analysis errors in DVRescue, except for the fourth pass, which does not, apparently due to a lucky capture attempt. All of them will show analysis errors in DV Analyzer.

Next, try merging them. Use Johnston3-pass1.dv as the initial file to merge.

Here is a screenshot of the merge, showing several "fixed" frames: image

Unfortunately, it created a new duplicate frame, where none existed before: image

Note that the sequence number is now showing some errors, where it did not previously do so. Most unfortunately, frame 64 is now a total duplicate of frame 65, whereas frame 64 contained unique video data in all the input files. The true video data for frame 64 has now been lost.

dericed commented 3 weeks ago

Hi @JohnstonJ, there's a lot to comment on here and I really appreciate your report. A dvfixer would be super helpful, but we didn't have time in our project to work on something like that. Still, we're hoping to find a way to continue the project.

As far as dvrescue vs dvanalyzer, the development teams were the same, but the tools were independently developed. The dvrescue approach to analysis is a completely new start rather than a build upon dvanalyzer. There are a number of heuristics that would lead to the tools acting differently. For instance, I think dvanalyzer gives each frame a timecode based on the first found timecode, whereas dvrescue reads the timecodes from all DV DIF sequences and goes with the most commonly occurring timecode within the frame.
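
To illustrate how those two heuristics can disagree on a damaged frame, here is a small sketch. It is not code from either tool: `first_found` and `most_common` are hypothetical names, and the sample timecode strings are made up. The input is assumed to be the list of timecodes read from one frame's DIF sequences, with None where nothing could be read.

```python
from collections import Counter


def first_found(timecodes):
    """Return the first readable timecode in the frame, or None."""
    return next((tc for tc in timecodes if tc is not None), None)


def most_common(timecodes):
    """Return the timecode that occurs most often in the frame, or None."""
    votes = Counter(tc for tc in timecodes if tc is not None)
    return votes.most_common(1)[0][0] if votes else None


# Made-up frame where the first DIF sequence carries a corrupted timecode.
frame = ["00:00:02;03", "00:00:02;16", "00:00:02;16", None, "00:00:02;16"]
print(first_found(frame), most_common(frame))
```

On this made-up frame the first-found approach reports 00:00:02;03 while the majority approach reports 00:00:02;16, which is the kind of disagreement that could explain the two tools flagging different frames.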

Pinging @JeromeMartinez to take a look at the samples attached.