Various code cleanups (ChatGPT4 analysis included)

happycube commented 1 year ago

I just got ChatGPT 4 with browsing, and I decided to point it at core.py on github:

After reviewing the code, here are some suggestions for improving the core.py file:

Refactoring: The code contains some complex and long functions, such as buildmetadata(self, f, check_phase=True). These could be refactored into smaller functions to make the code more readable and maintainable. For instance, the code block that computes the disk_TimeCode and disk_Frame could be extracted into its own function.

Error Handling: There are numerous places where exceptions are simply printed and then execution continues. It would be better to handle exceptions in a way that doesn't allow the program to continue in an erroneous state. For example, if an exception is thrown when computing fi["clvMinutes"], fi["clvSeconds"], and fi["clvFrameNr"], the code simply logs a warning and continues execution. This could potentially lead to downstream errors or inconsistent program state.

Logging: The use of both print() and logger for outputting information is inconsistent. It's generally recommended to use logging over print statements, as it provides more flexibility in terms of output format, level of detail, and destination. Consider replacing all print() statements with appropriate logging calls.

Code Comments: While there are quite a few comments in the code, certain parts could use more explanation. For example, the buildmetadata function has some intricate logic, and additional comments would help clarify what's happening. Moreover, docstrings for the functions are often not detailed enough. It would be beneficial to provide more information about the purpose of the function, its parameters, and return values.

Magic Numbers: The code contains magic numbers, such as 10 in fi["syncConf"] = 10 and -1.4 in badj = -1.4. These could be replaced with named constants to make the code more readable and easier to maintain.

Type Hints: Python supports type hints, which can make the code more understandable and easier to debug. Consider adding type hints to function signatures.

Naming Conventions: Python recommends using snake_case for function and variable names. Some variables use camelCase, such as disk_TimeCode and disk_Frame. It's better to adhere to one convention throughout the codebase.

Code Duplication: The code contains some duplicated logic, such as checking self.isCLV multiple times in different places in the buildmetadata function. This could be refactored to avoid duplication.

Unused Code: Some lines of code seem to be commented out or unused, such as # logger.info('WARNING! isFirstField stuck between fields'). It would be best to remove this to keep the codebase clean and readable.

Please note that these are general suggestions and the specific improvements might depend on the context and requirements of your project1234.

oyvindln commented 1 year ago

Naming is a bit all over the place so would be nice to clean up a bit of that.

happycube commented 1 year ago

Prompt: Create a list of possible name changes for better consistency and PEP-8 compliance in https://raw.githubusercontent.com/happycube/ld-decode/master/lddecode/core.py

Based on my analysis of the code from the provided link, here are some recommendations for variable and function names that could be updated for better consistency and PEP-8 compliance:

Variable names:

prevfi -> previous_field_info fi -> field_info f -> field decodeFaults -> decode_faults disk_Type -> disk_type disk_TimeCode -> disk_time_code disk_Frame -> disk_frame est_frames -> estimated_frames outstr -> output_string fnr -> frame_number curfield -> current_field jout -> json_output vp -> video_parameters

Function names:

buildmetadata -> build_metadata seek_getframenr -> seek_get_frame_number build_json -> build_json_output

Pretty impressive, but it looks like the file exceeds it's token memory (GPT4 has some high-token models via API, but they ain't cheap and I don't think I even have access to them yet)

happycube commented 4 months ago

Trying out Claude Opus for a bit:

Here are some suggestions for improving the code in the attached Python file:

Use more descriptive variable names instead of single letters like f, l, etc. More meaningful names will make the code easier to understand.
Add more detailed comments explaining what the various functions and code blocks are doing. While there are some comments, additional explanation of the algorithms and calculations would be beneficial.
Break up some of the larger functions into smaller, more focused functions. Functions like process(), computeMetrics(), buildmetadata() are quite long. Extracting logical pieces into separate functions would improve readability.
Use constants instead of magic numbers. For example, instead of checking if mtf_level > 0.05, define a constant like MTF_LEVEL_THRESHOLD = 0.05.
Consider using more descriptive names for the member variables in the Field classes. Names like linelocs1, linelocs2, linelocs3 don't convey much meaning.
Use f-strings for string formatting instead of the older % style formatting. F-strings are more readable.
In the seek() function, the for loop variable "retries" is not actually used. Consider removing it or using it to limit the number of retries.
The writeout() function has a lot of if statements checking if various file handles are None before writing. Consider extracting those checks into separate functions to make writeout() more readable.
Use type hints to specify expected argument and return value types for functions. This makes the interfaces clearer.
Some of the functions like downscale_audio() are defined outside of any class. Consider moving them into an appropriate class as static methods if it makes sense.
The profile decorator is used inconsistently - either use it on all major functions or remove it entirely.
Use pathlib instead of os.path for file path handling. Pathlib provides a more modern and Pythonic API.
In the computeMetrics() functions, the unused "verbose" parameter can be removed.
Consider adding a repr or str method to the Field classes to make them easier to print and debug.

Overall the code looks quite complex as it's dealing with a specialized domain (Laserdisc decoding). More extensive comments explaining the overarching algorithm and key calculations would make it easier for other developers to understand and maintain. Let me know if you have any other questions!

To make the code more PEP-8 compliant, here are some suggested renamings:

Class names should use CamelCase. Rename classes like "efm_pll" to "EfmPll".
Function and variable names should use snake_case. For example:
- Rename "getLine0()" to "get_line0()"
- Rename "isFirstField" to "is_first_field"
- Rename "outfile_video" to "outfile_video"
- Rename "linelocs1", "linelocs2" etc. to more descriptive names like "linelocs_raw", "linelocs_refined" etc.
Constants should be in UPPER_CASE. For example:
- Rename "iretohz" to "IRE_TO_HZ"
- Rename "clvfps" to "CLV_FPS"
Avoid single-letter variable names. For example:
- Rename "f" to "field"
- Rename "l" to "line"
- Rename "i" to "index" or a more descriptive name based on context
Rename module-level functions to have a prefix related to the module name. For example, rename "downscale_audio" to "audio_downscale".
Avoid abbreviations in names unless they are very well known. For example:
- Rename "tbc" to "time_base_correction"
- Rename "efm" to "eight_to_fourteen_modulation"
Use a single underscore prefix for "internal" functions and variables that are not part of the public API of a class. For example, in the Field class, prefix internal variables like "rawpulses" with an underscore: "_rawpulses".
Rename functions that compute a value to have a "compute_" prefix. For example, rename "getpulses()" to "compute_pulses()".
Be consistent with verb tense in function names. For example, rename "computefilters()" to "compute_filters()" to match the style of other function names.
Avoid camelCase in function and variable names. For example, rename "inputfreq" to "input_freq".

Remember, the goal of PEP 8 is to improve the readability and consistency of Python code. The renaming suggestions above are based on the general conventions outlined in PEP 8, but always use your judgment to choose names that make the code more understandable in your specific context.

happycube commented 3 months ago

Several functions in this code are quite complex and could benefit from more detailed comments or refactoring to improve readability. Here are a few that stand out:

decodefield (Line 1918): This function is quite long and contains a lot of complex logic for decoding a single field of video. It could be broken down into smaller, more focused functions with clear comments explaining each step.
computefilters (Line 1128): This function computes several filters used in the decoding process. The purpose of each filter and the rationale behind the chosen parameters could be explained more clearly.
getLine0 (Line 872): This function attempts to find the start of the first line of video in a field. The algorithm is quite complex and could benefit from a high-level explanation of the approach.
downscale (Line 1459): This function downscales the decoded video to the output resolution. The details of the scaling algorithm and the handling of different channels (luma, chroma, audio) could be explained more clearly.
demodblock (Line 1033): This function performs the core demodulation of the raw RF data. A high-level explanation of the demodulation process and the role of each sub-step would make this easier to understand.
computemetrics (Line 2594): This function computes various quality metrics for the decoded video. The meaning and significance of each metric could be explained more clearly.
buildmetadata (Line 2680): This function extracts various metadata from the decoded video, including information encoded in the vertical blanking interval. More comments explaining the structure and meaning of the encoded data would be helpful.
readfield (Line 1980): This function reads and decodes a single field of video. It coordinates several complex steps and could benefit from a high-level explanation of the overall process.

[two hallucinated functions were here]

These are just a few examples, but in general, any complex algorithmic code or code that relies on domain-specific knowledge (e.g., details of the NTSC/PAL video formats) would benefit from more extensive comments. The goal should be to make the code understandable to a competent programmer who may not be an expert in the specific domain of video decoding.

happycube commented 3 months ago

And here's how gemini pro 1.5 replied to "How can this code be improved?":

Potential improvements for the LDdecode code:

This code seems well-structured and utilizes various optimization techniques like numba and profiling. However, there's always room for improvement. Here are some potential areas to consider:

Code clarity and maintainability: Reduce code duplication: There seems to be some code duplication, especially in the computeMetricsPAL and computeMetricsNTSC functions. Consider refactoring to share common logic and reduce redundancy. Improve variable naming: Some variable names could be more descriptive. For example, lld and lld2 could be renamed to something like line_length_diffs and line_length_diff_derivatives. Add more comments: While some parts of the code are well-commented, others could benefit from additional explanation, especially for complex algorithms and calculations.

Performance optimization: Further leverage numba: While the code already uses numba for several functions, there might be additional opportunities to apply it for performance-critical sections. Explore alternative data structures: Some data structures, like the lists used in findpulses_numba_raw, might benefit from being replaced with more efficient alternatives like numpy arrays.

Investigate vectorization: Some loops and calculations might be vectorized using numpy functions for improved performance.

Functionality and robustness: Handle edge cases and errors more gracefully: Some error handling could be improved, like the handling of non-monotonic linelocs in get_linelen. Consider adding more robust error checking and recovery mechanisms. Extend functionality: The code currently focuses on decoding PAL and NTSC laserdiscs. It could be extended to support other formats or features, such as handling different audio sample rates or decoding additional VBI data. Improve testing: Adding more comprehensive unit tests and integration tests can help ensure the code's correctness and robustness as it evolves.

General suggestions: Consider using a linter: A linter can help identify potential code quality issues and enforce coding style consistency. Use a version control system: Using a version control system like Git can help track changes, collaborate with others, and revert to previous versions if needed. Document the code: Consider creating documentation that explains the code's design, functionality, and usage. This can be helpful for both yourself and other developers.

Remember that optimization is often a trade-off between performance, clarity, and maintainability. Choose the improvements that best suit your needs and priorities.

Me: What calculations currently using scalar Python code can be optimized using numpy and/or numba?

Model: Several calculations in the provided code currently use scalar Python code and could potentially be optimized using numpy and/or numba. Here are some examples:

computewow function: This function iterates over a list of line lengths and calculates the wow factor for each line. This loop can be easily vectorized using numpy:

def computewow(self, lineinfo):
wow = np.ones(len(lineinfo))
wow[:-1] = np.diff(lineinfo) / self.inlinelen
# smooth out wow in the sync area
wow[self.lineoffset:self.lineoffset + 10] = np.median(wow[self.lineoffset:self.lineoffset + 4], axis=0)
return wow

Use code with caution.

downscale function: This function iterates over lines and performs scaling and calculations for each line. The loop can be partially vectorized using numpy, although some parts might require careful handling due to dependencies between lines.
dropout_detect_demod function: This function performs various comparisons and logical operations on arrays. Many of these operations can be efficiently implemented using numba, which can significantly improve performance.
build_errlist function: This function iterates over an error map and builds a list of error ranges. While the logic might be difficult to fully vectorize, using numba for the loop and calculations can still provide performance benefits.
compute_line_bursts function: This function performs calculations and calls clb_findbursts within a loop. While clb_findbursts is already optimized with numba, the surrounding loop and calculations could be further optimized using numpy or numba.

General advice: Look for loops that iterate over arrays and perform element-wise operations. These are often good candidates for vectorization with numpy. Identify computationally intensive sections of code that involve numerical calculations. These can often be optimized using numba. Use profiling tools to identify performance bottlenecks and focus your optimization efforts on the most critical areas.

Remember that optimization is not always straightforward, and sometimes the benefits might be marginal. Consider the trade-off between performance, code clarity, and maintainability when deciding which optimizations to implement.

Me: What comments should I add? Provide specific examples please.

Model: Adding comments to your code can significantly improve its clarity and maintainability, especially for complex algorithms and calculations. Here are some specific examples of comments you could add to the LDdecode code:

Explaining complex calculations:

In the compute_line_bursts function, the calculation of zcburstdiv could be clarified with a comment:

zcburstdiv = (lfreq * fsc_mhz_inv) / 2 # Convert burst zero crossings to color cycles

Use code with caution.

Describing the purpose of functions and variables: The refine_linelocs_hsync function could benefit from a comment explaining its goal: def refine_linelocs_hsync(self):

"""Refine line locations based on the precise timing of HSYNC pulses.""" ...

Highlighting important assumptions or limitations: In the downscale_audio function, a comment could be added about the supported output format:

def downscale_audio(audio, lineinfo, rf, linecount, timeoffset=0, freq=44100, rv=None):
"""Downscale audio to 16-bit, 44.1 kHz output format."""
...

Providing context for specific code sections:

In the getLine0 function, a comment could explain the logic behind choosing the best line 0 location:

# Best case - all three line detectors returned something - perform TOOT using median
    if line0loc_local is not None and line0loc_next is not None and line0loc_prev is not None:
        ...

Referencing external resources: In the computeefmfilter function, a comment could link to the paper that inspired the filter design: """Frequency-domain equalisation filter for the LaserDisc EFM signal. Inspired by the input signal equaliser in WSJT-X, described in: http://physics.princeton.edu/pulsar/k1jt/MSK144_Protocol_QEX.pdf """ ...

These are just a few examples, and the specific comments you add will depend on the complexity of the code and your personal preferences. However, focusing on explaining non-obvious logic, clarifying the purpose of functions and variables, and highlighting assumptions or limitations can significantly improve the code's readability and understanding.

Remember that good comments should be concise, informative, and up-to-date. Avoid adding redundant comments that simply restate what the code is doing.

happycube / ld-decode

Various code cleanups (ChatGPT4 analysis included) #856

I just got ChatGPT 4 with browsing, and I decided to point it at core.py on github: