Please describe the feature you have in mind and explain what the current shortcomings are?
Issue Description:
The current implementation of the file pinning process in our codebase uses a recursive approach. While this works well for smaller structures, it can hit Python's recursion limit on larger ones. This risks a RecursionError (or, if the limit is raised too far, a genuine stack overflow in the interpreter) and may not be the most performant solution for large-scale operations.
To address these concerns, I propose researching alternative methods that can:
Avoid recursion limits: Shift from a recursive implementation to an iterative one to handle large structures more robustly.
Improve performance: Explore the use of multithreading to process multiple files simultaneously, potentially reducing the overall processing time.
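To make the first point concrete, here is a minimal sketch of an iterative traversal that uses an explicit stack instead of the call stack. It is not the current pinning code: `collect_layers_iterative` and `get_children` are hypothetical names, and layers are stood in for by plain string identifiers rather than real Sdf.Layer objects.

```python
def collect_layers_iterative(root, get_children):
    """Return every layer reachable from root, without using recursion.

    `get_children` is an assumed callback that returns the identifiers
    a given layer depends on (sublayers, references, and so on).
    """
    seen = {root}
    order = []
    stack = [root]
    while stack:
        layer = stack.pop()
        order.append(layer)
        # reversed() so children are visited in their declared order
        for child in reversed(get_children(layer)):
            if child not in seen:  # guards against shared layers and cycles
                seen.add(child)
                stack.append(child)
    return order


# Hypothetical dependency table used only for illustration
deps = {
    "root.usd": ["a.usd", "b.usd"],
    "a.usd": ["c.usd"],
    "b.usd": ["c.usd"],
    "c.usd": [],
}
order = collect_layers_iterative("root.usd", lambda layer: deps.get(layer, []))
# order == ["root.usd", "a.usd", "c.usd", "b.usd"]
```

Because the pending work lives in a list on the heap, the depth of the hierarchy is bounded by available memory rather than by the interpreter's recursion limit.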
Action Points for Research:
Recursive to Iterative Approach: Investigate and propose an iterative method to replace the current recursive approach for file pinning. Analyze the potential challenges and benefits of this shift.
Multithreading Implementation: Explore the feasibility of using multithreading to handle multiple files concurrently. Consider the implications of threading, such as data integrity, thread safety, and the Global Interpreter Lock (GIL) in Python.
Performance Metrics: Benchmark the current recursive implementation and compare it with the proposed iterative and multithreaded approaches to quantify performance improvements.
Edge Cases and Limitations: Identify any edge cases or limitations that might arise from changing the current implementation to a multithreaded approach.
Recommendations: Based on the research, provide recommendations on the best path forward, including any necessary changes to the codebase and potential impacts on existing functionality.
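As a starting point for the benchmarking action item, a small timing harness could look like the sketch below. Everything in it is an assumption for illustration: the two `count_*` functions stand in for the real recursive and iterative pinning routines, and `binary_children` generates a synthetic tree instead of reading real layers.

```python
import time


def count_recursive(node, children):
    # Stand-in for the current recursive walk
    return 1 + sum(count_recursive(child, children) for child in children(node))


def count_iterative(root, children):
    # Stand-in for the proposed iterative walk
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(children(node))
    return total


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def binary_children(n, limit=2**12):
    # Synthetic complete binary tree over the integers 1 .. limit - 1
    return [child for child in (2 * n, 2 * n + 1) if child < limit]


recursive_time = time_call(count_recursive, 1, binary_children)
iterative_time = time_call(count_iterative, 1, binary_children)
print(f"recursive: {recursive_time:.6f}s  iterative: {iterative_time:.6f}s")
```

Real measurements would need to run against representative file structures, since a synthetic in-memory tree says nothing about file I/O cost.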
Additional Information:
Current recursion limit issues have been observed in [specific modules/functions], particularly when dealing with structures exceeding [specific size].
Consideration should be given to maintaining backward compatibility with existing functionality.
This research will guide us in refactoring the file pinning process to be more robust and performant, ensuring scalability as the codebase grows.
How would you imagine the implementation of the feature?
Implementation Suggestion: Multithreading with a Tree/Graph Structure
To implement the solution, we could use a tree or graph structure to represent the data. Here's how the approach would work:
Tree/Graph Structure:
Each Sdf.Layer would represent an individual node in the tree/graph structure.
Nodes would be connected to their parent nodes, forming a hierarchy that mirrors the structure of the files.
Every parent node needs to hold pointers to its child nodes so the structure can be traversed in both directions.
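The structure described above could be sketched roughly as follows. `LayerNode` and `build_tree` are hypothetical names, and the node stores only a string identifier where the real implementation would presumably hold or look up an Sdf.Layer. The builder assumes the dependencies are acyclic and duplicates a layer that is reachable through more than one parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursive comparisons
class LayerNode:
    identifier: str
    # repr=False prevents an infinite parent -> children -> parent repr loop
    parent: Optional["LayerNode"] = field(default=None, repr=False)
    children: List["LayerNode"] = field(default_factory=list)

    def add_child(self, identifier):
        child = LayerNode(identifier, parent=self)
        self.children.append(child)
        return child


def build_tree(root_id, get_children):
    """Build the node tree iteratively (assumes no dependency cycles)."""
    root = LayerNode(root_id)
    stack = [root]
    while stack:
        node = stack.pop()
        for child_id in get_children(node.identifier):
            stack.append(node.add_child(child_id))
    return root
```

If layers are frequently shared between parents, a graph keyed by identifier would avoid the duplication, at the cost of each node needing a list of parents instead of a single one.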
Multithreading:
Once the structure is established, each node (representing a traversal element or layer) could be processed on a separate thread.
This approach would allow for concurrent processing of multiple nodes, improving performance and scalability.
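One way to realize this with the standard library is a bounded thread pool rather than one thread per node. The sketch below is only illustrative: `pin_layer` is a hypothetical placeholder for the real per-layer work, and `pin_all` is an assumed helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def pin_layer(identifier):
    # Hypothetical placeholder for the real per-layer pinning work
    return identifier + "@pinned"


def pin_all(identifiers, max_workers=4):
    """Process every layer on a pool of worker threads."""
    identifiers = list(identifiers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in the same order as its input
        return dict(zip(identifiers, pool.map(pin_layer, identifiers)))
```

Because of the GIL, threads in CPython mainly help when the per-layer work is I/O-bound (reading and writing files) or spends its time in native code that releases the GIL. Pure-Python CPU-bound work would see little gain, which is one of the things the benchmarks should establish.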
Main Thread Coordination:
The main thread would act as the coordinator: it would distribute tasks to worker threads, monitor their progress, and handle any exceptions or errors that arise during processing.
This approach ensures that the main thread remains available to oversee the operation and handle any issues that may occur without being bogged down by the actual file processing.
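A minimal sketch of that coordination, assuming a thread pool and a caller-supplied `worker` function (both `run_with_coordinator` and `worker` are hypothetical names): the main thread submits one task per layer, waits for tasks as they finish, and records failures instead of letting one bad layer abort the whole run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_coordinator(identifiers, worker, max_workers=4):
    """Submit one task per layer and gather results and errors separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the layer it is processing
        futures = {pool.submit(worker, ident): ident for ident in identifiers}
        for future in as_completed(futures):
            ident = futures[future]
            try:
                # result() re-raises any exception raised inside the worker
                results[ident] = future.result()
            except Exception as exc:
                errors[ident] = exc
    return results, errors
```

Whether a failed layer should be retried, skipped, or should cancel the remaining work is a policy decision this sketch leaves open.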
This method would avoid recursion limits by eliminating deep recursion, and would use multithreading to parallelize the workload, which should improve performance, particularly with large data structures.
Are there any labels you wish to add?
[X] I have added the relevant labels to the enhancement request.
Is there an existing issue for this?
Describe alternatives you've considered:
No response
Additional context:
No response