insarlab / MiaplPy

MIAmi Phase Linking in PYthon

Dask Issue during the Network Inversion #99

Open SMJMirza opened 2 weeks ago

SMJMirza commented 2 weeks ago

Hi,

I am using ISCE-coregistered Sentinel-1 SLCs with MiaplPy to study the deformation of a dam. Everything runs well when I select the "single_reference" interferogram network. However, when I select the "sequential" network, I get the attached error during the network inversion. Could you please guide me on this issue? I am unsure whether the problem lies with MiaplPy or with the Dask version, since both work well in other cases.

[Attached image: MiaplPy_Issue]

ISCE version = 2.6.1
MiaplPy = latest version

Sincerely, Sayyed

codeautopilot[bot] commented 2 weeks ago

Potential solution

The issue may be related to the version of Dask installed in the environment. Both conda-env.yml and requirements.txt specify dask>=1.0, a very old lower bound that permits outdated releases which may not support the functionality required by the latest version of MiaplPy, especially for the "sequential" option in network inversion. If the installed Dask is old, upgrading it to a more recent version may resolve the compatibility issue.

What is causing this bug?

The bug may be caused by an outdated Dask installation, which the dask>=1.0 constraint in the environment configuration files permits. The "sequential" option in network inversion might rely on newer features or bug fixes in Dask that are absent from early releases. There might also be compatibility issues between an older Dask version and the latest MiaplPy functionality.

Code

To address the issue, follow these steps:

  1. Upgrade Dask: Update the Dask version in both conda-env.yml and requirements.txt to a more recent version, such as dask>=2021.0.

    Update conda-env.yml:

    dependencies:
     - dask>=2021.0
     - dask-jobqueue>=0.7

    Update requirements.txt:

    dask>=2021.0
    dask-jobqueue>=0.7
  2. Test Compatibility: After upgrading, test the application to ensure that the "sequential" option in network inversion works correctly.

  3. Enhance Error Handling and Logging: Modify src/miaplpy/network_inversion.py to include better error handling and logging.

    Example code snippet:

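    import logging

    logging.basicConfig(level=logging.DEBUG)

    # `iargs` and `ifgram_inversion_L1L2` are assumed to be defined or imported
    # earlier in network_inversion.py.
    try:
        logging.info("Starting network inversion with arguments: %s", iargs)
        ifgram_inversion_L1L2.main(iargs)
        logging.info("Network inversion completed successfully.")
    except Exception:
        # logging.exception records the full traceback along with the message.
        logging.exception("Error during network inversion")
        raise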
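
Before changing the pinned versions, it may help to confirm which Dask release is actually installed, since a `dask>=1.0` constraint only sets a lower bound and does not mean version 1.0 is in use. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of MiaplPy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the versions that the active environment really provides.
    for name in ("dask", "dask-jobqueue"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the reported Dask version is already recent, the version pin is unlikely to be the cause of the error.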

How to replicate the bug

  1. Set up an environment with the specified versions in conda-env.yml or requirements.txt, particularly dask>=1.0.
  2. Run the MiaplPy application with the "sequential" option for the IFGs network.
  3. Observe the error during the network inversion process, which should be similar to the one reported in the ticket.

By following these steps, you should be able to replicate the bug and verify that upgrading Dask resolves the issue.


Files used for this task:

Changes on conda-env.yml

Based on the `conda-env.yml` file, here are the observations and potential areas to investigate regarding the issue with the "sequential" option in network inversion:

1. **Dask Version**: The file specifies `dask>=1.0` and `dask-jobqueue>=0.3`. These versions are quite old, and there have been significant updates and improvements in Dask since version 1.0. It's possible that the issue arises from incompatibilities or missing features in these older versions of Dask when handling more complex operations like the "sequential" option in network inversion.
2. **Compatibility with MiaplPy**: Ensure that the Dask version used is compatible with the latest version of MiaplPy. If MiaplPy relies on newer features or bug fixes in Dask, upgrading Dask to a more recent version might resolve the issue.
3. **Conda Environment Configuration**: Verify that the environment is correctly set up and that there are no conflicts between package versions that could affect Dask's performance or functionality.

### Recommendations:

- **Upgrade Dask**: Consider upgrading Dask to a more recent version, such as the latest stable release, to benefit from improvements and bug fixes. Ensure that other dependencies are compatible with the newer version of Dask.
- **Test Compatibility**: After upgrading, test the "sequential" option in network inversion to see if the issue persists.
- **Review Release Notes**: Check the release notes for Dask and Dask-Jobqueue to identify any changes or deprecations that might affect the current implementation.

These steps should help determine if the Dask version is contributing to the issue and whether upgrading resolves the problem.

Changes on requirements.txt

The `requirements.txt` file specifies the dependencies for the project, including the version constraints for each package. In this file, the Dask version is specified as `dask>=1.0` and `dask-jobqueue>=0.3`. Given the issue described, it is possible that the Dask version being used is not compatible with the latest functionalities of MiaplPy, especially for the 'sequential' option in network inversion.

### Analysis:

1. **Dask Version**: The specified version `dask>=1.0` is quite old, considering that Dask has undergone significant updates and improvements over the years. The latest versions of Dask might have breaking changes or improvements that are not compatible with older codebases or might require updates in the code to work correctly.
2. **Compatibility**: If MiaplPy or any of its dependencies require a newer version of Dask for certain functionalities, especially for parallel computing tasks like network inversion, using an outdated version could lead to errors.

### Recommendations:

- **Update Dask**: Consider updating the Dask version to a more recent one, such as `dask>=2021.0` or later, to ensure compatibility with MiaplPy. However, ensure that other dependencies in the project are also compatible with the updated Dask version.
- **Testing**: After updating, thoroughly test the application, especially the network inversion functionality with the 'sequential' option, to ensure that the issue is resolved and no new issues are introduced.
- **Documentation**: Check the MiaplPy documentation or release notes for any specific version requirements or known issues related to Dask.

This analysis focuses on the potential impact of the Dask version specified in the `requirements.txt` file. Further investigation into the `network_inversion.py` file might be necessary to ensure that the logic is compatible with the updated Dask version.

Changes on src/miaplpy/network_inversion.py

Based on the provided source code from `src/miaplpy/network_inversion.py`, here is an analysis of potential issues related to the bug described in the user's issue:

### Analysis

1. **Sequential Option Handling**:
   - The code does not explicitly show how different network options like "single_reference" or "sequential" are handled. The logic for handling these options might be embedded in the `ifgram_inversion_L1L2.main(iargs)` function or in the configuration files (`smallbaselineApp.cfg` or `miaplpyApp.cfg`).
2. **Dask Compatibility**:
   - There is no direct indication of Dask being used in this script. If Dask is used within the `ifgram_inversion_L1L2` module or any other imported module, compatibility issues might arise there. The script itself does not provide any Dask-specific configurations or error handling.
3. **Error Handling**:
   - The script lacks error handling around critical operations such as file reading, directory changes, and function calls. This can make it difficult to pinpoint where exactly the error occurs during execution.
4. **Debugging Information**:
   - The script prints some debugging information, such as the command-line arguments and the current working directory, but it could benefit from more detailed logging, especially around the execution of `ifgram_inversion_L1L2.main(iargs)` and `generate_temporal_coherence.main(iargs)`.

### Recommendations

1. **Enhance Error Handling**:
   - Add try-except blocks around key operations to catch and log exceptions. This will help in identifying the exact point of failure.
2. **Improve Logging**:
   - Implement more detailed logging to capture the state of variables and the flow of execution, especially before and after calling external functions or modules.
3. **Investigate External Modules**:
   - Since the issue might be related to how the "sequential" option is processed, review the `ifgram_inversion_L1L2` and `generate_temporal_coherence` modules to ensure they handle different network options correctly and are compatible with Dask.
4. **Check Configuration Files**:
   - Ensure that the configuration files (`smallbaselineApp.cfg` and `miaplpyApp.cfg`) are correctly set up for the "sequential" option and do not contain any parameters that might conflict with Dask operations.
5. **Dask Version and Configuration**:
   - Although not directly related to this script, ensure that the Dask version and configuration in the environment are compatible with the MiaplPy functionalities, especially if Dask is used in other parts of the codebase.

By addressing these areas, it should be possible to identify and resolve the issue related to the "sequential" option in the network inversion process.

mirzaees commented 2 weeks ago

Hi @SMJMirza, how many workers have you set for the inversion? I have not seen this error before, and it might not be a MiaplPy issue. It looks like the inversion continues despite the error; @yunjunz probably has a better understanding. I suspect you have set a high number of workers although the extent of your data is small. You may be able to run it successfully with fewer workers.
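
As a rough illustration of the suggestion to use fewer workers on a small dataset, the sketch below caps a requested worker count so that each worker gets a minimum share of pixels. The function `cap_workers` and the `min_pixels_per_worker` threshold are hypothetical and purely illustrative; neither MiaplPy nor MintPy is known to apply such a rule.

```python
def cap_workers(requested, n_rows, n_cols, min_pixels_per_worker=10_000):
    """Limit the worker count so each worker gets a minimum share of pixels.

    `min_pixels_per_worker` is an arbitrary illustrative threshold,
    not a MiaplPy or MintPy setting.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    total_pixels = n_rows * n_cols
    # Number of workers that can each receive the minimum share.
    affordable = total_pixels // min_pixels_per_worker
    return max(1, min(requested, affordable))
```

For example, `cap_workers(8, 50, 120)` returns 1, because a 50 x 120 grid holds fewer pixels than a single worker's minimum share under this assumed threshold.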

yunjunz commented 2 weeks ago

I would suggest re-running without dask. Another weird thing is that the ref_phase is all zero, which means the reference pixel is at an invalid location.
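
The all-zero `ref_phase` observation can be checked independently of the inversion. The sketch below assumes the interferogram stack has already been loaded into a nested list indexed as `stack[ifg][row][col]`; the function name and data layout are assumptions for illustration, not the MintPy or MiaplPy API.

```python
def reference_phase_is_all_zero(stack, ref_y, ref_x):
    """Return True if the reference pixel is exactly zero in every layer.

    `stack` is a nested list indexed as stack[ifg][row][col].
    """
    if not stack:
        raise ValueError("stack must contain at least one interferogram")
    return all(layer[ref_y][ref_x] == 0 for layer in stack)


# Two tiny 2x2 "interferograms" with made-up phase values.
demo_stack = [
    [[0.0, 1.2], [0.3, 0.0]],
    [[0.0, -0.7], [0.9, 0.0]],
]
print(reference_phase_is_all_zero(demo_stack, 0, 0))  # True
print(reference_phase_is_all_zero(demo_stack, 0, 1))  # False
```

A result of `True` at the chosen reference pixel would be consistent with the pixel lying in an area with no valid data.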

SMJMirza commented 2 weeks ago

@mirzaees @yunjunz

Thank you for sharing your suggestions. Sorry, I should have included my settings earlier. They are as follows:

miaplpy.multiprocessing.numProcessor = 8
mintpy.compute.cluster = local
mintpy.compute.numWorker = 8
mintpy.compute.maxMemory = 32

The study area measures 0.005° × 0.012° in latitude and longitude. Indeed, it is not large; it covers only the dam we are monitoring with this approach. Interestingly, I do not face any issues with these settings when I use the "single_reference" network type; the issue appears only with the "sequential" network type.

I will try to test it without Dask. Regarding the reference point, do you think it can be invalid even though I did not get any error in the "reference_point" step, including with the "single_reference" network type?

BTW, I just want to test the "sequential" network type to see whether we can get data points over the embankment section of the dam, since this structure is suspected to be at risk of failure. Fortunately, we already obtained points of interest from the "single_reference" network.

SMJMirza commented 1 week ago

@yunjunz @mirzaees

For the new attempt, I set the number of workers to 6 and did not use Dask to run the interferogram network (sequential with 5 connections). The attached screenshot shows another error I got during the "invert_network" step.

[Attached image: Screenshot from 2024-11-18 10-05-37]

Do you have any suggestions about this issue? I can see that Sara used the sequential network of interferograms for her tests. So, I am not sure if there is any issue with this part of the package.
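
For readers unfamiliar with the two network types discussed in this thread, the sketch below shows how the pair lists differ under the common definitions: a single-reference network pairs every date with one reference date, while a sequential network pairs each date with the next few dates. The function names and the dates are made up for illustration, and MiaplPy's own implementation may differ in details such as how the reference date is chosen.

```python
def single_reference_pairs(dates, ref_index=0):
    """Pair every date with one reference date (earlier date first)."""
    ref = dates[ref_index]
    return [tuple(sorted((ref, d))) for d in dates if d != ref]


def sequential_pairs(dates, num_connections):
    """Pair each date with up to `num_connections` following dates."""
    pairs = []
    for i, first in enumerate(dates):
        for second in dates[i + 1 : i + 1 + num_connections]:
            pairs.append((first, second))
    return pairs


# Six hypothetical acquisition dates, sorted as YYYYMMDD strings.
dates = ["20240101", "20240113", "20240125", "20240206", "20240218", "20240301"]
print(len(single_reference_pairs(dates)))  # 5
print(len(sequential_pairs(dates, 5)))     # 15
```

With the same dates, the sequential network with 5 connections contains three times as many interferograms as the single-reference one, so the inversion step handles a larger and differently structured set of pairs.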

yunjunz commented 3 days ago

That's the real issue. @mirzaees may know better on this part.

mirzaees commented 3 days ago

@SMJMirza As I said, I have not seen this error before, but I agree with Yunjun: the error says some values are incorrect. My next guess would be that some or all of the interferograms were not created successfully for some reason, or that the masks (conncomp mask or tempcoh mask) are zero. You mentioned the single-reference run was successful, so something must have been interrupted during your sequential run. I suggest checking your inputs for this step. Unfortunately, I cannot help without further information, and I believe that if the single reference was successful, there should not be any issue with sequential.
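
One way to check the inputs as suggested here is to measure, for each interferogram or mask, the fraction of pixels that are neither zero nor NaN, and flag layers with no valid pixels. The sketch below works on plain nested lists; the function names and layer names are hypothetical, and loading the actual MiaplPy outputs is left out.

```python
import math


def valid_fraction(layer):
    """Fraction of pixels in a 2-D nested list that are neither 0 nor NaN."""
    values = [v for row in layer for v in row]
    if not values:
        return 0.0
    valid = sum(1 for v in values if v != 0 and not math.isnan(v))
    return valid / len(values)


def find_empty_layers(named_layers, min_fraction=0.0):
    """Return names of layers whose valid fraction is <= min_fraction."""
    return [
        name
        for name, layer in named_layers.items()
        if valid_fraction(layer) <= min_fraction
    ]


# Made-up 2x2 layers: the first is entirely zero, the second is half valid.
layers = {
    "ifg_a": [[0.0, 0.0], [0.0, 0.0]],
    "ifg_b": [[0.5, float("nan")], [0.0, 1.0]],
}
print(find_empty_layers(layers))  # ['ifg_a']
```

Raising `min_fraction` above zero would also flag layers that are mostly, but not entirely, empty.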

SMJMirza commented 3 days ago

@mirzaees Thank you for your points. When I checked the terminal output of steps 1 to 7, I did not see any clear errors or issues. However, I did not specifically check the generated interferograms or mask files. I will check them and let you know whether I manage to fix it.

SMJMirza commented 9 hours ago

@mirzaees

I checked maskConnComp, avgPhaseVelocity, and avgSpatialCoh and found them to be non-zero. In addition, the interferograms look normal, apart from unwrapping-error artifacts in several of them, which can be removed later. So I am not sure about the source of the error I am seeing with the "sequential" network type. Do you have any other suggestions? Or can I share any figures or plots to help you advise further?