hasadna / knesset-data-pipelines

Main repository for Open Knesset project - contains the knesset data scrapers and processing pipelines
https://oknesset.org/
MIT License

fix failing pipelines #205

Open OriHoch opened 1 year ago

OriHoch commented 1 year ago

A couple of pipelines are failing, and the pipelines that depend on them don't run as a result. Both can be seen in the pipelines dashboard under the "Failed Execution" and "Can't start" tabs: https://production.oknesset.org/pipelines/

yanirmr commented 1 year ago

Hi @OriHoch, I hope this message finds you well :) I would like to help with this issue by asking some questions and suggesting a breakdown of the task into manageable subtasks. Understanding the specifics of the problems will let us develop targeted solutions and improve the pipelines' overall reliability.

Questions:

  1. Error Messages: Could you provide more information about the different error messages for the failed executions and "Can't Start" pipelines? This will help us identify the root causes of the issues and propose appropriate solutions.
  2. Pipeline Dependencies: Are there any known dependencies between the pipelines that are currently failing or unable to start? Understanding these dependencies can help us prioritize the order in which we address the issues.
  3. Troubleshooting Attempts: Have there been any previous attempts to resolve these issues? If so, could you share the findings and any relevant documentation? This information will help us avoid repeating the same steps and focus on new approaches.
  4. Environment and Configuration: Are there any specific environment settings or configurations that we should be aware of when working on these pipelines? This will help ensure that we properly replicate the conditions under which the issues occur.
  5. Prioritization: Which of these failing pipelines are most critical to the project? Prioritizing the order in which we tackle the issues will help us focus our efforts on the most important and impactful problems first.

Suggested Breakdown of Subtasks:

Please let me know if these questions and subtasks align with your expectations for addressing the issue at hand, or if you have any additional feedback or concerns. I hope this approach will help all contributors to focus their efforts more effectively and collaboratively work towards improving the overall quality and performance of the project.

OriHoch commented 1 year ago

  1. Error Messages: Could you provide more information about the different error messages for the failed executions and "Can't Start" pipelines? This will help us identify the root causes of the issues and propose appropriate solutions.

The failed pipelines show their full logs in the dashboard, which should have all the details required for debugging. The logs include the full stack trace, so you can see exactly which line of code caused the failure. I believe the errors will also reproduce locally. The "Can't start" pipelines are pipelines that depend on the failing pipelines, which is why they can't start. Their status in the dashboard shows exactly which pipeline dependency is blocking them.
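As a rough illustration of reading a failure location out of such a log, here is a minimal sketch. It assumes the logs contain standard Python-style tracebacks; the function name `innermost_frame` and the sample log are hypothetical and not taken from the real dashboard.

```python
import re

# Matches one frame line of a standard Python traceback.
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\S+)')


def innermost_frame(log_text):
    """Return (path, line, function) of the last traceback frame, or None."""
    frames = FRAME_RE.findall(log_text)
    if not frames:
        return None
    # The last frame listed is the one where the exception was raised.
    path, line, func = frames[-1]
    return path, int(line), func


# Hypothetical log excerpt, made up for this example.
sample_log = '''\
Traceback (most recent call last):
  File "/pipelines/run.py", line 12, in main
    process(rows)
  File "/pipelines/steps/parse.py", line 48, in process
    value = row["missing_field"]
KeyError: 'missing_field'
'''

print(innermost_frame(sample_log))
```

The last frame in the traceback is usually the best starting point when trying to reproduce the error locally.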

  2. Pipeline Dependencies: Are there any known dependencies between the pipelines that are currently failing or unable to start? Understanding these dependencies can help us prioritize the order in which we address the issues.

The dependencies are defined in the pipeline yamls. For example, this yaml shows the dependencies of the people/attendance/committee-meetings pipeline, which is currently in "Can't start" status.
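To make the relation between failed and "Can't start" pipelines concrete, here is a minimal sketch. It assumes the dependencies have already been read from the yamls into a plain dict; the function name `blocked_pipelines` is hypothetical, and every pipeline id except people/attendance/committee-meetings is made up for the example.

```python
from collections import deque


def blocked_pipelines(dependencies, failed):
    """Map each pipeline that cannot start to the dependency blocking it.

    dependencies: pipeline id -> list of pipeline ids it depends on.
    failed: set of pipeline ids whose last execution failed.
    """
    # Invert the mapping: for each pipeline, which pipelines depend on it.
    dependents = {}
    for pipeline, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pipeline)

    # Walk outward from the failed pipelines, breadth first.
    blocked = {}
    queue = deque(sorted(failed))
    while queue:
        current = queue.popleft()
        for pipeline in dependents.get(current, []):
            if pipeline not in blocked and pipeline not in failed:
                blocked[pipeline] = current
                queue.append(pipeline)
    return blocked


# Hypothetical dependency graph; only the first id appears in this issue.
deps = {
    "people/attendance/committee-meetings": ["committees/meetings", "members/presence"],
    "committees/meetings": ["committees/raw"],
    "members/presence": [],
    "committees/raw": [],
}

print(blocked_pipelines(deps, {"committees/raw"}))
```

With a graph like this, fixing the failed pipelines at the root of the chain should unblock everything downstream, so those are a natural place to start.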

  3. Troubleshooting Attempts: Have there been any previous attempts to resolve these issues? If so, could you share the findings and any relevant documentation? This information will help us avoid repeating the same steps and focus on new approaches.

There haven't been any previous attempts.

  4. Environment and Configuration: Are there any specific environment settings or configurations that we should be aware of when working on these pipelines? This will help ensure that we properly replicate the conditions under which the issues occur.

The README should explain how to run the pipelines, but some details might be out of date. If someone intends to start working on this, contact me and I will help them set it up.

  5. Prioritization: Which of these failing pipelines are most critical to the project? Prioritizing the order in which we tackle the issues will help us focus our efforts on the most important and impactful problems first.

I don't have any prioritization

Suggested Breakdown of Subtasks

Feel free to open issues for subtasks

OriHoch commented 1 year ago

Assigning the issue to you, not necessarily to implement it, but I think you are a good person to centralize the efforts for this.