This pull request adds support for polling a paginated API whose page number is passed as a query parameter in the URL. Since the amount of data being fetched can be very large in my use case, the PR adds support for using multiple threads and for saving the current page to a file.
Pages can be fetched concurrently on multiple threads. The page currently in progress can be saved to a file so that progress is restored if Logstash is stopped or crashes. A file path can be specified; otherwise the file is created in the Logstash data directory. The file can optionally be deleted when the job finishes.
If requests start failing at some point while the pages are being queried, they can be retried; alternatively, the input can be configured to stop or to continue on error. Success status codes can be specified, so that only responses with those status codes are counted as successes.
An example config can be seen here:
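A minimal sketch of what such a config might look like. The plugin name and every option name below are illustrative assumptions for the features described above, not the exact settings defined by this PR:

```
input {
  # Hypothetical plugin/option names, shown only to illustrate the features
  paginated_http_poller {
    url => "http://localhost:8000/example"
    page_parameter => "page"      # query parameter holding the page number (assumed)
    start_page => 1
    threads => 4                  # fetch pages concurrently (assumed)
    # Persist the page in progress so the job can resume after a restart;
    # without a path, the file would go in the Logstash data directory (assumed)
    state_file_path => "/var/lib/logstash/pagination-state"
    delete_state_file_on_completion => true
    success_codes => [200]        # only these responses count as successes (assumed)
    error_mode => "retry"         # or "stop" / "continue" (assumed)
  }
}
```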
This will send requests to http://localhost:8000/example?page=1, http://localhost:8000/example?page=2, and so on.