openaddresses / batch

OpenAddresses/Machine based AWS Batch based ETL Processing
https://batch.openaddresses.io/
MIT License

Individual sources not updating #369

Closed: arch0345 closed this issue 7 months ago

arch0345 commented 1 year ago

Describe the bug

Some sources that have been added in the past year or so aren't showing up in the Individual Sources section, sources that have been updated aren't showing the latest version, and sources that have been deleted are still showing up.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://batch.openaddresses.io/data
  2. Click on 'us/fl/sarasota' under Individual sources

The download shown is from 11/11/2022, however this source was updated back in June in https://github.com/openaddresses/openaddresses/pull/6511

Also 'us/fl/statewide2' is still shown despite it being removed in https://github.com/openaddresses/openaddresses/pull/6554 (https://github.com/openaddresses/openaddresses/issues/6974) and 'us/fl/suwannee' isn't shown despite being added in https://github.com/openaddresses/openaddresses/pull/6549

Expected behavior

I expect that 'us/fl/statewide2' won't be shown in this list and that the download shown for Sarasota County should be from the PR linked above rather than the one from 2022.

Additional context

It looks like this has been an issue since around February 2022 (https://github.com/openaddresses/openaddresses/issues/6009#issuecomment-1048541584); addresses for 'us/tx/statewide' still aren't showing up in https://batch.openaddresses.io/data.

Since the last few hundred or so sources in this list seem to show the latest version, this might be related to the weekly runs only going through the last few hundred sources alphabetically (https://github.com/openaddresses/batch/issues/347#issuecomment-1724758350).

Based on https://github.com/openaddresses/openaddresses/issues/6376, this issue also seems to apply to data collections.

ingalls commented 1 year ago

Working on figuring out what is going on

bertday commented 11 months ago

Hi @ingalls - really appreciate you taking the time to look into this issue. (I had documented it in a few of the issues @arch0345 referenced above.)

Just curious — did you have a chance to dive in on this? This particular issue has fundamentally changed how I use OA data, so I'd love to be a part of the solution, if there's any opportunity to help 👋

Thank you!

ingalls commented 7 months ago

@bertday This should be fixed as of a couple of weeks ago. The Collections updater was failing because it took too long to queue the sources in Batch with a single request. I've split the request into chunks of 50 jobs, which rectified the issue.
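As a rough illustration of the fix described above, here is a minimal Python sketch of splitting one large queueing request into fixed-size chunks. It is not the actual Batch code: `chunked`, `queue_sources`, and the `submit` callback are hypothetical names, and `submit` merely stands in for whatever single request queues a group of jobs.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def queue_sources(
    sources: List[str],
    submit: Callable[[List[str]], None],
    size: int = 50,
) -> int:
    """Queue every source via `submit`, one chunk per call.

    `submit` is a hypothetical stand-in for a single queueing request;
    returns the number of requests made.
    """
    calls = 0
    for chunk in chunked(sources, size):
        submit(chunk)  # one small request instead of one huge one
        calls += 1
    return calls
```

With 120 sources and the default chunk size of 50, `submit` is called three times, with 50, 50, and 20 sources. Keeping each request small bounds how long any single call can take, at the cost of a few more round trips.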

bertday commented 7 months ago

Glad to hear it's all back to normal. I appreciate all the hours you've put into the fix, @ingalls!