openaddresses / batch

OpenAddresses/Machine based AWS Batch based ETL Processing
https://batch.openaddresses.io/
MIT License

Some sources are missing from /data #370

Closed bertday closed 6 months ago

bertday commented 10 months ago

Describe the bug: Some data sources are missing from https://batch.openaddresses.io/data

To Reproduce:

  1. Go to https://batch.openaddresses.io/data
  2. Search for us/ga/washington

Expected behavior: us/ga/washington should appear in the list of sources. See the screenshot below, where it is missing.

Screenshots: [image]

Additional context: us/ga/washington has a source definition here:

https://github.com/openaddresses/openaddresses/blob/master/sources/us/ga/washington.json

added by @arch0345 earlier this year.

ingalls commented 10 months ago

Spent way too much time trying to reproduce locally; should have checked the logs first.

1
2023-12-01T12:04:08.458Z
Error: 504: Unexpected token < in JSON at position 0 - TypeError: Body is unusable
2
2023-12-01T12:04:08.458Z
at run (file:///usr/local/src/batch/node_modules/@openaddresses/lib/src/run.js:61:23)
3
2023-12-01T12:04:08.458Z
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
4
2023-12-01T12:04:08.458Z
at async OA.cmd (file:///usr/local/src/batch/node_modules/@openaddresses/lib/oa.js:113:16)
5
2023-12-01T12:04:08.458Z
at async cli (file:///usr/local/src/batch/sources.js:83:9)
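
For readers unfamiliar with this pair of messages: "Unexpected token <" usually means an HTML error page (here from a 504 gateway timeout) was fed to a JSON parser, and "Body is unusable" is what Node's fetch reports when a response body is read a second time. The sketch below shows one way to avoid both by reading the body once as text. It is an illustration only, not code from @openaddresses/lib; the names `parseBody` and `readJson` and the `message` field on error bodies are assumptions.

```javascript
// Hypothetical helper, not part of @openaddresses/lib: turn a status code and
// raw body text into parsed JSON, or throw an error that keeps the evidence.
function parseBody(status, text) {
    let body;
    try {
        body = JSON.parse(text);
    } catch (err) {
        // A gateway timeout typically returns an HTML page, so include a short
        // excerpt instead of the bare "Unexpected token <" parser message.
        throw new Error(`${status}: non-JSON response: ${text.slice(0, 80)}`);
    }

    if (status < 200 || status >= 300) {
        // Assumption: JSON error bodies carry a `message` field; otherwise
        // fall back to the raw text.
        throw new Error(`${status}: ${(body && body.message) || text}`);
    }

    return body;
}

// Read the stream exactly once. Calling res.json() and then res.text() on the
// same Response is one common way to hit "TypeError: Body is unusable".
async function readJson(res) {
    const text = await res.text();
    return parseBody(res.status, text);
}
```

With this shape, a 504 HTML page surfaces as an error that carries the status code and the first characters of the page, which makes the upstream timeout visible in the logs.
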
ingalls commented 10 months ago

Seeing the following behavior

@iandees

ingalls commented 10 months ago

image

ingalls commented 10 months ago

Expensive SQL Query:

    SELECT
        count(*) OVER() AS count,
        runs.id,
        runs.live,
        runs.created,
        runs.github,
        runs.closed,
        ARRAY_AGG(job.status) AS status,
        COUNT(job.*) AS jobs
    FROM
        runs
            LEFT JOIN job
            ON job.run = runs.id
    WHERE
        $1::T

bertday commented 9 months ago

Hi @ingalls, really appreciate you taking a look into this issue of some sources missing from the /data page. Just curious if you found any possible culprits, or if there's a way for the OA community to tag in on this one!

ingalls commented 9 months ago

Spent several hours yesterday trying to get this unblocked. I think we're at a point where, due to the volume of sources and the overhead in queueing them, I will do the following:

bertday commented 6 months ago

Hello @ingalls! I know this one was a doozy. How did things end up turning out with the re-architecture? Did you have any luck getting the missing sources added back? Are there any opportunities for OA volunteers to tag in?

Thanks again for looking into this one!

ingalls commented 6 months ago

@bertday I got it up and running a couple weeks ago. Is there a specific source you are seeing that is passing but not added to the main sources list? Happy to take a look at a specific source now that the weekly source processor is working.

bertday commented 6 months ago

@ingalls that's great to hear! I just checked /data and the things I was expecting are showing up now 🎉

I'm still seeing some sources missing from the map, but I'll follow up over in ~#346~ #345.

Thanks again!