public-transport / gtfs-via-postgres

Process GTFS Static/Schedule by importing it into a PostgreSQL database.
https://github.com/derhuerst/gtfs-via-postgres#gtfs-via-postgres

Problem using Docker image #36

Closed jusabatier closed 1 year ago

jusabatier commented 1 year ago

I'm trying to import a GTFS zip file into my PostgreSQL database, but I get this error:

Error: task "shapes" is not defined
    at sequence (/app/node_modules/sequencify/index.js:14:9)
    at sequence (/app/node_modules/sequencify/index.js:38:5)
    at convertGtfsToSql (/app/index.js:100:2)
    at convertGtfsToSql.next ()
    at pumpToNode (node:internal/streams/pipeline:132:22)
    at pipelineImpl (node:internal/streams/pipeline:373:9)
    at pipeline (node:internal/streams/pipeline:183:10)
    at Object. (/app/cli.js:115:1)
    at Module._compile (node:internal/modules/cjs/loader:1275:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1329:10) {
  missingTask: 'shapes',
  taskList: [ undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined ]
}

Here is my Dockerfile:

FROM publictransport/gtfs-via-postgres

RUN apk add --no-cache postgresql-client

WORKDIR /gtfs

# pass all arguments into gtfs-via-postgres, pipe output into psql:
ENTRYPOINT ["/bin/sh", "-c", "env | grep PG; gtfs-via-postgres $0 $@ | psql -b"]

And I run it with:

docker build -t import-gtfs .
docker run --rm --volume /tmp/gtfs:/gtfs -e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGDATABASE=$PGDATABASE -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD import-gtfs --require-dependencies -- stops.txt

The Docker container connects to the database successfully, but I get the error above.

Can someone help me with this?

derhuerst commented 1 year ago

I know this is somewhat unintuitive, but currently, if you have a dataset without shapes.txt/shapes.csv, you need to run gtfs-via-postgres with --trips-without-shape-id.

Please try that and see if it helps.
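For example, adapting the `docker run` invocation from above (a sketch, not tested here; the added flag is the only change):

```shell
docker run --rm --volume /tmp/gtfs:/gtfs \
  -e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGDATABASE=$PGDATABASE \
  -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD \
  import-gtfs --trips-without-shape-id --require-dependencies -- stops.txt
```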

jusabatier commented 1 year ago

My dataset contains a shapes.txt file.

When I add the option you suggested, I get another, similar error:

Error: task "trips" is not defined
    at sequence (/usr/local/lib/node_modules/gtfs-via-postgres/node_modules/sequencify/index.js:14:9)
    at sequence (/usr/local/lib/node_modules/gtfs-via-postgres/node_modules/sequencify/index.js:38:5)
    at convertGtfsToSql (/usr/local/lib/node_modules/gtfs-via-postgres/index.js:100:2)
    at convertGtfsToSql.next ()
    at pumpToNode (node:internal/streams/pipeline:132:22)
    at pipelineImpl (node:internal/streams/pipeline:373:9)
    at pipeline (node:internal/streams/pipeline:183:10)
    at Object. (/usr/local/lib/node_modules/gtfs-via-postgres/cli.js:115:1)
    at Module._compile (node:internal/modules/cjs/loader:1275:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1329:10) {
  missingTask: 'trips',
  taskList: [ undefined, undefined, undefined, undefined, undefined, undefined, undefined ]
}

derhuerst commented 1 year ago

Are you passing the zip archive as an argument? You need to unzip it first and pass the individual files.

I will adapt the readme to make this more clear.
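The distinction can be sketched like this (everything here is made up for illustration: the paths and feed contents are placeholders, and `python3 -m zipfile` stands in for a downloaded feed archive and for `unzip`):

```shell
# Build a stand-in GTFS archive for the demo (a real feed would be downloaded).
mkdir -p /tmp/gtfs-demo && cd /tmp/gtfs-demo
printf 'stop_id,stop_name\ns1,Demo stop\n' > stops.txt
printf 'route_id,route_type\nr1,3\n' > routes.txt
python3 -m zipfile -c feed.zip stops.txt routes.txt

# Wrong: passing the archive itself -- gtfs-to-sql expects the CSV/TXT files:
#   gtfs-to-sql --require-dependencies -- feed.zip

# Right: extract first, then pass the individual files:
mkdir -p extracted
python3 -m zipfile -e feed.zip extracted
ls extracted/*.txt
```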

jusabatier commented 1 year ago

OK, with your pointers I found that the files were not mounted in my running container.

My use case is to periodically import GTFS datasets into my PostgreSQL database via a scheduled pipeline in GitLab CI.

Here is my configuration:

Dockerfile:

FROM node:alpine

RUN apk add --no-cache postgresql-client
RUN npm install -g gtfs-via-postgres

VOLUME /gtfs
WORKDIR /gtfs

# pass all arguments into gtfs-to-sql, pipe output into psql
# ($0 catches the first argument, which `sh -c` would otherwise drop):
ENTRYPOINT ["/bin/sh", "-c", "npm exec -- gtfs-to-sql $0 $@ | psql -b"]

.gitlab-ci.yml:

variables:
  SHARED_PATH: /builds/$CI_PROJECT_PATH/shared
  DOCKER_RUN_ENV_OPTS: "-e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGDATABASE=$PGDATABASE -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD"

stages:
  - update

update-job:
  stage: update
  image: docker:stable
  services:
    - docker:dind
  script:
    - rm -rf $SHARED_PATH/gtfs
    - mkdir -p $SHARED_PATH/gtfs
    - echo "Downloading GTFS files"
    - wget --no-check-certificate https://yoururl.com/download/GTFS_DATASET.zip -O $SHARED_PATH/gtfs/gtfs.zip
    - echo "Downloading complete."
    - echo "Extracting files"
    - mkdir $SHARED_PATH/gtfs/extracted
    - unzip $SHARED_PATH/gtfs/gtfs.zip -d $SHARED_PATH/gtfs/extracted
    - echo "File extracted in $SHARED_PATH/gtfs/extracted"
    - echo "Building docker image"
    - docker build -t import-gtfs .
    - echo "Image built"
    - echo "Extracting GTFS to Postgresql"
    - docker run --rm -v $SHARED_PATH/gtfs/extracted:/gtfs $DOCKER_RUN_ENV_OPTS import-gtfs -d -u -- *.txt
    - echo "Done"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $SCHEDULE_NAME == "gtfs-update"'

Then define the PG* variables in the project's CI/CD variables.
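As an aside, the trailing `-- *.txt` works even though no .txt files exist in the CI job's working directory: the glob reaches the container as a literal string, and the unquoted `$@` in the `sh -c` entrypoint then expands it against the /gtfs working directory inside the container. A minimal demonstration of that shell behaviour (the path is made up):

```shell
# An unquoted $@ inside `sh -c` is field-split and glob-expanded in the
# current working directory, so a literal '*.txt' argument turns into
# the matching file names.
mkdir -p /tmp/glob-demo
cd /tmp/glob-demo
touch stops.txt trips.txt
sh -c 'printf "%s\n" $@' sh '*.txt'
# prints "stops.txt" and "trips.txt", one per line
```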

This way it works well. Thanks for your help.