Closed t2gran closed 3 years ago
We are done with this in the Entur 1.x fork, and I will prepare a PR for OTP2. I think this is really useful in a lot of situations. If not configured, the implemented solution works the same as today (a few exceptions are listed below). But by using the build-config.json it is possible to specify URI(s) for each file type (OSM, DEM, GTFS, NETEX, HTML BUILD REPORT (annotations), BASE GRAPH, GRAPH, OTP-STATUS). The otp-status file is new and allows other components to check on the status of the build process by using a synchronization file. We have added support for Google Cloud Storage and for file URLs. It would be easy to add AWS and HTTP support as well: any API that supports a catalog of files that can be streamed will be easy to support.
When zip files are streamed to OTP, OTP keeps the entire file in memory while processing it, because OTP accesses GTFS and NeTEx data in random-access order. When using the local file system, OTP still uses a random-access file (ZipFile) and does not copy everything into memory. This should not be a problem, but if it is, there are simple ways to fix it (for example, using a local file cache).
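To illustrate why a streamed zip ends up fully in memory, here is a minimal sketch of the idea, not the code from the Entur fork. The class name `InMemoryZip` and its methods are hypothetical. A stream can only be read front to back, so every entry is buffered once and can then be opened in any order.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not an OTP class): buffers a streamed zip for random access. */
final class InMemoryZip {
    private final Map<String, byte[]> entries = new HashMap<>();

    /** Reads the stream once, front to back, and keeps every file entry in memory. */
    static InMemoryZip load(InputStream in) throws IOException {
        InMemoryZip zip = new InMemoryZip();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // readAllBytes() stops at the end of the current entry.
                    zip.entries.put(entry.getName(), zis.readAllBytes());
                }
            }
        }
        return zip;
    }

    /** Opens an entry by name; entries can be opened in any order, any number of times. */
    InputStream open(String name) throws IOException {
        byte[] data = entries.get(name);
        if (data == null) {
            throw new IOException("No such entry: " + name);
        }
        return new ByteArrayInputStream(data);
    }
}
```

With a local file, `java.util.zip.ZipFile` gives the same random access without the buffering, which is the difference described above.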
I will prepare a PR containing the refactoring of OTP, which creates the necessary extension points to make "store plugins". I will also prepare a PR with the Google Cloud plugin as a Sandbox module.
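As a rough sketch of what such an extension point could look like (the names `DataStore`, `FileStore` and `StoreRegistry` are made up for illustration and are not the API from the PR), each backend decides which URIs it accepts, and a registry picks the first store that matches:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical extension point: one implementation per storage backend. */
interface DataStore {
    /** True if this store can handle the URI, decided here by URI scheme. */
    boolean accepts(URI uri);

    InputStream openInput(URI uri) throws IOException;
}

/** Default store for file: URIs, backed by the local file system. */
final class FileStore implements DataStore {
    @Override
    public boolean accepts(URI uri) {
        return "file".equals(uri.getScheme());
    }

    @Override
    public InputStream openInput(URI uri) throws IOException {
        return Files.newInputStream(Path.of(uri));
    }
}

/** Resolves a URI to the first registered store that accepts it. */
final class StoreRegistry {
    private final List<DataStore> stores;

    StoreRegistry(List<DataStore> stores) {
        this.stores = List.copyOf(stores);
    }

    DataStore storeFor(URI uri) {
        return stores.stream()
            .filter(store -> store.accepts(uri))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("No store for " + uri));
    }
}
```

In this sketch a cloud plugin would be another `DataStore` that accepts its own scheme (for example `gs`) and lives in a separate module, so the core keeps no dependency on vendor libraries.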
Breaking changes
There are a few minor breaking changes:
New features
otp-status.inProgress: when OTP exits, this file is renamed to otp-status.ok or otp-status.failed. In certain ecosystems this makes it easier to construct automatic build pipelines. I will do a separate PR for this one.
Put on hold until 1 June 2021
If there is no demand for the two remaining features before that date, we will close the issue.
The two remaining features:
For both of these, if someone needs them and provides the resources to test them, I can help with the implementation. Support for status file PR: #2911.
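The status file life cycle described above (otp-status.inProgress renamed to otp-status.ok or otp-status.failed on exit) could be sketched as follows. This is an illustration only, not the code in the PR: the class `BuildStatusFile` is hypothetical, it assumes the status file lives in a local directory, and removing stale status files at startup is also an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Callable;

/** Hypothetical helper (not an OTP class) that wraps a task with a status file. */
final class BuildStatusFile {
    static final String IN_PROGRESS = "otp-status.inProgress";
    static final String OK = "otp-status.ok";
    static final String FAILED = "otp-status.failed";

    /** Runs the task and records the outcome in dir. Returns true on success. */
    static boolean runWithStatus(Path dir, Callable<?> task) throws IOException {
        // Assumption: status files left over from an earlier run are removed first.
        for (String name : new String[] {IN_PROGRESS, OK, FAILED}) {
            Files.deleteIfExists(dir.resolve(name));
        }
        Path inProgress = Files.createFile(dir.resolve(IN_PROGRESS));

        boolean ok;
        try {
            task.call();
            ok = true;
        } catch (Exception e) {
            ok = false;
        }
        // On exit the in-progress marker is renamed to reflect the outcome.
        Files.move(inProgress, dir.resolve(ok ? OK : FAILED), StandardCopyOption.REPLACE_EXISTING);
        return ok;
    }
}
```

A pipeline step could then poll the directory and continue once otp-status.ok or otp-status.failed appears.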
Today OTP reads all its input from the local file system and writes the graph to the same local disk. This is OK in a static deployment, but in a cloud deployment it creates the overhead of moving files from permanent storage onto a cluster node and back when the OTP process is done. When the OTP process fails or freezes, it is difficult to detect, which makes the system less robust.
In a continuous, automated DevOps ecosystem we would like OTP to integrate directly with the rest of the system, without having to wrap OTP and copy files around. We want OTP to read its input files from, and write its output files to, cloud storage directly, and we want to be able to track its progress. This is most relevant when building a graph, but a solution should, if possible, not be limited to that.
At Entur we need to change the way we do this today, so we will implement support for pluggable file access, so that the default (current) behavior can be switched to accessing Google Cloud Storage.
We post this issue here to let people know we are doing this in our private fork; if there is interest, we can make a PR to integrate it into OTP2.
We are not going to introduce dependencies on any Google Cloud specific libraries, just provide a pluggable extension point in OTP to swap in an alternative implementation. We will provide links to our GCS implementation in the Entur GitHub repo, if someone wants to copy or use it.