Open iferencik opened 2 years ago
If this is on road map,can i try this?
Hi @sharkAndshark,
Sure!
Based on the discusion in https://github.com/maplibre/martin/issues/349 ,there will be no watch mode? @stepankuzmin
@sharkAndshark I think we may need to rethink the watch mode, and possibly add it back before the next release.
read
of a read/write lock, and copy the latest state to the worker's state". Not certain if this is doable though./rpc/getmvt.json
, the tile path should be /rpc/getmvt/<version>/<z>/<x>/<y>.pbf
, where the <version>
is 1,2,... This path should be dynamic, and change whenever a new version is detected. Only applies when it is possible to access different versions at the same time. Note that this approach would only work if there is only one instance of Martin, which is not scalable.That said, I really feel like the more substantial changes should be done with kubernetes and helm and deployments, and the watch
mode is a development hack, making infrastructure too fragile. A better set up would be:
@nyurik I will give it a a shot and then more discussion
Sounds good! @stepankuzmin any thoughts?
@sharkAndshark sure, that would be great 👍
@nyurik We are currently using the latest version of martin, and there are some real benefits in using this version. But the only thing that is stil a downside in the new version, is the remove of the watch or the option to reload the index. We are running on a Kubernetes cluster and we reload the index by creating a new pod, and killing the other one. But this process can take up to 50 seconds or so, that's the time the user has to wait before, before being able to see the updated data. So is there a possibility to add another way to refresh the data? Perhaps we can try to help with implementing this feature?
@pascal-codetaal I'm beginning to think we really do need to reimplement it, and some help would be great. So lets have a dedicated POST /reload
endpoint.
My only concern is that we would want to avoid a global lock on each request. I'm not yet certain how to do that best.
@nyurik We are currently using the latest version of martin, and there are some real benefits in using this version. But the only thing that is stil a downside in the new version, is the remove of the watch or the option to reload the index. We are running on a Kubernetes cluster and we reload the index by creating a new pod, and killing the other one. But this process can take up to 50 seconds or so, that's the time the user has to wait before, before being able to see the updated data. So is there a possibility to add another way to refresh the data? Perhaps we can try to help with implementing this feature?
@pascal-codetaal if you do a
kubectl rollout restart deployment <depl_name>
there is no downtime. kube will ensure the old pod(s) is/are working while the new one gets deployed.
Just make sure you have enough capacity on the node to spin a new deployment.
Also, you should ideally have more(>1) replicas.
I have the same issues and being under time pressure I resorted to something else I created a sh [script] (https://github.com/UNDP-Data/martin-docker/blob/main/reactive_entrypoint.sh) that checks the local config config file against a remote one located at a web accessible url (Azure file share).
An env vars specify how often the url/config is pulled and where the URL is pulled from.
This might not be very elegant but it works like a charm specially because i use a tool to create (optionally upload to azure) the config files. We actually update the layers on a daily, sometimes hourly basis. In future i will pool the remote config once a day.
The tool has a quirk that it creates configs only for tables that have the "publish=True" in the table description. The same is valid for columns, if you mark a column with publish=False the toll will skip it. This is not a very good idea and I think I will scrape it in favor of a more robust technique like revoking read rights for the user under which martin connects to the database.
We don't have a downtime, it just can take up to 50 seconds before the new pod is up and running... We let users upload data to our db and they don't like to wait an extra 50 seconds before they can see the updated data. I'm going to check your script, thx
I would love to be able to reload the schema config without a full restart. The PostgREST project has two cool ways to do this:
https://postgrest.org/en/stable/schema_cache.html#schema-cache-reloading
The second method in particular is interesting because it allows the DB admin to set up custom triggers that would send notifications to the custom channel any time a geospatial table changes.
There are currently two issues I see (i'm sure there are more):
hash(server_name, server_port, db_name, namespace, table, geo_column)
, and users can set it with id_format: '{table}.{hash}'
in the auto_publish
settings.We just ran into this issue after switching from pg_tileserv to Martin. With pg_tileserv all new tables where discovered right away. Now, we have to restart our Martin Docker container to get an updated table list.
Our docker setup https://github.com/FixMyBerlin/atlas-geo/blob/main/docker-compose.yml starts our backend server with db, processing and tiles. It looks like we don't have a good way to restart the tile docker after the app image closed and finished processing our data…
Is there some API like <marinserver.com>/refresh
that we could ping to notify Martin to refresh tables?
@tordans we all want that api to happen - I plan to work on it in the future, but no certainty on the timeframe -- it is mostly a personal project, so if anyone has any time to work on it, I will be happy to guide and assist that effort.
Hello @nyurik and the Martin contributors,
We've been closely following the discussion about the need for a watch mode or similar feature to enable dynamic config reloading in Martin. As we understand the significance of this feature for users, and us, who need to see updated data without substantial downtime, we're keen on contributing to its development.
Though we're not currently well-versed in Rust, we're ready to put in the necessary time to become proficient and contribute meaningfully to this project. We're particularly interested in understanding the preferred approach.
Could you provide us with some guidance on where you believe we should start, or if there are any particular areas of the codebase or documentation that we should familiarize ourselves with first?
We're looking forward to collaborating with the community on this feature. Thank you for considering our offer to help.
@pascal-codetaal I will be happy to help you.
The very first thing I think we should evaluate is the performance aspect outlined in https://github.com/maplibre/martin/issues/288#issuecomment-1494739125 -- create a tiny Actix app using one of the examples they have as a starting point. Inside that app, create two endpoints, e.g. /lock
and /nolock
to see the impact of the global lock. You can use the oha
tool (used in the justfile) to compare the performance of both endpoints.
This should get you familiarized with the Rust ecosystem, making it much easier to make changes to a bigger project.
#[derive(Clone)]
struct State {
pub values: HashMap<String, String>,
pub values_mutex: Arc<Mutex<HashMap<String, String>>>,
}
impl State {
pub fn new() -> Self {
let values = vec![(String::from("key"), String::from("value"))]
.into_iter()
.collect::<HashMap<String, String>>();
Self {
values_mutex: Arc::new(Mutex::new(values.clone())),
values,
}
}
}
#[get("/lock")]
async fn get_lock(state: Data<State>) -> String {
let vals = state.values_mutex.lock().unwrap();
vals.get("key").unwrap().clone()
}
#[get("/nolock")]
async fn get_nolock(state: Data<State>) -> String {
state.values.get("key").unwrap().clone()
}
@nyurik the development will be done by @mesudBe-Orbit He will start with the development of the separate Actix application, and he will push this to a public repository.
@nyurik @pascal-codetaal the development is complete, and you can access the tiny application and view the results by following this link: https://github.com/mesudBe-Orbit/actix-lock-stress-tester?tab=readme-ov-file#test-results In short, under these conditions, it seems there is no significant difference.
awesome news, thanks! So this should make it easier.
The current config system process:
start()
, read config file if given, using current env vars for any variable expansions..resolve()
to match the configuration to the actual state of the database/files, and create a ServerState
(tile sources, fonts, etc)
https://github.com/maplibre/martin/blob/a864b9981c9b8cfeaf3b4a90d40ec2635674bd59/martin/src/bin/martin.rs#L15-L27The resolve
method creates cache and an ID Resolver, and uses resolve_tile_sources
to process all tile sources in parallel.
IdResolver
maps "unique full source IDs" (like database.namespace.tablename
or even longer) to some short name the user may want, like tablename
. If the short name already exists, it will append numbers to it, e.g. tablename.1
, but if the same full unique name is passed to IdResolver, it will re-return the same short name as before - so my point above about stable resolution is mostly done - same table will get the same short name.
PostgreSQL resolution is much more involved than pmtiles/mbtiles. It runs these steps in parallel for all configured postgres connections:
PgBuilder
instance from the PgConfig
instance.query_available_tables
)instantiate_tables
)So one path to implement watch mode:
sources: Data<RwLock<TileSources>>
instead of Data<TileSources>
, and get a read lock on all of them. Note that RwLock
might not be good for our usecase, but it should get us started.Config
struct, store it (as Arc<Mutex<Config>>
) inside the actix's Data<...>
/refresh
endpoint that gets Data<Arc<Mutex<Config>>>
from Actixresolve
to compute new set of sources, and once done, lock and update all tile sources using the write of the RwLock.Config::resolve
should be changed to preserve IdResolver instance between calls - e.g. move it to constructor?@mesudBe-Orbit @pascal-codetaal the development is complete, and you can access the tiny application and view the results by following this link: https://github.com/mesudBe-Orbit/actix-lock-stress-tester?tab=readme-ov-file#test-results In short, under these conditions, it seems there is no significant difference.
Please run your code with cargo run --release
, and also you should probably increase the number of requests, e.g. aim for about 5 minute run (?)
@nyurik the 5 minute run had the same result for both endpoints. We started with the implementation of changing the endpoint functions.
Change all endpoint functions to accept sources: Data<RwLock
> instead of Data , and get a read lock on all of them. Note that RwLock might not be good for our usecase, but it should get us started.
@pascal-codetaal sounds good. One other thing - I would recommend creating a Martin fork, create a branch there, and create a draft
pull request into martin. Make sure to commit and push your code often (even if it does not compile). This way everyone can see the progress, and make suggestions as you go, without waiting for the full completion, and to correct mistakes early on. This way you will also get the benefit of constant CI checks.
@nyurik we have created a draft PR into martin. you can find it here And commit and push our latest code (the latest commit was about mistakenly added repetition).
@mesudBe-Orbit yes, thanks, i saw it and already commented on that PR. Keep adding things to it (by pushing to the same branch), and once it is ready to be merged, click the Ready to review
button. Make sure to either merge main branch into it, or rebase on main and force push.
pmtiles support for this would be good as well. I outlined how it can be done in https://github.com/stadiamaps/pmtiles-rs/issues/40 - and once done, we can use it in Martin too.
Hello there
I'd like to know if setting WATCH_MODE=true makes martin reload the config file. I am publishing many postgis layer with even more properties/columns and in order to limit these i need to use a configuration file. I am thinking to create a python script that will generate the config yaml and it would be really nice if martin could re-read this in WATCH_MODE
thank you for this great software box