pielambr opened this issue 1 year ago
Hi @pielambr, we've noticed this as well. I wouldn't be surprised if it's related to https://github.com/pelias/docker-libpostal_baseimage/pull/12. There have been some changes lately in libpostal that cause segfaults. While some may have been fixed, it's very likely that some issues still remain.
We can try reverting to an older commit of libpostal again, stay tuned.
I think we might be able to move back up to HEAD since https://github.com/openvenues/libpostal/pull/632 was merged; I'll try building and releasing a new docker image tomorrow.
@pielambr do you have an example query that caused the fault, which I could use to confirm the fix?
@missinglink I'm afraid not, we just observed in production that the pod went down quite often, usually with larger paragraphs of text.
It seems the latest docker image already includes code from the PR I linked above.
@pielambr can you please tell me which version of the docker image you are running?
```
$ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED      SIZE
pelias/libpostal-service   latest   846cd5bdb6db   9 days ago   2.3GB
```
Could you please add some instrumentation to capture the query causing the segfault, if possible?
From what I'm seeing here, it's difficult to resolve this issue without knowing which version(s) and which query(ies) are causing it.
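If it helps, here's a minimal sketch of a logging pass-through you could run in front of the service: it writes each query to a log file before forwarding it, so after a crash the last line of the log is the offending input. The upstream address, the proxy port, and the `/parse?address=` endpoint are all assumptions here; adjust them to match your deployment.

```python
# capture_queries.py: minimal logging pass-through (a sketch, not part of
# this repo). Assumes libpostal-service listens on localhost:4400 and that
# clients are pointed at the proxy on port 4401 instead.
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

UPSTREAM = "http://localhost:4400"  # assumed service address

logging.basicConfig(filename="queries.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log before forwarding: after a crash, the last line of
        # queries.log is the query that killed the service.
        logging.info(self.path)
        try:
            with urlopen(UPSTREAM + self.path, timeout=10) as resp:
                self.send_response(resp.status)
                self.end_headers()
                self.wfile.write(resp.read())
        except OSError:
            self.send_error(502, "upstream unavailable (crashed?)")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 4401), Proxy).serve_forever()
```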
After some trial and error I was able to get `846cd5bdb6db` to segfault by increasing the input query length. This is the query which finally caused it to fail on my machine:

```
30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny,30 w 26th st, new york, ny
```
What I'll do is revert to the last known stable version and write up an issue on the libpostal repo to make them aware; it seems to be affecting HEAD, so maybe a regression was introduced.
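For anyone who wants to reproduce this, here's a rough sketch of a loop that hammers the service with the long query above until the connection drops. It assumes the service is listening on port 4400 and exposes a `GET /parse?address=` endpoint; adjust if your deployment differs.

```python
# repro_segfault.py: send the long query repeatedly until the service dies.
# Assumptions: port 4400 and a GET /parse?address= endpoint.
import sys
from urllib.parse import quote
from urllib.request import urlopen

QUERY = ",".join(["30 w 26th st, new york, ny"] * 4)
URL = "http://localhost:4400/parse?address=" + quote(QUERY)

for i in range(1, 2001):
    try:
        urlopen(URL, timeout=10).read()
    except OSError as e:
        # a dropped connection here almost certainly means a segfault
        sys.exit(f"service stopped responding after {i} requests: {e}")
print("no crash after 2000 requests")
```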
Okay, bad news: I rebuilt this image pinned to an older version of our libpostal baseimage and was still able to trigger the segfault by sending 5 to 10 long, ugly queries like the one above.
```diff
diff --git a/Dockerfile b/Dockerfile
index c91a18c..5c85161 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,5 +1,5 @@
 # build the libpostal-server binary separately
-FROM pelias/libpostal_baseimage as builder
+FROM pelias/libpostal_baseimage:pin-to-version-that-builds-2023-07-04-5f89119a11fbcce5df475eba9a3f337181d2d8ad as builder
 RUN apt-get update && apt-get install -y make pkg-config build-essential
```
It's not clear when exactly the regression was introduced, but I checked out an old version from 2021-11-03 and it isn't affected, so that can provide a bookend for the bisect.
I don't have loads more time to spend on this today, but if someone could provide more information about which versions between `master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e` and `master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872` work or don't work, that would be super useful for getting this resolved 🙏

```
docker run -d -p 4400:4400 pelias/libpostal-service:master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e
```
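To make that bisect easier, here's a sketch of a helper that boots each candidate tag, replays the crashing query against it, and reports crash / no-crash. Only the two bookend tags above are filled in; the intermediate tags, the warm-up delay, and the `/parse?address=` endpoint are assumptions to adjust.

```python
# bisect_tags.py: classify image tags as crash / no-crash (a sketch).
# Assumptions: docker CLI available, port 4400 free on the host, and a
# GET /parse?address= endpoint; add intermediate tags to TAGS as needed.
import subprocess
import time
from urllib.parse import quote
from urllib.request import urlopen

TAGS = [
    "master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e",
    "master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872",
]
QUERY = quote(",".join(["30 w 26th st, new york, ny"] * 4))

for tag in TAGS:
    cid = subprocess.check_output(
        ["docker", "run", "-d", "-p", "4400:4400",
         f"pelias/libpostal-service:{tag}"], text=True).strip()
    time.sleep(60)  # crude wait for the libpostal data to load
    verdict = "no crash"
    for _ in range(1000):
        try:
            urlopen(f"http://localhost:4400/parse?address={QUERY}",
                    timeout=10).read()
        except OSError:
            verdict = "crash"
            break
    print(f"{tag} -> {verdict}")
    subprocess.run(["docker", "rm", "-f", cid], check=True,
                   stdout=subprocess.DEVNULL)
```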
In fact there have been fairly few releases since 2021, due to limited activity on the upstream repos.
If I have some spare time I'll take a look at which version introduced it for us, but that might be a while. We've currently reverted all the way to version `ca4ffcc`, just to be safe, because it was blocking production.
Hi folks, I am seeing this issue as well. I've tried the following images:
- master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872 <- crash
- master-2023-07-16-d6483672db70596a2ee0d97782567b12917c6ae6 <- crash
- master-2023-07-04-b02f6f14cfe2dbf2dfee9e458a372f0aca13caa4 <- no crash
- master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e <- no crash
I haven't done a huge amount of testing, but the crash is pretty easy to reproduce, occurring after roughly 500 requests. The 2023-07-04 image appears to be the latest one that holds up for thousands of requests in my environment (Kubernetes with a 4Gi memory limit).
Thanks for the continued reports, they are helpful to discover which versions are affected.
These memory issues are being discussed over on the main libpostal issue tracker and we hope to adopt the patches as soon as they are available.
We would be happy to accept some code in this repo which could reliably cause the CI to crash (and therefore prevent docker images from being created), so that no new releases can be generated until this is fixed upstream.
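As a starting point, such a smoke test could be as simple as the sketch below: replay the long query a few hundred times against the image under test and fail the build if the service stops answering. The request count, the port, and the `/parse?address=` endpoint are assumptions based on the reports above.

```python
# test_no_segfault.py: a pytest-style CI smoke test (a sketch).
# Assumptions: the image under test is already running on port 4400 and
# exposes GET /parse?address=; ~500 requests reproduced the crash above.
from urllib.parse import quote
from urllib.request import urlopen

def test_survives_long_queries():
    query = quote(",".join(["30 w 26th st, new york, ny"] * 4))
    url = "http://localhost:4400/parse?address=" + query
    for _ in range(500):
        # any connection error raises here, failing the test and
        # preventing the release from being published
        urlopen(url, timeout=10).read()
```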
@missinglink perhaps my message was formatted a little confusingly. The crashing images that I've tested are:

- master-2023-07-23-c289dda8d47cb6d21b2a1aa74e68cb5e9d12a872
- master-2023-07-16-d6483672db70596a2ee0d97782567b12917c6ae6

The images which I've tested that appear stable are:

- master-2023-07-04-b02f6f14cfe2dbf2dfee9e458a372f0aca13caa4
- master-2021-11-03-aaf0586c78acd54e4586d84e6257c56b9db99f3e
Got it thanks 👍
Describe the bug
Since updating to the latest version, which includes bumping the Ubuntu version, we are repeatedly getting segmentation violations, around one every 6 to 10 minutes. This goes away when reverting to an earlier version.
Steps to Reproduce
Run the `latest` tag.
Expected behavior
No segmentation violations.
Environment
The container is running inside a Kubernetes cluster on Google Cloud Services.