Closed marty-macfly closed 3 years ago
When I'm adding a sleep after the run of the php command in the cronjob, the message become the following:
[2021-01-15 12:28:24,806] [30] [INFO] [StreamServerLayer.py:141] new connect fd:8
[2021-01-15 12:28:24,814] [30] [DEBUG] [FrontAgent.py:90] send hello:8 len:59
[2021-01-15 12:28:24,814] [30] [WARNING] [StreamServerLayer.py:63] peer close socket
[2021-01-15 12:28:24,814] [30] [WARNING] [StreamServerLayer.py:168] client:8 disconnected
[2021-01-15 12:28:24,814] [30] [DEBUG] [StreamServerLayer.py:74] close client:8(('10.131.0.1', 45766))
No more reset but still no trace in UI.
@marty-macfly It looks like the php process was stopped without send all span.
Please share more detail about how does your "cron jop" works.
it's a kubernetes cron job at the given time a container will be startup running the given command line here it's php yii ....
the job run and at the end of the run of the php cli run, the container is removed. Collector agent is in another container and always running and listening on port 8080 for trace.
It works in our env without changing everything.
But, you can
pinpoint_php.SendSpanTimeOutMs= 100 # send span with timeout 100ms
Does it mean it will send a span every 100ms ? So if I enable report I should see something like that:
agent try to connect:(tcp:pinpoint-collector-agent:8181)
#128 pinpoint_end_trace Done!
Every 100ms ?
Does it mean it will send a span every 100ms ? So if I enable report I should see something like that:
agent try to connect:(tcp:pinpoint-collector-agent:8181) #128 pinpoint_end_trace Done!
Every 100ms ?
First, does it meet your requirement? Then, NO. 100ms is time for the waiting_event timeout.
Doesn't change anything still same output of the report and the reset message on collector agent log.
I found your container will be removed every time. Sorry to tell you, current agent can't support.
Before supporting this feature, there are some issues need your help
@marty-macfly
Agent is in a dedicated container which is always running that why I'm not using socket in that case but tcp:port. So agent always running and listening on port 8181.
The cli with the php pinpoint extension is running in a container which is created and destroy every-time and is sending data to agent through tcp on port 8181.
Agent is in a dedicated container which is always running that why I'm not using socket in that case but tcp:port. So agent always running and listening on port 8181.
The cli with the php pinpoint extension is running in a container which is created and destroy every-time and is sending data to agent through tcp on port 8181.
For your description, the collector-agent is OK. The plugins and pinpoint extension not support this feature.
Thank your feedback.
Just a recap of our current setup:
php.ini
configuration for pinpoint-php-module
extension=pinpoint_php.so
pinpoint_php.CollectorHost=tcp:pinpoint-collector-agent:8181
pinpoint_php.SendSpanTimeOutMs=100
pinpoint_php.TraceLimit=-1
pinpoint-collector-agent:8181
.Configuration of collector agent is the following:
[Collector]
ServerType=1500
collector.grpc.agent.ip=pinpoint-collector.pinpoint-uat.svc
collector.grpc.agent.port=9991
collector.grpc.stat.ip=pinpoint-collector.pinpoint-uat.svc
collector.grpc.stat.port=9992
collector.grpc.span.ip=pinpoint-collector.pinpoint-uat.svc
collector.grpc.span.port=9993
collector.grpc.discardpolicy.maxpendingthreshold=10000
[Common]
# Apache webserver port
Web_Port=8080
Log_Level=INFO
LOG_DIR=/var/log/pinpoint-collector-agent
[Agent]
Address=0.0.0.0@8181
pinpoint-collector.pinpoint-uat.svc
on port 9991, 9992, 9993, ...So if I understand properly in that setup it's pinpoint-php-module
that doesn't support it ? Did I properly understood ? Thanks for your help and support.
I'm updating that issue because I've found the following. We've seen that some run were not working as explain in previous comment here. But some use case were working for example in case of CLI running on the same container as the php-fpm process. Or when I've got long running CLI (workers). In both case I was using the agent version 0.4.2 in python with php extension sending data through unix socket.
We've update the agent to the Go version of the agent and some use case that was working before are not working anymore. For example long running CLI and CLI run inside the php-fpm container. Difference is we are now using tcp connection to send data from the PHP extension to the GO agent because Go agent is not supporting socket.
After some investigation I've seen the following when testing on my local laptop CLI tracing are still working but when testing on test environment where agent and PHP extension are on different host it's not working anymore but php-fpm tracing work. when I'm running CLI in debug I've seen the following difference from local and test env on loca:
CLI debug report is:
[pinpoint] [72] [72]this span:(....)
[pinpoint] [72] [72]agent try to connect:(tcp:192.168.254.1:8088)
[pinpoint] [72] [72]fd 5 send size 579
[pinpoint] [72] [72]#128 pinpoint_end_trace Done!
On test env were agent and CLI is not on the same host trace is the following:
[pinpoint] [72] [72]this span:(....)
[pinpoint] [72] [72]agent try to connect:(tcp:192.168.254.1:8088)
[pinpoint] [72] [72]#128 pinpoint_end_trace Done!
We can see that we don't have the fd...
debug message, I've put some more debug in the php extension and I've found that the TransLayer::trans_layer_pool()
is going into else
statement which is comment with timeout do nothing
int retval = select(fd+1,&rfds,&wfds,&efds,&tv);
so the return value of the select is 0, I've check the documentation of select
So select behavior is based on timeout the doc is saying the following
The timeout argument is a timeval structure (shown below) that specifies the interval that select() should block waiting for a file descriptor to become ready. ... If both fields of the timeval structure are zero, then select() returns immediately. (This is useful for polling.)
In our case timeout is define as follow:
struct timeval tv = {0,(int)timeout *1000};
So I add more debug and in fact timeout
is NULL
so both fields are 0 so select is not waiting and returns immediately. If I hardcode the timeval with struct timeval tv = {5,(int)timeout *1000};
select will wait for maximum 5 seconds and after timeout.
In that case my trace is properly send and my debug report for CLI on local and on test env is the same. In your first comment you said to set pinpoint_php.SendSpanTimeOutMs = 100
which seems to be the proper solution to make things work but when I've search into the code I don't see any declaration of that configuration element in the php ini entry list STD_PHP_INI_ENTRY and anywhere in the code so I'm wondering if that directive still usable ?
@marty-macfly
please check #350
pinpoint_php use asynchronous io, so the tv
is {0,0}. It only send when available.
I rollback pinpoint_php.SendSpanTimeOutMs feature
: pinpoint_php.SendSpanTimeOutMs > 0 , waiting until ms expired or event callback.
I'll test that, and revert to you
pinpoint_php use asynchronous io, so the
tv
is {0,0}. It only send when available. I rollbackpinpoint_php.SendSpanTimeOutMs feature
: pinpoint_php.SendSpanTimeOutMs > 0 , waiting until ms expired or event callback.
May be in case of CLI when doing remote tcp connection the async io is not done before the process is ended ??
Anyway with the change all my use case are working, so for me that change is fine with new GO version of agent and php-fpm, long running process and cli.
So it would be great to have it in master :)
Thanks a lot
I've seen the merge on master so I'm closing that issue, thanks.
I'm running a PHP cron job on Kubernetes which ending with the following trace:
On collector agent which is
pinpoint-collector-agent:8181
v0.4.1, I've got the following:And not trace is sent to the UI.