JensRichnow opened 7 months ago
Update on the above:

The segmentation fault happens during the fetch process (`grafana-wtf.sqlite` db). I might have missed this error the first time ...

```
Segmentation fault: 11 | 27/578 [00:02<00:38, 14.16it/s]
/Users/jens/.pyenv/versions/3.9.12/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
Dear Jens,

thank you for writing in, and apologies for the late reply.

grafana-wtf was conceived, the other day, as a quick tool fulfilling some of our needs and exercising grafana-client. It looks like we did not get the code right when bringing concurrent HTTP downloads together with the SQLite-based request cache -- apparently it "just worked" for us and others up until now, when you reported this flaw.

We will have to look into where the corresponding locks are missing that would prevent concurrent access to shared resources. In the meanwhile, you may be successful with the `--concurrency=` command-line option, in order to decrease download concurrency. Please let us know how this goes.
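The missing-locks point can be illustrated with a small, self-contained sketch (stdlib only; the names and schema are illustrative, not grafana-wtf's actual code): several download threads sharing one SQLite connection must serialize their access, otherwise the `sqlite3` module raises errors or, depending on the build, crashes.

```python
import sqlite3
import threading

# Illustrative only: a shared connection used by many download threads.
# check_same_thread=False allows cross-thread use, but it is then our
# job to serialize access with a lock.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE cache (url TEXT, body TEXT)")
lock = threading.Lock()

def store(url, body):
    with lock:  # one writer at a time on the shared connection
        conn.execute("INSERT INTO cache VALUES (?, ?)", (url, body))
        conn.commit()

threads = [threading.Thread(target=store, args=(f"u{i}", "x")) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count = conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
print(count)  # 20
```

Without the lock, the interleaved `execute`/`commit` calls on one connection are exactly the kind of unsynchronized shared-resource access suspected above.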
With kind regards, Andreas.
There are a few discussions on the issue tracker of the requests-cache package about this topic. In general, they indicate that SQLite-based caching has limits in high-concurrency situations, but there are also specific recommendations which may need to be applied to the grafana-wtf code base. If you have the capacity to dive into the details, we will be more than happy.

That it already trips at a concurrency of 5 indicates that either the grafana-wtf code needs to be improved, as mentioned above, or that you have just discovered something in this area which trips much more easily on M2 machines, even with lower concurrency. In that case, it may be relevant to report it to the requests-cache developers as well.

Another recommendation I've picked up from quickly scanning those discussions would be to use a different cache backend. Please let us know if you have such needs; we think it will be easy to prepare grafana-wtf to offer different cache backends on behalf of requests-cache.
There is another option we may want to consider: `grafana-wtf --drop-cache`. Some people are also using that option in order to emulate a `--no-cache` behavior.

Maybe it is also the right time to finally offer such an option, so that grafana-wtf can more easily be used as a library, without doing too much, like making the request caching obligatory. In this spirit, when implemented, `--no-cache` should turn off caching entirely, eventually resolving your issue in a different manner.
Hi again,

taking all this into consideration, specifically @JWCook's comment at https://github.com/requests-cache/requests-cache/issues/870#issuecomment-1716370826, ...

> The solution is to use `CachedSession` instead, which is thread-safe.

... and reviewing grafana-wtf's code at the spots where the cache enablement and the concurrent downloads happen, we can see it does not use `CachedSession` at all.

So, we figure a good first measure would be to actually use requests-cache's `CachedSession`, which may improve the situation right away, if we are lucky.
With kind regards, Andreas.
It looks like the actual requests are sent by the grafana-client library, so I'm guessing that's why `install_cache()` is being used here. That's a convenient way to patch an external library you don't control, but as you found, it's not thread-safe.

I don't think you need to switch backends; SQLite should be more than able to handle this. For reference, though, there are some recommendations on choosing a different backend here.

Thankfully, there is a `requests.Session` object stored on `GrafanaClient.s`. You can replace that with a `requests_cache.CachedSession` object, which would look something like this:
```python
from grafana_client import GrafanaApi
from requests_cache import CachedSession

api = GrafanaApi(...)
# Swap the plain requests.Session on the client for a cached, thread-safe one
session = CachedSession(cache_name=__appname__, expire_after=expire_after, use_cache_dir=True)
session.headers['User-Agent'] = api.client.user_agent
api.client.s = session
```
Good luck! Feel free to open an issue or discussion on requests-cache if you get stuck.
Thanks @amotl! Makes perfect sense, and setting `--concurrency=1` for the time being worked nicely!
Hi there,
thank you both for your excellent replies. Jordan, the suggestion to patch `GrafanaClient` accordingly sounds good, and at the same time, I am happy to hear that Jens has been able to use `--concurrency=1` as a workaround.
Cheers, Andreas.
Hi again. There is a patch now, which may improve the situation.
Dear Jens and Jordan,
grafana-wtf 0.19.0 has been released, including corresponding patches to improve the situation. Can you tell us if it works better for you now? Thanks for your support!
With kind regards, Andreas.
Issue still persists with 0.19.0
Dear Edgaras,
thank you for writing in. Do you think grafana-wtf should start providing an alternative cache backend, which is less problematic in concurrent access situations?
@JWCook and @Ousret: Can you spot any obvious problems with the current implementation, after we improved it based on your feedback? Maybe we are still doing something unfortunate, or even completely wrong?
With kind regards, Andreas.
@amotl I looked over your changes and they look good to me. This issue could potentially be fixed with some backend settings, for example SQLite write-ahead logging. This comes with a few tradeoffs, but notably it allows read operations to not block writes. It can be enabled by passing `wal=True` to `CachedSession`.
It's also worth mentioning that I can't make any guarantees about niquests' compatibility with requests-cache. From a quick look at its Session class, it should behave the same as `requests.Session`, but it may or may not remain that way.
I can help debug further, but I'll need some more information. @edgarasg, can you provide the exact error message you're getting, and post a complete example that reproduces this issue? Are you getting exceptions, or just log warnings? And @amotl, roughly how many threads are making concurrent requests? Is there a user-configurable thread limit, is it automatic based on the CPU, or something else?
Dear Jordan,

thanks for your swift reply. We applied your suggestion about `wal=True` right away.

grafana-wtf 0.19.1 includes this improvement. May we ask you to use that version when proceeding with further tests, @edgarasg?
> It's also worth mentioning that I can't make any guarantees about niquests' compatibility with requests-cache. From a quick look at its Session class, it should behave the same as `requests.Session`, but it may or may not remain that way.

This is right on the spot. There could be something fishy, but until further notice, I am also assuming "it will just work". Do you have any idea why that could go south in concurrency situations, for example through unfortunate side effects from Niquests being async, @Ousret?
> I can help debug further, but I'll need some more information.
Actually, I didn't mean to bother you too much with that issue, so I appreciate your reply and offer very much. Let's hear back what @edgarasg might report about 0.19.1 before conducting another round of investigations.
> How many threads are making concurrent requests? Is there a user-configurable thread limit, automatic based on the CPU, or other?
The default value for the concurrency setting is `5`. It can be adjusted by using the `--concurrency` command-line argument.
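In other words, a fixed, user-configurable worker pool. A sketch of that shape (illustrative only, not grafana-wtf's actual fetch loop; the `fetch` stand-in replaces a real HTTP request so the example stays self-contained):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, concurrency=5):
    # Stand-in for an HTTP request; a real implementation would
    # download the dashboard JSON for each URL here.
    def fetch(url):
        return (url, len(url))
    # At most `concurrency` threads issue requests at the same time,
    # matching the --concurrency command-line option described above.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(pool.map(fetch, urls))

result = fetch_all(["a", "bb", "ccc"], concurrency=1)
print(result)  # -> {'a': 1, 'bb': 2, 'ccc': 3}
```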
With kind regards, Andreas.
> It's also worth mentioning that I can't make any guarantees about niquests' compatibility with requests-cache. From a quick look at its Session class, it should behave the same as `requests.Session`, but it may or may not remain that way.
Niquests advertises itself as a drop-in replacement. It would lose its purpose if this were "broken" or if it diverged too much. We are making sure it keeps working.
> This is right on the spot. There could be something fishy, but up until further notice, I am also assuming "it will just work". Do you have any idea why that could go south in concurrency situations, because of unfortunate side effects because Niquests is async, @Ousret?
Niquests is thread-safe in a synchronous context and task-safe in an asynchronous one, so there is nothing to be concerned about there. I believe you don't do async in grafana-wtf, so the "thread-safe" aspect applies. And yes, it should "just work". We stay in close contact in case there's a need to improve.

Regards,
Thanks for the swift response. Now I'm getting this error:

```
❯ grafana-wtf find Phj18H0nk --drop-cache
[grafana_wtf.core ] INFO : Fetching dashboards in parallel with 5 concurrent requests | 0/479 [00:00<?, ?it/s]
[1] 9213 segmentation fault grafana-wtf find Phj18H0nk --drop-cache
/Users/edgarasg/.pyenv/versions/3.11.6/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
Sorry to hear that, and thanks for reporting. Please also use `--concurrency=1` as a workaround until this problem is resolved properly.
I'm using v0.17.0. When running `grafana-wtf --drop-cache find xyz`, I get a DB lock error. Running on Apple M2 Max.

How can I recover from that? Cheers