jpramosi / geckordp

A client implementation of Firefox DevTools over remote debug protocol in python
https://jpramosi.github.io/geckordp/
MIT License
24 stars 8 forks source link

`list_tabs.py` example fails out of the box? #9

Closed goodboy closed 6 months ago

goodboy commented 1 year ago

screenshot-2023-08-11_13-42-32

Looking at that method it seems you actually just return None on failure (which seems semantically confusing as well as not super pythonic) : https://github.com/jpramosi/geckordp/blob/master/geckordp/profile.py#L478

I presume I've got some env setup or firefox thingy done wrong?

goodboy commented 1 year ago

Hah, took me a little while (and some help) to figure out but i guess this triggers a legacy feature i'd never noticed before 😂

https://docs.python.org/3/reference/expressions.html#atom-identifiers

I don't think this should be causing the issue i thought i was seeing inside methods of this class so i'm not entirely sure what's up yet?

goodboy commented 1 year ago

Ok got things working but initial issue was there since there's no geckordp profile that previously existed, the profile clone method always returns None and then the attr error obviously shows up..

@jpramosi not sure if you'd be ok with me putting up a patch for this or not but I think maybe adding some logic to this example to create a new profile when the presumed one doesn't exist would be a sane fix?

For me i had to read source code and pull out a debugger in order to get to the root issue; also using the debug logging..

jpramosi commented 1 year ago

Thank you for pointing out the problems here with the ProfileManager and sorry about the late reply. After reviewing the code, I totally understand your concerns about the bad error handling and the resulting pitfalls. To resolve the issue, I suggest the following to improve geckordp:

In the meantime I will work on this issue, if I got time. If you have found other problems about the profile management or have other concerns, I would be glad to hear from you.

goodboy commented 1 year ago

@jpramosi great!

Totally happy to patch all this in.

On another note (and i'm happy to make a follow up issue for this) would you be interested in a trio or anyio based RDPClient in the code base? I was planning to write one based on your existing asyncio impl but wasn't sure if it was something you'd take internally to this repo

sorry about the late reply.

Abs no worries! I don't think anyone is worse then i at responding to stuff on GH 😂

improve debug logs (for e.g. details like available profiles and path)

How do you feel about adding either some docs to show how to enable more verbose logging, or some more-console-friendly defaults adjustments such that the user can pass for eg. a loglevel='info' (or wtv) flag to be able to see any (hidden) errors when running some of the intro examples?

refactor examples with try-except / fallback logic for ".clone()" to create a new profile

Actually alread did this! I'll put up a patch hopefully today for you to review 🏄🏼

jpramosi commented 1 year ago

I guess your suggestion to use trio/anyio is to make an async version of 'RDPClient' or to make geckordp async compatible. If that is the case, I believe something like a wrapper async function to run sync functions like the methods of an actor class would be ideal. So far I can tell, the client needs also some changes to also work parallel. This would boil down to:

If you would like to tackle this task, I will gladly review and merge changes or any incremental commits. Opening a PR or issue about the task for discussion is no problem.

About the logging part, yes I think the readme documentation (after the actor-hierarchy diagram) can be improved by making an extra header for "Configuration" plus console usage with environment variables. Additionally an extra markdown document in the examples folder, on how to run these examples (with additional hints), could be helpful for the user.

The changes about the examples itself are welcome and appreciated. If you would like you can post your code-snippet here or make a PR.

goodboy commented 1 year ago

I guess your suggestion to use trio/anyio is to make an async version of 'RDPClient' or to make geckordp async compatible.

As far as I read in your current code it looks like RDPClient already is async compat no? At least for the IPC transport? https://github.com/jpramosi/geckordp/blob/master/geckordp/rdp_client.py#L293 I presume you mean an async API so async code can call methods on the client yah?

I believe something like a wrapper async function to run sync functions like the methods of an actor class would be ideal. So far I can tell, the client needs also some changes to also work parallel.

Yeah, i think having your current sync API is fine and the way you've designed it with the asyncio loop running in a bg thread is the way I would have implemented it as well.

  1. keep it compatible
  2. favour fewer changes
  3. prefer pure asyncio
  4. make it possible to receive responses parallel (in different threads) with .send_receive()
  5. implement helper function to run sync functions asynchronously maybe with asyncio.wait_for(asyncio.get_running_loop().run_in_executor(...), timeout_secs) add more tests for the client later

RE 3: The nice thing about using anyio (for both the async API and any wrapper thread which uses it) would be that user code can then choose the async framework of choice and allow pretty flexible usage. I might even go further and say it might be worth the effort to take a so called "sans IO" approach:

for the req-resp msg handling (which i realize is not that much stuff :joy:) and then swapping the async (loop) layer get's a lot simpler / obvious.

RE 5,6: I'm not sure if you need separate threads to do this since you should get decent concurrency just doing the IO / msg handling. In terms of a sync method which delegates to the loop-in-bg-thread, yeah that's also the way i'd expect it to work but if you wanted a batched request - response technique we can probably just use a stdlib queue to block on the sync func caller side.

Anyway, lots to discuss and I need to get a deeper grok of the code base going first. I'll probably try prototyping some ideas with trio/anyio as an alt RDPClient to start and then see how we can unify with the current client. I'll obviously follow up in other issues / PRs with all this async design talk.

jpramosi commented 1 year ago

As far as I read in your current code it looks like RDPClient already is async compat no?

My phrasing was a bit bad here. With async I meant to send and receive messages asynchronously. At the moment the whole I/O operation regarding RDPClient.send_receive() is blocking. Could be a bottleneck if many tabs are used.

I'll probably try prototyping some ideas with trio/anyio as an alt RDPClient to start

It is a bit difficult to follow you here. For geckordp it would an advantage to use only the standard python library to fulfill the goals and to fallback to external libraries if the goal absolutely cannot be reached. For example it would be fine to only use asyncio for the client even it is a tad ugly or bloated. Using an external library, to tidy up code for a few cases without the user benefiting from it, might be a bit pointless. However I don't know how your bigger picture of your proposal look like with the changes of the client and the final result for the user. I don't want to waste your time. I also don't want to dismiss the idea with trio/anyio too soon. If it's possible, a new issue with pseudo code and a practical example of how it improves the public user side in terms of usability or flexibility would be helpful.

goodboy commented 1 year ago

It is a bit difficult to follow you here. For geckordp it would an advantage to use only the standard python library to fulfill the goals and to fallback to external libraries if the goal absolutely cannot be reached. For example it would be fine to only use asyncio for the client even it is a tad ugly or bloated. Using an external library, to tidy up code for a few cases without the user benefiting from it, might be a bit pointless.

In this case the plan is to use a lib i authored (which is a multiprocessing runtime built on trio and structured concurrency) which will more or less be the supervision system for spawning firefoxes which i'd like to manage explicitly from within a native app that allows injection of urls and saving of new pages (normally in new tabs) completely outside of firefox.

With this in mind adding a trio or anyio based rdp client greatly simplifies the wrapping needed to manage an asycio lib; otherwise we end up having to deal with an inter-loop layer to get async tasks synced to one another (yes i know this sounds complicated but it's actually not that bad). I'm also not necessarily proposing putting the trio/anyio client in geckordp itself, was just seeing what you thought about the eventual idea of it 😉

If it's possible, a new issue with pseudo code and a practical example of how it improves the public user side in terms of usability or flexibility would be helpful.

Right so i think the correct approach from my view would be first to start with what you already have (the existing sync API client) and then proto the anyio approach in my own project, see how it looks and then can always give you a view once i'm happy.

So yeah summary is, let's not worry about it too much rn, was moreso fishing to see if you already had a hankering to get away from asyncio 😂