ulixee / secret-agent

The web scraper that's nearly impossible to block - now called @ulixee/hero
https://secretagent.dev
MIT License

Help us to help you: please consider improving community standards #433

Open GlenDC opened 2 years ago

GlenDC commented 2 years ago

Currently the contributor guidelines are missing. A code of conduct is also missing, but that's a minor issue given that the communication so far already seems pretty nice and friendly.

It is also currently not clear whether one should contribute to Hero or to Secret-Agent, given the former is the successor of the latter. Where would one's effort best go? The documentation related to the core is also pretty thin. Issues could use some labeling as well, to mark those for which help is required or which are good to pick up for first-time contributors.

Thank you for putting out this project though and have a nice day :)

GlenDC commented 2 years ago

On top of that, it would be nice if you could publish some kind of roadmap, as high level and vague as it might be, so that there is at least a sense of the timing and direction of these projects. Knowing the end-game or long-term plan for Secret-Agent might help as well. I also found no documentation on why the split to Hero happened rather than sticking with Secret-Agent.

My apologies for the number of questions. I'm trying to make things clear for myself and others, and I hope it doesn't come off the wrong way. I really do appreciate what you're doing with this project.

blakebyrnes commented 2 years ago

Thanks @GlenDC! We actually have a Code of Conduct and Contributors guidelines buried on the website, but they're almost useless for actually contributing at the moment, particularly since we switched to our Hero branch.

If you are looking to contribute (which is awesome!!), I would say bug fixes in SecretAgent are super helpful. We are porting fixes into Hero (and vice-versa) until Hero stabilizes. Feature additions are a bit more difficult to drive from the SecretAgent side, but if something sparks your interest, we can definitely chime in on the ticket as to whether or not we should pursue it.

Core documentation is sparse mostly because we had assumed users would generally not use it (when we were thinking through the initial docs). Plugins have changed that truth to some degree, and this is certainly not true for contributors (for whom we just don't have anything to offer at the moment besides the code).

Regarding the roadmap, I totally understand this perspective, and we are not sharing enough of it. Partly this is because we are still doing a decent number of experiments, and partly it's because we have to learn what we're doing to be able to actually communicate it in some way that people can grasp.

The overarching goal of Hero (vs SecretAgent) is "how do we make writing scrapers with Hero/SecretAgent enjoyable and fast". We're doing this by working through a large project using the Hero fork of SecretAgent, where the development process is driven by a "Headed" chrome browser with tooling "attached on". For what it's worth, our core "promise" with SecretAgent was "can we create a headless browser that has a path to being undetectable out of the box." Long term, SecretAgent will be replaced with Hero (all of it is split out into the Hero codebase and the parts of the Ulixee ecosystem).

We'll work on the conduct and guidelines. As Hero matures, we will get our Roadmap solidified and published, but I think in the short term, it's unfortunately unlikely to reach the top of our todo lists.

GlenDC commented 2 years ago

it's unfortunately unlikely to reach the top of our todo lists.

Believe me, as someone who has been in OSS for a decade already, I do understand that sentence. It's for that reason, however, that it could be fruitful to enable contributors as soon as you can, now that the tech is mature enough to be actually viable for production use.

I say this because:

Going to stop blabbing about this subject as I think you know all of this already and more, but did want to give my 2 cents on it for what it is worth, from a total stranger.


Core documentation is sparse mostly because we had assumed users would generally not use it

I can imagine that. However, I think the client will always be a bit limiting for certain use cases of "power" users. You can make clear that Core is not part of the public API and may break at any point due to changes; as long as that warning is given, I think it's nevertheless fair to have people use it directly, so it might as well be documented.

Again, I guess you might also be aware of that already, or perhaps you do not agree.


where the development process is driven by a "Headed" chrome browser with tooling "attached on".

Do I understand the following correctly:

Have I understood that correctly?


but I think in short-term, it's unfortunately unlikely to reach the top of our todo lists.

In the short term, could you perhaps just publish a very vague, high-level version? It could be just a couple of lines long, or just 2 or 3 vague milestones of a couple of words each, with some very rough, non-binding deadlines.

blakebyrnes commented 2 years ago

Really good point about contributors. I think part of our issue is SecretAgent (as it stands) simply doesn't feel good enough. So we're looking at Hero to resolve some of those issues. It's hard to dedicate effort to making SecretAgent easier to contribute to when that feels true, and Hero is enough in flux that it feels tough to get help there too. In any case, your point is well taken, and we certainly have to get there at some point in the near term.

We're anticipating being at a point to make the switch in the next month or two. So hopefully we aren't too far off from sharing a lot of what's going on. Or maybe there's a way to work more in the open? Willing to consider ideas here.


Is there anything you can share that you'd like to do with Core directly, where the client is limiting you?



Where would you want to see that kind of thing? Are you imagining things in the Discussion section of the site? Where would you have gone to look for these?

GlenDC commented 2 years ago

Or maybe there's a way to work more in the open? Willing to consider ideas here.

There are always ways I imagine, depends how open you want to be of course. No wrong answer there, more of a choice people make I suppose. Not a trade off I can make for you.

In general, tracking all efforts, todos, bugs, features in progress, future features, anything really, as GitHub issues and using that in combination with GitHub Projects might already solve a lot of it :) It might also immediately resolve the roadmap issue.


It's just the development process that will be changing to headed by default.

Ok great to hear that headless will still be fully supported. But I can imagine that running in headed mode is probably indeed a lot easier as it immediately gives you a lot of stuff for free that you otherwise would have been limited by.

Currently the support for Docker users is also fairly limited. There are references to it, and some people in the community are making efforts, but nothing is officially promoted by the maintainers of this project. I can imagine that going headed by default might make this more difficult as well. I know you can get headed mode to work in Docker, but it's not the easiest thing to pull off. Given that containers are now a normal thing in many projects, it might make sense to also support Docker container use somehow "officially"?


Is there anything you can share that you'd like to do with Core directly, where the client is limiting you?

Nothing specific. So far the plugins have been good enough, but given that the client is another abstraction layer, it's not hard to imagine that there are limits here.


Where would you want to see that kind of thing? Are you imagining things in the Discussion section of the site? Where would you have gone to look for these?

If you're talking about the roadmap: personally, I would check the README or some kind of page (with header nav) on the website. I guess both locations make sense. But that's my perspective of course.

blakebyrnes commented 2 years ago

I really appreciate you taking the time to write in here and contribute your thoughts. We're going to move forward with some of your ideas, and we're very close on Hero. We'll share where we're at with that shortly and it will lead to announcing our broader roadmap. Per the smaller items, I like the ideas of using more of the features on the issue tracker here.