qri-io / walk

Webcrawler/sitemapper
GNU General Public License v3.0

Define this project's objective & first milestone #6

Closed: b5 closed 5 years ago

b5 commented 5 years ago

This is now the third direct iteration of this project. First it was sentry, then pieces of it were extracted into a sitemap CLI tool, and now it's here. There have also been lots of related projects along the way.

Pitfalls of past iterations of this project:

I think this is a sign that we've needed this infrastructure for some time, but have struggled to define a manageable scope with clear milestones that keep the project moving. Part of it is that this kind of infrastructure is tough to build, and even harder to get right. The other part is that, until now, many of our projects have been in a formative stage, and we couldn't get to this work until we had a clearer understanding of what we wanted.

I think we should define some scope & goals now. As a group we tend to get pretty bogged down in process, so I think we should do two things to make our lives easier:

By requiring only those two things to keep moving, while allowing much more info to enter the conversation as food for thought, I think we can get to rough consensus and working code faster.


Pyramid of Clarity

@joehand put me on to the pyramid of clarity a few weeks ago, and I must say it's proven a very effective tool for prioritizing work by connecting it to a mission. I'm on a personal mission to have all github issues I create & work on connect back to a repo's objective, and use how closely an issue matches the objective as the primary method of prioritizing work.

I'm not suggesting we need to follow this "pyramid of clarity" process. Instead, I think the pyramid shows the work we don't have to do, because it should be done elsewhere. We've stated from the get-go that this project will support multiple organizations, which are in charge of defining their own missions. To me this means we don't have to define a mission (that should be left to the various orgs), but instead we just need an objective for this repo.


I'd love to toss out a draft objective & milestone to work with:

Draft Objective:

Define & maintain common, modular crawling infrastructure that multiple projects can depend on and share the results of

Draft Near-Term Milestone:

Use walk to generate a sitemap:

These are meant to spur discussion 😄


Action Items:

Mr0grog commented 5 years ago

I think we should define some scope & goals now… I'm on a personal mission to have all github issues I create & work on connect back to a repo's objective

❤️

We've stated from the get-go that this project will support multiple organizations, which are in charge of defining their own missions. To me this means we don't have to define a mission (that should be left to the various orgs), but instead we just need an objective for this repo.

I think it might be helpful to lay out what these orgs want to get out of this. That’s definitely more narrow than a high level objective, but I think we can (should?) drive upwards from these needs to a broader objective (then we can use the pyramid to drive down from that objective). At the very least, it’d be good to just have them articulated somewhere :)

I can speak for web monitoring:

In Slack @edsu and I had talked about the value of having a service like this for docnow and diffengine. If I understood his needs well:

Draft Objective:

Define & maintain common, modular crawling infrastructure that multiple projects can depend on and share the results of.

I like this. 👍

Is there something worth adding in here about conceptualizing this as running infrastructure, not just code? (Is that a critical part of this project or is it something higher level, e.g. a goal specific to/owned by EDGI?)

Draft Near-Term Milestone:

Use walk to generate a sitemap…

  • The person doing the deployment & the main code contributors should be different people, and only work through publicly-available documentation

I am worried this is not super feasible with the people-power we have, but it does sound like a nice way to validate how workable the software is. How can we make sure this happens?

b5 commented 5 years ago

Is there something worth adding in here about conceptualizing this as running infrastructure, not just code?

I just opened a PR that specifies EDGI intends to run this as a service, which to me maps nicely. I think it's important that this software be positioned to be easily turned into a service (with proper documentation on how to do that, links to the EDGI service when it's up, maintained docker images, etc).

re near-term milestone:

I am worried this is not super feasible with the people-power we have

Agreed, we should make this easier for ourselves. For example, we could have me write the PR & you make sure you/others can generate a sitemap locally with config I haven't tested (ideally in a containerized env, but happy to punt on that). So we should probably not name this "deployment", and instead use some sort of "peer-tested" verbiage. At the very least, this would force us to kick off usage documentation.

Mr0grog commented 5 years ago

have me write the PR & you make sure you/others can generate a sitemap locally with config I haven't tested (ideally in a containerized env, but happy to punt on that).

I wouldn’t worry about containerization too much until we have workers that you configure and start completely separately from the coordinator. But then 100% yes.

Mr0grog commented 5 years ago

(Note: starting separately from the coordinator can probably be pushed off a few weeks; even when we have Node.js workers, the coordinator could spawn them as subprocesses for now. A potential bonus of that setup is that it makes a small-scale, non-distributed version easy to run.)
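
A minimal sketch of the "coordinator spawns workers as subprocesses" idea, under stated assumptions: it is written in Python purely for illustration (it is not walk's code, and the real workers discussed here would be Node.js), and the names `WORKER_SOURCE`, `spawn_worker`, and `run_coordinator` are hypothetical. The worker only echoes the URLs it receives instead of fetching them.

```python
import subprocess
import sys

# Hypothetical stand-in for a real worker: reads one URL per line on stdin
# and prints one result line per URL. A real worker would fetch the page.
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    url = line.strip()
    if url:
        print("fetched " + url, flush=True)
"""


def spawn_worker():
    """Start one worker as a child process with piped stdin/stdout."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def run_coordinator(urls, worker_count=2):
    """Split URLs round-robin across spawned workers and collect their output."""
    # All workers are started up front, so they run side by side.
    workers = [spawn_worker() for _ in range(worker_count)]
    batches = [urls[i::worker_count] for i in range(worker_count)]
    results = []
    for worker, batch in zip(workers, batches):
        # communicate() sends the batch, closes stdin, and waits for exit.
        out, _ = worker.communicate("\n".join(batch) + "\n")
        results.extend(out.splitlines())
    return results


if __name__ == "__main__":
    for line in run_coordinator(["https://example.com/a", "https://example.com/b"]):
        print(line)
```

Because the coordinator owns the worker lifecycle, the whole thing runs with a single command on one machine, which is the small-scale, non-distributed case. Moving to separately started workers later would mostly mean replacing `spawn_worker` with a connection to an already-running process.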

b5 commented 5 years ago

Ok, this feels like a great start. An initial milestone is defined, and we have a PR with initial objectives open. While I would welcome others to chime in on this, I'd like to keep the ball moving. So, I vote we merge #10 and close this issue, with the explicit note that if you'd like to weigh in on the current objective / roadmap, you can at any point, and we'll revisit.

This way we keep moving, but make it clear that others should feel free to jump in. Sound ok?

Mr0grog commented 5 years ago

👍