opensanctions / crawler-planning

Task tracking for the crawlers we're working on
https://github.com/orgs/opensanctions/projects/2

Welcome to the OpenSanctions crawler planning repository!

This is where we coordinate implementing crawlers - the open source code which takes data from various sources, converts it to our data model, and adds it to our database for use in screening and investigations.

We have a contracted team that helps us keep up with demand and meet our data coverage goals. If you're interested in joining, let us know!

We also welcome volunteer open source contributions, but recommend that you first take a look at the project board and chat with us to discuss the most appropriate next crawler to add. We'll then handle the movements on the project board described below on your behalf.

New data source suggestions

To suggest a data source, submit an issue in this repository. Please search the existing issues first to check whether someone else has already suggested the same source.

We review submissions and queue them for addition if and when they meet our data inclusion criteria.

Getting started on the team

We will assign your first task on the project board, picking one that makes a good introduction.

Your next step is to dive into the instructions to get started with zavod, our ETL framework, and see if you can get a crawler working.

Workflow

The Todo column is ordered in decreasing priority.

When you're ready to start your next task, take the top card in the Todo column and move it to the In Progress column. Add yourself as an assignee if we haven't already done so.

When you feel your new/modified crawler is ready for production,

If it's ready

If some changes are needed, we'll comment on the pull request and move the card back to the In Progress column for you to revise.

Please try to finish any work in progress before starting anything new.
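For readers who find it easier to follow the card movements as code, here is a minimal sketch of the workflow above. It is only an illustration, not how the project board is implemented: the `Card` and `Board` classes, the `in_review` column name, and the `submit` step are hypothetical, and the "finish work in progress first" guideline is modeled as a hard rule for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    title: str
    assignees: set[str] = field(default_factory=set)


@dataclass
class Board:
    # Each column lists cards in display order; Todo is highest priority first.
    todo: list[Card] = field(default_factory=list)
    in_progress: list[Card] = field(default_factory=list)
    # Hypothetical column name: the README does not name the review column.
    in_review: list[Card] = field(default_factory=list)

    def start_next(self, person: str) -> Card:
        """Take the top Todo card, move it to In Progress, assign `person`."""
        # Simplification: the README only asks you to *try* to finish first.
        if any(person in card.assignees for card in self.in_progress):
            raise RuntimeError(f"{person} still has work in progress")
        card = self.todo.pop(0)  # raises IndexError if Todo is empty
        card.assignees.add(person)
        self.in_progress.append(card)
        return card

    def submit(self, card: Card) -> None:
        """Hypothetical step: hand the card over for review."""
        self.in_progress.remove(card)
        self.in_review.append(card)

    def request_changes(self, card: Card) -> None:
        """Changes are needed: the card goes back to In Progress."""
        self.in_review.remove(card)
        self.in_progress.append(card)
```

For example, with two hypothetical cards, `Board(todo=[Card("Crawler A"), Card("Crawler B")]).start_next("alice")` returns the "Crawler A" card with `alice` assigned and leaves "Crawler B" at the top of Todo.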

Kinds of crawlers, sources, or data

Crawlers generally bring one type of target into our database, and tasks are annotated to indicate the type:

Each kind of target should be annotated according to our data model.