Closed: samgoeta closed this issue 4 years ago
Legal basis for extracting this personal data: https://teamopendata.org/t/guide-commun-cnil-et-cada-open-data-rgpd/1320 https://teamopendata.org/t/decret-sur-les-documents-administratifs-pouvant-etre-rendus-publics-sans-faire-lobjet-dun-processus-danonymisation/963
Hackathon results
{"_id":"prada4","startUrl":["https://datactivist.coop/scrap-prada/index_prada.html"],"selectors":[{"id":"type","type":"SelectorLink","parentSelectors":["_root"],"selector":"a","multiple":true,"delay":0},{"id":"pagination","type":"SelectorLink","parentSelectors":["type","pagination"],"selector":"li.pager-item:nth-of-type(n+2) a","multiple":true,"delay":0},{"id":"link","type":"SelectorLink","parentSelectors":["type","pagination"],"selector":".title a","multiple":true,"delay":0},{"id":"institution","type":"SelectorText","parentSelectors":["link"],"selector":"h1","multiple":true,"regex":"","delay":0},{"id":"pr-nom-prada","type":"SelectorText","parentSelectors":["link"],"selector":".field-name-field-pr-nom-prada div.field-item","multiple":true,"regex":"","delay":0},{"id":"adresse","type":"SelectorText","parentSelectors":["link"],"selector":".field-item p","multiple":true,"regex":"","delay":0}]}
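For anyone reading the sitemap JSON above without loading it into a scraper, here is a minimal Python sketch that prints which selectors hang under which parent. It only uses the `selectors`, `id`, and `parentSelectors` keys visible in the JSON; the `children_by_parent` helper name is hypothetical, and the embedded JSON is an abbreviated excerpt (three of the six selectors), not the full config.

```python
import json
from collections import defaultdict

# Abbreviated excerpt of the sitemap above (three of its six selectors).
SITEMAP_JSON = """
{"_id": "prada4",
 "startUrl": ["https://datactivist.coop/scrap-prada/index_prada.html"],
 "selectors": [
  {"id": "type", "type": "SelectorLink", "parentSelectors": ["_root"],
   "selector": "a", "multiple": true, "delay": 0},
  {"id": "pagination", "type": "SelectorLink",
   "parentSelectors": ["type", "pagination"],
   "selector": "li.pager-item:nth-of-type(n+2) a", "multiple": true, "delay": 0},
  {"id": "link", "type": "SelectorLink",
   "parentSelectors": ["type", "pagination"],
   "selector": ".title a", "multiple": true, "delay": 0}
 ]}
"""


def children_by_parent(sitemap):
    """Map each parent selector id to the ids of the selectors nested under it."""
    tree = defaultdict(list)
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            tree[parent].append(selector["id"])
    return dict(tree)


sitemap = json.loads(SITEMAP_JSON)
tree = children_by_parent(sitemap)
for parent, children in tree.items():
    print(parent, "->", ", ".join(children))
```

Note that `pagination` lists itself as a parent, which is how the config follows pager links page after page.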
Scraping in progress via Google Spreadsheet (yes, I know it isn't free software, but it gets the job done quickly): https://docs.google.com/spreadsheets/d/1QrAsL6RUrAc_KmF-Sw1uwmc8Zus8XnAMZzzj6Iqiu9k/edit#gid=0
Hi Sam, would you at least have the list of PRADAs and contact emails for the ministries, independent authorities, institutions, and courts (about a hundred administrations)?
See https://github.com/okfnfr/dada-france-authorities
Most of them offer a contact form instead of an email address in the administration directory.
Where it was available, I added the press contact email, but it would be better to have the PRADA's email directly.
Thanks!
No problem, but for that I would need the domain names. Do you have them?
Yes, the directory lists the websites' URLs. That assumes the email domain names are similar to the website domains.
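If it helps, here is a rough Python sketch of that step: deriving a candidate email domain from a website URL. It relies entirely on the assumption made above (that the email domain resembles the site's host name), which will not hold for every administration. The `guess_email_domain` helper is hypothetical, and dropping a leading `www.` is my own simplification.

```python
from typing import Optional
from urllib.parse import urlparse


def guess_email_domain(site_url: str) -> Optional[str]:
    """Return a candidate email domain for a website URL, or None if no host is found.

    Assumption: the email domain is the site's host name without a leading "www.".
    """
    # urlparse only recognises the host when the URL has a scheme or starts with "//".
    if "//" not in site_url:
        site_url = "//" + site_url
    host = urlparse(site_url).hostname
    if not host:
        return None
    if host.startswith("www."):
        host = host[len("www."):]
    return host


print(guess_email_domain("https://www.cada.fr/administration/personnes-responsables"))
```

The result is only a guess to be checked by hand, since some administrations use a different domain for email than for their website.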
Scrape the list of PRADAs from the CADA website: https://www.cada.fr/administration/personnes-responsables