help2022-ukr / help2022

Website to help refugees from Ukraine (and everything related).
GNU General Public License v2.0

Scraping content #20

Open afinika opened 2 years ago

afinika commented 2 years ago

Help post the most up-to-date content related to the questions.

TODO

ursueugen commented 2 years ago

Working on sources: https://docs.google.com/document/d/18HDUtf0eUi5f8j-t7N5gLOftoFqK_F-jk_txDEegZio/edit?usp=sharing

Important sources: (1) The Telegram channel Prima sursa appears to be the most dynamic official source, but not all notifications are translated to RU, so we could extract them and translate them to RU. (2) The web platform UARefugees presents all kinds of non-gov support, including shelter and transport.

Comments by questions of interest:

(1) Borders

(2) Asylum

(3) Gov announcements: Prima sursa appears to be the primary channel for gov announcements. AP: already covered.

What do you think? @afinika

afinika commented 2 years ago

@ursueugen OK, so in the chats there are 3 questions that are asked most often:

  1. Shelter now! Safe!
  2. Where and how to cross the border? I don't have documents, what do I do? How long do I need to wait?
  3. What countries offer asylum for Ukrainians, and how do I apply: where, with what documents, and through what process?

We solved shelter manually for Moldova and Romania, but more places keep opening, and it is hard to track manually what we already have and what we don't. The focus is on gov shelters because they are safer and cannot say no. In other countries, I did not find gov shelter locations, places, or phone numbers, only private ones, maybe hotels.

Now the focus is the border-crossing problem.

Let's focus on migration info here. #18 is the reference for the implementation; please coordinate the content scraping with @Nikro.

Nikro commented 2 years ago

@ursueugen - Hey, any progress on those 2 sources?

I did a sample output - https://docs.google.com/spreadsheets/d/17xGhHoFh6oN3vMaAGTsve6mb69cL0PlchzF2nfzXr1c/edit?usp=sharing - we will use the same thing for bulk imports.

We'll also need to massage the data to fit these criteria (i.e. sanitize the name so we keep only the first name, or remove the numbers from street addresses).
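A rough sketch of that kind of cleanup, in Python. The record keys `name` and `address` are hypothetical (the real column names are whatever the sample spreadsheet defines), and the number-stripping rule is a blunt assumption: it removes every run of digits, so a street whose name itself contains a number would be damaged and would need a manual check.

```python
import re


def first_name_only(full_name: str) -> str:
    """Keep only the first whitespace-separated token of a name."""
    parts = full_name.strip().split()
    return parts[0] if parts else ""


def strip_street_numbers(address: str) -> str:
    """Remove house/flat numbers (digit runs such as 12 or 12A) from an address."""
    # Drop every digit run together with any letters glued to it (e.g. "12A").
    without_digits = re.sub(r"\d+\w*", "", address)
    # Remove the gap left in front of punctuation, e.g. "Mare , City" -> "Mare, City".
    tidy = re.sub(r"\s+([,.;])", r"\1", without_digits)
    # Collapse leftover double spaces.
    tidy = re.sub(r"\s{2,}", " ", tidy)
    return tidy.strip(" ,;")


def sanitize_record(record: dict) -> dict:
    """Return a copy of the record with a sanitized name and address.

    The "name" and "address" keys are assumed for illustration only.
    """
    cleaned = dict(record)
    cleaned["name"] = first_name_only(record.get("name") or "")
    cleaned["address"] = strip_street_numbers(record.get("address") or "")
    return cleaned
```

Any other fields in the record are passed through unchanged, so the same function could be run over each scraped row before the bulk import.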

As we're crawling potentially sensitive data and we don't want it to fall into the wrong hands, I've created a new private repo and invited our dev team to it.

Nikro commented 2 years ago

@mdiannna - welcome!

You can use the same sample output from above, but with this as the source: https://www.shelter4ua.com/ua

I'll privately send you the JSON I found; we'll probably only need to massage the data and export it to CSV.
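For the export step, something along these lines might be enough. This is only a sketch: it assumes the JSON is a top-level array of flat objects, and the function name `listings_json_to_csv` and the column names in the usage note are made up for illustration. The real columns should follow the sample output spreadsheet.

```python
import csv
import io
import json


def listings_json_to_csv(json_text: str, columns: list) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Only the requested columns are written; keys not listed are ignored
    and missing keys become empty cells.
    """
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip keys we did not ask for
        restval="",             # empty cell when a key is missing
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Calling `listings_json_to_csv(raw_json, ["city", "places"])` would return the CSV as a string, which can then be written to a file for the bulk import. Nested JSON objects would need to be flattened first.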