opendataam / opendatam-tasks

Public tasks for volunteers, hackathons and contests
Creative Commons Zero v1.0 Universal

[EN] Parse the data on public transport in Yerevan #29

Open ansakoy opened 1 year ago

ansakoy commented 1 year ago

Goal

The primary goal is to write a reusable parser that collects data on public transport routes. It would also be nice to have an example of the resulting dataset for a particular date.

Tasks

The website http://marshrut.info/ presents data on the routes and schedules of buses and trolleybuses in Yerevan. Write a reusable parser in your preferred language to grab the data from the website and pack them into a machine-readable structure stored in a format such as JSON or XML. These two are preferable because the data are likely to require a hierarchical structure, for instance:

[
  ...
  {
    "vehicle": STRING,  // specify if it is a bus or a trolleybus
    "number": STRING,  // the route's "number"; in fact, it should be a string in case it's alphanumeric
    "interval": NUMBER, // how often the vehicle arrives
    "measure": STRING, // looks like it's all in minutes, but just in case specify for each entry
    "stops_forward": ARRAY, // make it an array of stop names (strings)
    "stops_backward": ARRAY, // make it an array of stop names (strings) in the reverse order if the route back is the same or store the specific back route stops in this array
  }
  ...
]

This is just an example of a possible structure. If you can think of something more convenient, you're most welcome to implement it.
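As a rough illustration, the structure above could be mirrored in Python with a dataclass and serialized to JSON. This is only a sketch: the class name `Route`, the helper `routes_to_json`, and the sample values are assumptions made for the example, not real data from the website.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class Route:
    vehicle: str    # "bus" or "trolleybus"
    number: str     # kept as a string in case it is alphanumeric
    interval: float # how often the vehicle arrives
    measure: str    # unit of the interval, e.g. "minutes"
    stops_forward: List[str] = field(default_factory=list)
    stops_backward: List[str] = field(default_factory=list)


def routes_to_json(routes: List[Route]) -> str:
    # ensure_ascii=False keeps Armenian stop names readable in the output
    return json.dumps([asdict(r) for r in routes], ensure_ascii=False, indent=2)


# Made-up sample values, not taken from the website
sample = Route("bus", "1A", 10, "minutes",
               ["Stop A", "Stop B"], ["Stop B", "Stop A"])
print(routes_to_json([sample]))
```

Keeping `number` as a string and storing `measure` per entry follows the comments in the example structure.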

The key idea is to make the parser as reusable and maintainable as possible. Schedules change quite often, so it would be great to be able to run the script at least daily to collect up-to-date data.
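One possible way to support daily runs is to write each run's result to a file named after the date, so that earlier snapshots are never overwritten. The sketch below assumes the parsed routes are already available as a list of dictionaries. The function names and the `routes-YYYY-MM-DD.json` naming scheme are illustrative choices, not requirements of the task.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def snapshot_path(out_dir: Path, day: date) -> Path:
    # One file per day, named with the ISO date
    return out_dir / f"routes-{day.isoformat()}.json"


def save_snapshot(routes: list, out_dir: Path, day: Optional[date] = None) -> Path:
    # Default to today's date when no date is given
    day = day or date.today()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_path(out_dir, day)
    path.write_text(
        json.dumps(routes, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return path
```

A scheduler such as cron could then call `save_snapshot` once a day. The dated files also serve as example datasets for particular dates.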

It would also be nice, of course, to have an example output of these data as a dataset for a particular date.

The website is in Armenian only, but in fact its structure is rather clear and simple, so if you don't know the language, it shouldn't be a problem. If you still run into language troubles that you cannot solve even with the help of Google Translate, please don't hesitate to contact us.

Context

The data presented at http://marshrut.info/ have huge potential. They could be used in very helpful web and mobile apps to build optimal routes and predict arrival times, especially if combined with some spatial data. Unfortunately, they are not published as an API, so the first step to make use of these data is to parse the HTML pages.
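To give an idea of what parsing the HTML could look like, here is a minimal sketch using only Python's standard library. The markup in `SAMPLE_HTML` is hypothetical: the real pages at http://marshrut.info/ may be structured quite differently, so the tag and class names would need to be adapted after inspecting the site.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical markup, NOT copied from marshrut.info
SAMPLE_HTML = """
<ul class="stops">
  <li>Stop A</li>
  <li>Stop B</li>
</ul>
"""


class StopListParser(HTMLParser):
    """Collects the text of every <li> inside <ul class="stops">."""

    def __init__(self) -> None:
        super().__init__()
        self.stops: List[str] = []
        self._in_list = False
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "ul" and ("class", "stops") in attrs:
            self._in_list = True
        elif tag == "li" and self._in_list:
            self._in_item = True
            self.stops.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self._in_list = False
        elif tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current item
        if self._in_item:
            self.stops[-1] += data


def parse_stops(html: str) -> List[str]:
    parser = StopListParser()
    parser.feed(html)
    parser.close()
    return [s.strip() for s in parser.stops]


print(parse_stops(SAMPLE_HTML))
```

A third-party library such as Beautiful Soup would likely make this shorter. The standard-library version is shown only to keep the sketch dependency-free.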

Requirements

A public GitHub repository should be created to store and publish the code, and possibly the data, under a free and open license such as Creative Commons or MIT. Please make the code as reusable and maintainable as possible, and include usage instructions and a list of requirements.

Wishes

It would be best if you also comment your code, so that even beginners can understand what it does.

Resources

http://marshrut.info/

Prepared by

The Open Data Armenia team prepared this task.

vlivyur commented 8 months ago

But the data on the website come from the PDFs at https://www.yerevan.am/hy/route-network/

ivbeg commented 8 months ago

@vlivyur if this original source is more precise, then it would be great to have this data too :)