pgRouting / pgrouting

This repository contains the pgRouting library. The development branch is "develop"; the stable branch is "master".
https://pgrouting.org
GNU General Public License v2.0

Integration of Contraction Heirarchy code into pgrouting #440

Closed woodbri closed 7 years ago

woodbri commented 8 years ago

On 1/3/2016 8:19 AM, rohith reddy wrote:

Hey Steve, As you know, Mukul and I are working on contraction hierarchies for routing. I have been reading papers on contraction hierarchies for road networks. In most of them, the method of contraction is as follows:

1. Node ordering for contraction (based on some criterion).
2. Contracting the node with the highest priority (more likely to be contracted).
3. Introducing (upon decision) shortcuts between the nodes adjacent to this node.

Some of the node ordering techniques I found are:

1. Edge difference
2. Uniformity
3. Deleted Neighbors
4. Voronoi Regions
During the last month we have worked on a contraction model where the nodes are ordered according to their degrees, and then contracted. We have defined levels of contraction as follows: Level 0: removal of all one-degree nodes. Level 1: removal of all two-degree nodes. I implemented a C++ class which contracts the graph, given the level of contraction. I also implemented C++ classes for Dijkstra and A*, which extend the above class and run the algorithm on the contracted graph. These classes can be integrated with pgRouting. The process is as follows:

1. Contract the graph to the given level.
2. Run Dijkstra/A* on the contracted graph.
3. Unpack the path, and derive the original path.
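For illustration, the level-0 step described above (removing one-degree nodes, repeatedly, since each removal can create new one-degree nodes) could be sketched on a plain adjacency-list graph. `Graph` and `contract_level0` are hypothetical names for this sketch, not the classes from the student code:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Undirected graph as an adjacency list: vertex -> set of neighbors.
using Graph = std::map<int, std::set<int>>;

// Level-0 contraction sketch: repeatedly remove every degree-1 vertex.
void contract_level0(Graph &g) {
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<int> dead;
        for (const auto &kv : g)
            if (kv.second.size() == 1) dead.push_back(kv.first);
        for (int v : dead) {
            for (int u : g[v]) g[u].erase(v);  // detach from its neighbor
            g.erase(v);
            changed = true;
        }
    }
}
```

On a dangling chain attached to a cycle, the whole chain is peeled off one vertex at a time, leaving only the cycle; the real implementation would also have to record the removed vertices and edges so the path can be unpacked later.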

I have tested it for some test cases and it works fine. We actually are not sure of the exact contraction model, but in order to test we have taken up the above contraction model. I am also planning to implement a query for the same in PostgreSQL. Can you help me with how to proceed?

Regards, Rohith Reddy.

Hi Rohith,

This sounds like excellent work that you guys have been doing. Regarding integrating this into PostgreSQL, I want to bring Vicky into the discussion because she has reworked the C code that is needed into some easy-to-use templates, and she has done a lot of work building reusable C++ classes for Dijkstra based on the Boost Graph Library.

The integration into PostgreSQL follows these steps:

  1. Define a plpgsql function(s) that is callable from SQL. This is a little more difficult in the case of contraction hierarchies because we need to a) contract the graph and store it, and b) recall the contracted graph and run query(s) against it. So we have to consider how and where to store the contracted graph.
  2. Given a plpgsql signature, we need to write a C function that processes the arguments of the function into C structs and calls a C++ wrapper to your code.
  3. We need a C++ wrapper that takes the C structs passed to it, creates an instance of your solver or contractor, loads the data into it from the C structs, runs it, and finally returns the results back to the C code, which in turn returns them back to the SQL code.

So you might have some hypothetical commands like:

insert into contracted_graphs
  select 'my_contracted_graph_name'::text, contracted_blob::bytea
    from pgr_contractgraph(
        sql_for_edges::text,
        sql_for_turn_restriction::text, -- if you support them
        contraction_level::integer default 0, -- or whatever
        ... -- other options or data you might need
    );

where sql_for_edges might look like:

'select edge_id, nodeid_a, nodeid_b, weight_forward, weight_reversed, ...
  from edge_table'

We might store contracted blobs in a table like:

create table contracted_graphs (
    name text,
    blob bytea
);

Issues to be sorted out:

  1. what is the best way to store large blobs of data?
  2. should we look at the contracted graph as a blob, or is it better viewed as structured data for the purpose of storing it in the database?
  3. would it be faster to store the data in file(s) on disk and only remember the filenames in the database?
  4. if in files can multiple simultaneous queries access the files without issues?

Remember, a multi-user database means NO globals. PostgreSQL is a long-running server process, so NO memory leaks. Etc.

The issue here is the performance of serializing the contracted graph and storing it in the database, and then later retrieving and unserializing it. PostgreSQL has some performance issues around fetching and storing large blob objects that we would need to look at. Also, I have not done this in the past, so some investigation will need to be done.

For the actual route query, you should model the SQL functions and code to act like the new code Vicky has done for pgr_dijkstra(), and I will let her add links to that below. But in this case, rather than passing the edges in, you would pass in the contracted graph and the route query parameters, and you would want to return results structured as that code does. Please look at her new code, NOT the 2.0 code.

woodbri commented 8 years ago

This route query should work similarly to the new pgr_dijkstra() and support queries like: one to one, one to many, many to one and many to many. Then this will integrate seamlessly with the existing code, and users can switch back and forth between the existing code and your new code with minimal hassle.

woodbri commented 8 years ago

https://github.com/sankepallyrohithreddy/OSMContraction

woodbri commented 8 years ago

Looking through the repository, it looks like you started with the idea of integrating the code into pgRouting, which helps a lot.

Maybe you can explain or point to where the contracted graph definition is. Is it easy to store it in an edge table? Or is it a binary opaque blob?

I think you would need to do some tests to figure out what is the best way to store the contracted graph so that it is fast to access it when you need to do a query.

My assumption is that contraction takes a long time so we want to save the contracted graph and reuse it. To this end, it would be great to compare:

  1. how long does it take to contract the graph
  2. how long does it take to save the contracted graph
  3. how long does it take to restore the contracted graph
  4. how long does it take to run a dijkstra or astar query on the contracted graph
  5. we might want to make sure that we get the same results between contracted Dijkstra and the current Dijkstra on the uncontracted data, and how long the uncontracted solver takes.

We probably want to plot out some curves at various graph sizes of 10, 100, 1,000, 10,000, 100,000 edges and evaluate this for different storage strategies for the contracted graph.
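The timing comparisons listed above could be driven by a small helper; this is a hedged sketch, and `time_ms` is an illustrative name, not part of pgRouting:

```cpp
#include <chrono>
#include <functional>

// Run one operation (contract, save, restore, or query) and return the
// wall-clock time it took, in milliseconds.
double time_ms(const std::function<void()> &work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();  // the operation being measured
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Each of the five measurements could then be a call like `time_ms([&]{ contract(graph, level); })` (where `contract` stands in for whatever the real entry point is), repeated for the 10 through 100,000-edge graphs and for each storage strategy.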

But first, it might help to have an initial integration as a baseline for comparison. So a little more information on the structure of the contracted graph, and what additional data you might need to save along with it, would help.

Do you currently write the contracted graph to a text or binary file and reload it? Where are the readers/writers for this?

woodbri commented 8 years ago

Also, I noticed that you are doing some malloc() calls and using pointers. While this might be valid for some very specific cases, it is best if these all get converted to C++ standard containers and references.

For example, where does this malloc() call get free()d: https://github.com/sankepallyrohithreddy/OSMContraction/blob/master/code/src/mydijkstra/tester/test.cpp#L28

I see you are using containers in other places so that is very good.

woodbri commented 8 years ago

On 1/3/2016 2:49 PM, Rohith Reddy wrote:

Hey Steve, Thanks a lot for pointing that out. I will look into the code and make the necessary corrections. I have been generating different levels of contracted graphs using a separate C++ file https://github.com/sankepallyrohithreddy/OSMContraction/blob/master/code/src/contraction/src/tester/test.cpp

which reads/writes from/to a CSV file for the purpose of visualization. I will implement a proper reader/writer function, which can be used for our analysis. The graphMinimizer.cpp class stores both the contracted version and the original version of the graph. The graphs are stored as adjacency lists (Boost Graph Library). Yes, since the contraction takes a lot of time, we need to reuse the contracted graph. So first we need to decide upon how to store the contracted graph. We will analyse and compare both of the storage procedures, namely the edge table and the binary opaque blob. I will first try to implement the storage of the contracted graph in an edge table, and then evaluate its performance for various graph sizes.

cvvergara commented 8 years ago

@sankepallyrohithreddy Hi Rohith

Within the next 15 days I will be working on the new cmake. Once the cmake is ready I will move the code to the main repository, and work on the documentation of what is going to be 2.2. That is when I can create a branch, and we can start to integrate your code. It would be nice if we can add it to the "proposed functions" documentation section for 2.2. As a proposed function: the functionality can change, the signature can change, and it gives the opportunity to get more people involved in the development and testing.

While I get around to moving the code to the main repository, I will give you very specific tasks:

make a branch from master

   git checkout -b integratePgrouting
   git push

(the git push will give you the full instructions on how to push a branch for the first time; just follow them). The rest of the steps are on that branch.

rename/move directories to match pgrouting's structure

   git mv code/sql test      <-- dump.sql looks more like a test
   git mv code/lib  sql       <-- here are your queries
   git mv code src

edit .gitignore

add the following lines:

   OSM
   qgis

Note: if you use the data from the [workshop 3.4](http://workshop.pgrouting.org/chapters/installation.html#data) (the one we get with BBOX) for testing, it would be better, as the people involved in developing have that data.

make sure everything works as it used to work.

Compile, and do what you usually do to get results.

create the documentation

I would think that you have a clone of pgRouting somewhere, so copy the documentation of a function whose signature "looks like" the signature you are using:

    mkdir doc
    cp /path/to/src/func_name/doc/file.rst  doc/contraction.rst
    git add doc/contraction.rst

Edit contraction.rst and put the documentation there.

Compile with these flags:

I am leaving this as the last task, as it is a very laborious task. Please use these flags, and make sure your code compiles with no warnings; this is our new development standard.

-std=c++0x -fPIC -O2 -g -Wall -pedantic  -fmax-errors=10 -Wextra -frounding-math -Wno-deprecated

Remove all warnings.

For the C code:

-std=gnu99 -fPIC -O2 -g -Wall -pedantic -fmax-errors=10 -Wmissing-prototypes -frounding-math

woodbri commented 8 years ago

I've put a lot of notes in the ticket: https://github.com/pgRouting/pgrouting/issues/440 Please start using the ticket so all ideas are captured in one place and other people can help also.

I think you can tackle this by working on two fronts:

  1. Document what you think the SQL command should look like for the contractor and for the query. Think about it in terms of the user, but also make sure that you are getting all the information that you need to successfully complete the request. If your contraction function returns a table, explain the table structure. In fact, each command should explain the record structure in detail for the inputs and the outputs.
  2. Start investigating the performance issues of storing the contracted graph.

With respect to timing, pgrouting 2.2 is planned to be released this month (Jan) and if we can get this integrated it could be released with 2.2 as proposed functions.

rohithsankepally commented 8 years ago

Hey Steve, While looking for how to store the data in binary format I found this link (https://wiki.postgresql.org/wiki/BinaryFilesInDB) useful. After going through it, I felt using the "bytea" data type would help us. So we will be storing the contracted graph as a comma-separated string with a delimiter. For example, suppose we have a graph with the following edges upon contraction:

     id | src | tgt | cost | rcost
    ----+-----+-----+------+-------
      0 |   2 |   4 |    1 |     1
      1 |   5 |   8 |    1 |    -1
      2 |   6 |   5 |    1 |     1
      3 |   3 |  -1 |    0 |    -1

We store the contracted graph as '0,2,4,1,1,$,1,5,8,1,-1,$,2,6,5,1,1,$,3,3,-1,0,-1' in the "bytea" format, where "$" is the delimiter.
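Producing that delimited string can be sketched as a small serializer; `Edge` and `serialize` are illustrative names, and the real code would stream pgRouting's own edge struct instead:

```cpp
#include <sstream>
#include <string>
#include <vector>

// One contracted edge: id, source, target, forward cost, reverse cost.
struct Edge { long id, source, target; double cost, rcost; };

// Join the edge records with ",$," so '$' delimits records, as described.
std::string serialize(const std::vector<Edge> &edges) {
    std::ostringstream out;
    for (size_t i = 0; i < edges.size(); ++i) {
        if (i) out << ",$,";  // record delimiter
        const Edge &e = edges[i];
        out << e.id << ',' << e.source << ',' << e.target
            << ',' << e.cost << ',' << e.rcost;
    }
    return out.str();
}
```

The inverse (splitting on '$', then on ',') is symmetric, which is part of what makes this text format easy to debug.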

We also need to reuse the information about the removed vertices and edges, so we should also store them in order to unpack the path and return the original path to the user. I have used two data structures for this purpose, namely:

1. Pseudo edges: map< eid, pair<eid1,eid2> > where eid, eid1 and eid2 are edge ids.
   -> eid is the edge id of the shortcut.
   -> eid1 and eid2 are the edge ids of the removed edges.

2. Removed vertices: map<V, deque> where V is the vertex descriptor of the removed vertex, and this maps to the edges removed while removing this vertex.

How should I store the above data in order to reuse it ?
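For illustration, the unpacking step driven by the pseudo-edges map could be a recursive expansion: a shortcut id expands into its two underlying edge ids, which may themselves be shortcuts. All names here (`ShortcutMap`, `expand`, `unpack`) are hypothetical:

```cpp
#include <map>
#include <utility>
#include <vector>

// shortcut edge id -> the pair of edge ids it replaced
using ShortcutMap = std::map<long, std::pair<long, long>>;

// Expand one edge id: original edges are emitted as-is, shortcuts recurse.
void expand(long eid, const ShortcutMap &shortcuts, std::vector<long> &out) {
    auto it = shortcuts.find(eid);
    if (it == shortcuts.end()) { out.push_back(eid); return; }  // original edge
    expand(it->second.first, shortcuts, out);
    expand(it->second.second, shortcuts, out);
}

// Turn a path over the contracted graph into a path over the original graph.
std::vector<long> unpack(const std::vector<long> &path, const ShortcutMap &s) {
    std::vector<long> out;
    for (long eid : path) expand(eid, s, out);
    return out;
}
```

This only resolves edge ids; as discussed below, the attributes of the removed edges (source, target, costs) still have to be stored or re-fetched somewhere.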

woodbri commented 8 years ago

OK, we need to store the contracted graph and the maps, so the pattern is to serialize the data and return it to the user, and the user will store it in a table for later use. We might want to change this pattern in the future if there is a faster or better way to do this.

I think for the 1st pass, your idea of comma- and $-delimited text is a good one, because we can read it or dump it into a file for debugging if we need to. Regarding the maps, you could change your format to something like:

graph:\n
0,2,4,1,1,$,1,5,8,1,-1,$,2,6,5,1,1,$,3,3,-1,0,-1\n
psuedo_edges:\n
eid,eid1,eid2$eid,eid1,eid2$...\n
removed_vertices:\n
...$....$....\n

or you could replace all the $ with '\n' so the data records are line based. Anyway, this should be easy, we will have to evaluate how fast it is for large graphs.

The trick will be how to do this efficiently memory wise, because you don't want to allocate this twice, once in C++ and then copy it to C to return it to the SQL. Don't worry about that now, we can optimize that later if you can get it to work in the first place.
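The sectioned, line-based layout sketched above could be read back with something like the following; it assumes one header line ending in ':' per section with one record line after it, and `parse_sections` is a hypothetical name:

```cpp
#include <map>
#include <sstream>
#include <string>

// Split the serialized blob into named sections: a line ending in ':'
// names a section, and subsequent lines hold its records.
std::map<std::string, std::string> parse_sections(const std::string &blob) {
    std::map<std::string, std::string> sections;
    std::istringstream in(blob);
    std::string line, current;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == ':')
            current = line.substr(0, line.size() - 1);  // section header
        else if (!current.empty())
            sections[current] += line;  // record line for current section
    }
    return sections;
}
```

Each section's payload can then be split on '$' and ',' exactly as the graph records are.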

rohithsankepally commented 8 years ago

Hey Steve, While writing code for the contraction function I faced the following problems:

  1. I need to return two data types, text (graph name) and bytea (edges), for the query. In the function, the datatype to be returned is a Datum, and I need to typecast both of these data types (text, bytea) into a Datum. While I looked into the functions of different queries, I found most of the types returned were integer and float, which have functions for being typecast into a Datum, like Int32GetDatum and Float8GetDatum respectively. How do I typecast text and bytea?

  2. In the psuedo_edges data structure (which I mentioned in the previous comment), I am just storing the edge ids whose information is required while unpacking, and given an edge id, I need to fetch the information about the edge from the original table (since it is lost in contraction). Do I need to fetch the info again from the table, or store the info (id, source, target, ...) about the removed edges in another data structure?

Please help me with this.

woodbri commented 8 years ago

Regarding 1., unless you really need a bytea, you could just return that data also as text; this will have the same effect, and if it is just text then it can be human readable for debugging. Later we might want to compress it using something like gzip, and then it will have to be a bytea. Here is a link to code that is returning a set of records of text fields: https://github.com/postgis/postgis/blob/svn-trunk/extensions/address_standardizer/address_standardizer.c

For 2, you have two options: A) store the data, or B) require the user to pass in the original edges again. I think A is the better option because you KNOW the data is in sync with the contracted graph, while B leaves open the possibility that the data passed in at query time is not the same as that passed in at contraction time. I presume that this data is the same as that stored in the two map containers you mentioned above. In that case, I would just append that data to the edges item like I mentioned above:

0,2,4,1,1,$,1,5,8,1,-1,$,2,6,5,1,1,$,3,3,-1,0,-1\n
psuedo_edges:\n
eid,eid1,eid2$eid,eid1,eid2$...\n
removed_vertices:\n
...$....$....\n

rohithsankepally commented 8 years ago

Hey Steve, Thanks a lot. I am done with the function which inserts the contracted graph and its name into a table in text format. The query is as follows:

insert into contracted_graphs
  select 'my_contracted_graph_name'::text, contracted_blob::bytea
    from pgr_contractgraph(
        sql_for_edges::text,
        contraction_level::integer
    );

Should I store the data structures needed for unpacking in the same blob (graph), or should I store them as different entries in the same table, like:

Table: contracted_graph

 name (text)      | blob (bytea)
------------------+------------------------
 contracted_graph | contracted_graph_blob
 removed_vertices | removed_vertices_blob
 psuedo_edges     | psuedo_edges_blob

How should I proceed?

woodbri commented 8 years ago

I would store them as separate columns in the same row, like:

insert into contracted_graphs
select 'my_contracted_graph_name'::text as name, 
          contracted_blob::bytea, 
          removed_vertices::bytea, 
          psuedo_edges::bytea
from pgr_contractgraph(
    sql_for_edges::text,
    contraction_level::integer
);

this way you can later run a query like:

select * from pgr_dijkstra(
   'select * from contracted_graphs 
      where name=''my_contracted_graph_name'' ',
     ...
);

so each row in the table represents one complete contracted graph and you can store multiple contracted graphs in a single table.

rohithsankepally commented 8 years ago

Hey Steve, I am again facing a problem typecasting a Datum to other types. While writing code for fetching column-wise data from the contracted_graphs table, whose columns are of type 'text', I am using the SPI_getbinval(...) function, which fetches me the data as type Datum, which should then be converted into a character array/text (depending on the column type) for processing. Is there a function that typecasts Datum to text?

We may also require the typecast between Datum and bytea in the future if we store the data(columns) in bytea format. How should I proceed?

Thanks in advance.

cvvergara commented 8 years ago

@sankepallyrohithreddy Hello, You might want to look at any of the following files (.c & .h): edges_input, points_input, restrictions_input

because, as I told you, the C part of the code of whatever you do will be rewritten when I integrate it into pgRouting. I would prefer that you focus on:

As a reminder I wrote in a previous comment:

create the documentation

I would think that you have a clone of pgRouting somewhere, so copy the documentation of a function whose signature "looks like" the signature you are using:

    mkdir doc
    cp /path/to/src/func_name/doc/file.rst  doc/contraction.rst
    git add doc/contraction.rst

modify the file such that:

woodbri commented 8 years ago

Look at https://github.com/postgis/postgis/blob/svn-trunk/extensions/address_standardizer/address_standardizer.c again. It both takes text fields as input and returns text fields as output. Specifically, notice the helper function text2char():

lextab = text2char(PG_GETARG_TEXT_P(0));

rohithsankepally commented 8 years ago

@cvvergara Hello, I updated the documentation and also the implementation of the query pgr_contractgraph in the repository. Please have a look and give me your feedback. I am working on the documentation for the shortest path query on the contracted version of the graph. I have a working program, not yet glued to the database.

Please tell me what to do next.

Thanks in advance.

cvvergara commented 8 years ago

@sankepallyrohithreddy TODO list: V = Vicky, R = Rohith

0) V & R research & study: search for "git bring another repository into a branch" http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/

1) V = Create a branch based on dev-2.2 named contraction-2.2
2) R = update fork's clone after 1) is done

git fetch upstream
git checkout contraction-2.2

3) R & V comment and make a plan of action based on point 0

cvvergara commented 8 years ago

the branch has been created: https://github.com/pgRouting/pgrouting/tree/contraction-2.2

cvvergara commented 8 years ago

2.1) I forgot to push the branch contraction-2.2:

git push

cvvergara commented 8 years ago

ok.. based on this [link](http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/)

Let's organize it all in one directory:

mkdir source
mkdir source/contraction
git mv src  source/contraction
git mv sql source/contraction
git mv test source/contraction
git mv qgis source/contraction
git mv osm source/contraction
git mv source src

So your directory now looks like:

src/
    contraction/
         src/
         sql/
         test/
         qgis/
         osm/

after you do the moves, make sure that your code compiles

rohithsankepally commented 8 years ago

Hey Vicky,

So now I need to organize all the code into a directory in my repository, and then bring that directory into the contraction-2.2 branch (fork's clone) along with its history, and make sure it compiles, right?


cvvergara commented 8 years ago

yes, right

woodbri commented 8 years ago

A couple of thoughts on this. Most of the existing code is not very complex and it is easy to put it into a single 'src' directory, if this makes sense for your code. If your code needs to be organized into multiple directories, you can do that under the src/contraction/src/ directory. If you need help getting this to work under cmake, @cvvergara or @woodbri can probably help you get that sorted out. If you are using Makefiles currently, then just get it running with your Makefile in this directory.

cvvergara commented 8 years ago

He is using cmake.

cvvergara commented 8 years ago

@woodbri I think that he makes the structure suggested above, which will fall naturally into the src/contraction directory... then once it's integrated we start using the redundant code sub-directories and connect to the database with the new functions.

rohithsankepally commented 8 years ago

The code is working fine when I run cmake in the newly added directory (src/contraction) in the contraction-2.2 branch. Shall I change my make files and make sure that my query works when the whole project is built, i.e. making it consistent with the other cmake files of the project? How should I proceed?

Thanks in advance.


cvvergara commented 8 years ago

I don't see that the history was preserved. Did you follow the instructions of this link? http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/

rohithsankepally commented 8 years ago

I can see the commit history of the directory (src/contraction) in the commit history of the contraction-2.2 branch of the forked repository to which the directory was added. Can you please verify that once again?

https://github.com/sankepallyrohithreddy/pgrouting/commits/contraction-2.2?page=1

Thanks in advance.

cvvergara commented 8 years ago

@sankepallyrohithreddy Rohith: The most important part of the instructions is that you don't work on your clone. So... let's do: 3) R & V comment and make a plan of action based on point 0. First of all, whatever you did didn't preserve the history of your work: for example, lots of commits on Dec 12 VS nothing in the pgrouting branch for that date here. Preserving the history is very important, as it honours your work. So let's start all over. R: delete the contraction-2.2 branch from the pgRouting fork's clone, locally and remotely:

git checkout develop
git branch -d contraction-2.2
git push origin --delete contraction-2.2

and lets recreate the branch from the upstream & push:

git fetch upstream
git branch -a         <-- use this to see that upstream/.../ contraction-2.2 exists
git checkout contraction-2.2
git push

When that is done, please tell me; I want to check that it's exactly the same. (Meanwhile I'll write another comment on what goes next.)

cvvergara commented 8 years ago

I already did the merge. You actually didn't have much history of commits, so maybe what you did before was OK.