Closed: woodbri closed this issue 7 years ago
This route query should work similarly to the new pgr_dijkstra() and support queries like: one to one, one to many, many to one, and many to many. Then this will integrate seamlessly with the existing code, and users can switch back and forth between the existing code and your new code with minimal hassle.
Looking through the repository, it looks like you started with the idea of integrating the code into pgRouting, which helps a lot.
Maybe you can explain, or point to, where the contracted graph definition is. Is it easy to store it in an edge table? Or is it an opaque binary blob?
I think you would need to do some tests to figure out what is the best way to store the contracted graph so that it is fast to access it when you need to do a query.
My assumption is that contraction takes a long time so we want to save the contracted graph and reuse it. To this end, it would be great to compare:
We probably want to plot out some curves at various graph sizes of 10, 100, 1,000, 10,000, 100,000 edges and evaluate this for different storage strategies for the contracted graph.
But first it might help to have an initial integration as a baseline for comparison. So a little more information on the structure of the contracted graph, and what additional data you might need to save along with it, would help.
Do you currently write the contracted graph to a text or binary file and reload it? Where are the readers/writers for this?
Also, I noticed that you are doing some malloc() calls and using pointers. While this might be valid for some very specific cases, it is best if these all get converted to C++ standard containers and references.
For example, where does this malloc() call get free()d: https://github.com/sankepallyrohithreddy/OSMContraction/blob/master/code/src/mydijkstra/tester/test.cpp#L28
I see you are using containers in other places so that is very good.
On 1/3/2016 2:49 PM, Rohith Reddy wrote:
Hey Steve, Thanks a lot for pointing that out. I will look into the code and make the necessary corrections. I have been generating different levels of contracted graphs using a separate cpp file https://github.com/sankepallyrohithreddy/OSMContraction/blob/master/code/src/contraction/src/tester/test.cpp
which reads/writes from/to a csv file for the purpose of visualization. I will implement a proper reader/writer function, which can be used for our analysis. The graphMinimizer.cpp class stores both the contracted version and the original version of the graph. The graphs are stored as adjacency lists (Boost Graph Library). Yes, since the contraction takes a lot of time, we need to reuse the contracted graph. So first we need to decide how to store the contracted graph. We will analyse and compare both storage procedures, namely the edge table and the opaque binary blob. I will first try to implement storage of the contracted graph in an edge table, and then evaluate its performance for various graph sizes.
@sankepallyrohithreddy Hi Rohith
Within the next 15 days I will be working on the new cmake. Once the cmake is ready I will move the code to the main repository, and work on the documentation of what is going to be 2.2. That is when I can create a branch, and we can start to integrate your code. It would be nice if we could add it to the "proposed functions" documentation section for 2.2. As a proposed function: functionality can change, signature can change, and it gives the opportunity to get more people involved in the development and testing.
While I get to move the code to the main repository, I will give you very specific tasks:
git checkout -b integratePgrouting
git push
(the git push will give you the full instruction on how to push a branch for the first time, just follow it) the rest of the steps are on that branch
git mv code/sql test <-- dump.sql looks more like a test
git mv code/lib sql <-- here are your queries
git mv code src
add the following lines:
OSM
qgis
Note: if you use the data from the [workshop 3.4](http://workshop.pgrouting.org/chapters/installation.html#data) (the one we get with BBOX) for testing, it would be better, as the people involved in development have that data.
Compile, and do what you usually do to get results.
I would think that you have a clone of pgRouting somewhere so copy the documentation of a function that the signature "looks like" the signature you are using:
mkdir doc
cp /path/to/src/func_name/doc/file.rst doc/contraction.rst
git add doc/contraction.rst
Edit contraction.rst
and put the documentation there.
I am leaving this as the last task, as it is a very laborious task. Please use these flags, and make sure your code compiles with no warnings; this is our new development standard.
-std=c++0x -fPIC -O2 -g -Wall -pedantic -fmax-errors=10 -Wextra -frounding-math -Wno-deprecated
Remove all warnings.
-std=gnu99 -fPIC -O2 -g -Wall -pedantic -fmax-errors=10 -Wmissing-prototypes -frounding-math
I've put a lot of notes in the ticket: https://github.com/pgRouting/pgrouting/issues/440 Please start using the ticket so all ideas are captured in one place and other people can help also.
I think you can tackle this by working on two fronts:
With respect to timing, pgrouting 2.2 is planned to be released this month (Jan) and if we can get this integrated it could be released with 2.2 as proposed functions.
Hey Steve, While looking for how to store the data in binary format I found this link ( https://wiki.postgresql.org/wiki/BinaryFilesInDB ) useful. After going through this I felt that using the "bytea" data type would help us. So we will be storing the contracted graph as a comma separated string with a delimiter. For example, suppose we have a graph with the following edges after contraction:
 id | src | tgt | cost | rcost
----+-----+-----+------+-------
  0 |   2 |   4 |    1 |     1
  1 |   5 |   8 |    1 |    -1
  2 |   6 |   5 |    1 |     1
  3 |   3 |  -1 |    0 |    -1
We store the contracted graph as '0,2,4,1,1,$,1,5,8,1,-1,$,2,6,5,1,1,$,3,3,-1,0,-1' in the "bytea" format, where "$" is the delimiter.
We also need to reuse the information about the removed vertices and edges, so we should also store them in order to unpack the path and return the original path to the user. I have used two data structures for this purpose, namely:
1. Psuedo edges: map<eid, pair<eid1, eid2>>, where eid, eid1 and eid2 are edge ids:
   -> eid is the edge id of the shortcut.
   -> eid1 and eid2 are the edge ids of the removed edges.
2.Removed vertices: map<V,deque
How should I store the above data in order to reuse it ?
OK, we need to store the contracted graph and the maps, so the pattern is to serialize the data and return it to the user, and the user will store it in a table for later use. We might want to change this pattern in the future if there is a faster or better way to do this.
I think for the 1st pass, your idea of comma and $ delimited text is a good one, because we can read it, or dump it into a file for debugging if we need to. Regarding the maps, you could change your format to something like:
graph:\n
0,2,4,1,1,$,1,5,8,1,-1,$,2,6,5,1,1,$,3,3,-1,0,-1\n
psuedo_edges:\n
eid,eid1,eid2$eid,eid1,eid2$...\n
removed_vertices:\n
...$....$....\n
Or you could replace all the $ with '\n' so the data records are line based. Anyway, this should be easy; we will have to evaluate how fast it is for large graphs.
The trick will be how to do this efficiently memory-wise, because you don't want to allocate this twice: once in C++ and then copy it to C to return it to the SQL. Don't worry about that now; we can optimize it later, once you get it to work in the first place.
Hey Steve, While writing code for the contraction function I faced the following problems:
2. In the psuedo_edges data structure (which I mentioned in the previous comment), I am just storing the edge ids whose information is required while unpacking. Given an edge id, I need to fetch the information about the edge from the original table (since it is lost in contraction). Do I need to fetch the info again from the table, or store the info (id, source, target, ...) about the removed edges in another data structure?
Please help me with this.
Regarding 1., unless you really need a bytea you could just return that data as text; this will have the same effect, and if it is just text then it can be human readable for debugging. Later we might want to compress it using something like gzip, and then it will have to be a bytea. Here is a link to code that is returning a set of records of text fields: https://github.com/postgis/postgis/blob/svn-trunk/extensions/address_standardizer/address_standardizer.c
For 2, you have two options: A) store the data, or B) require the user to pass in the original edges again. I think A is the better option because you KNOW the data is in sync with the contracted graph, while B leaves open the possibility that the data passed in at query time is not the same as that passed in at contraction time. I presume that this data is the same as that stored in the two map containers you mentioned above. In that case, I would just append that data to the edges item like I mentioned above:
0,2,4,1,1,$,1,5,8,1,-1,$,2,6,5,1,1,$,3,3,-1,0,-1\n
psuedo_edges:\n
eid,eid1,eid2$eid,eid1,eid2$...\n
removed_vertices:\n
...$....$....\n
Hey Steve, Thanks a lot. I am done with the function which inserts the contracted graph and its name into a table in text format. The query is as follows:
insert into contracted_graphs
select 'my_contracted_graph_name'::text,
       contracted_blob::bytea
from pgr_contractgraph(
    sql_for_edges::text,
    contraction_level::integer
);
Should I store the data structures needed for unpacking in the same blob (graph), or should I store them as different entries in the same table, like:
Table: contracted_graph
name (text)      | blob (bytea)
-----------------+------------------------
contracted_graph | contracted_graph_blob
removed_vertices | removed_vertices_blob
psuedo_edges     | psuedo_edges_blob
How should I proceed?
I would store them as separate column in the same row like:
insert into contracted_graphs
select 'my_contracted_graph_name'::text as name,
contracted_blob::bytea,
removed_vertices::bytea,
psuedo_edges::bytea
from pgr_contractgraph(
sql_for_edges::text,
contraction_level::integer
);
this way you can later run a query like:
select * from pgr_dijkstra(
'select * from contracted_graphs
where name=''my_contracted_graph_name'' ',
...
);
so each row in the table represents one complete contracted graph and you can store multiple contracted graphs in a single table.
Hey Steve, I am again facing a problem: typecasting a Datum to other types. While writing code for fetching column-wise data from the contracted_graphs table, whose columns are of type 'text', I am using the SPI_getbinval(...) function, which fetches the data as type Datum. This should be converted into a character array/text (depending on the column type) for processing. Is there a function that typecasts Datum to text?
We may also require a typecast between Datum and bytea in the future, if we store the data (columns) in bytea format. How should I proceed?
Thanks in advance.
@sankepallyrohithreddy Hello, You might want to look at any of the following files (.c & .h): edges_input, points_input, restrictions_input
because, as I told you, the C part of the code of whatever you do will be rewritten when I integrate it into pgRouting. I would prefer that you focus on:
As a reminder I wrote in a previous comment:
I would think that you have a clone of pgRouting somewhere so copy the documentation of a function that the signature "looks like" the signature you are using:
mkdir doc
cp /path/to/src/func_name/doc/file.rst doc/contraction.rst
git add doc/contraction.rst
modify the file such that:
Look at https://github.com/postgis/postgis/blob/svn-trunk/extensions/address_standardizer/address_standardizer.c again. It both takes text fields as input and returns text fields as output. Specifically, notice the helper function text2char():
lextab = text2char(PG_GETARG_TEXT_P(0));
@cvvergara Hello, I updated the documentation and also the implementation of the query pgr_contractgraph in the repository. Please have a look and give me your feedback. I am working on the documentation for the shortest path query for the contracted version of the graph. I have a working program, not yet glued to the database.
Please tell me what to do next?
Thanks in advance.
@sankepallyrohithreddy TODO list: V = Vicky, R = Rohith
0) V & R research & study: search for "git bring another repository into a branch" http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/
1) V = Create a branch based on dev-2.2 named contraction-2.2
2) R = update fork's clone after 1) is done
git fetch upstream
git checkout contraction-2.2
3) R & V comment and make a plan of action based on point 0
The branch has been created: https://github.com/pgRouting/pgrouting/tree/contraction-2.2
2.1) I forgot to push the branch contraction-2.2:
git push
OK.. based on this [link](http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/)
Let's organize all in one directory:
mkdir source
mkdir source/contraction
git mv src source/contraction
git mv sql source/contraction
git mv test source/contraction
git mv qgis source/contraction
git mv osm source/contraction
git mv source src
So your directory now looks like:
src/
contraction/
src/
sql/
test/
qgis/
osm/
After you do the moves, make sure that your code compiles.
Hey Vicky,
So now I need to organize all the code into a directory in my repository and then bring that directory into the contraction-2.2(fork's clone) branch along with its history and make sure it compiles, right?
yes, right
A couple of thoughts on this. Most of the existing code is not very complex and it is easy to put it into a single 'src' directory, if that makes sense for your code. If your code needs to be organized into multiple directories, you can do that under the src/contraction/src/ directory. If you need help getting this to work under cmake, @cvvergara or @woodbri can probably help you get that sorted out. If you are currently using Makefiles, then just get it running with your Makefile in this directory.
he is using cmake.
@woodbri I think that he makes the structure suggested above, which will fall naturally into the src/contraction directory... then once it's integrated we start using the redundant code sub-directories and connect to the database with the new functions.
The code is working fine when I run cmake in the newly added directory (src/contraction) in the contraction-2.2 branch. Shall I change my make files and make sure that my query works when the whole project is built, i.e. make it consistent with the other cmake files of the project? How should I proceed?
Thanks in advance.
I don't see that the history was preserved. Did you follow the instructions in this link? http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/
I can see the commit history of the directory (src/contraction) in the commit history of the contraction-2.2 branch of the forked repository to which the directory was added. Can you please verify that once again?
https://github.com/sankepallyrohithreddy/pgrouting/commits/contraction-2.2?page=1
Thanks in advance.
@sankepallyrohithreddy Rohith: The most important part of the instructions is that you don't work on your clone. So... let's do: 3) R & V comment and make a plan of action based on point 0. First of all, whatever you did didn't preserve the history of your work: for example, lots of commits on Dec 12 VS nothing in the pgRouting branch for that date here. Preserving the history is very important, as it honours your work. So let's start all over. R: delete the contraction-2.2 branch from the pgRouting fork's clone, locally and remotely:
git checkout develop
git branch -d contraction-2.2
git push origin --delete contraction-2.2
and lets recreate the branch from the upstream & push:
git fetch upstream
git branch -a <-- use this to see that upstream/.../contraction-2.2 exists
git checkout contraction-2.2
git push
When that is done, please tell me; I want to check that it's exactly the same. (Meanwhile I'll write another comment on what goes next.)
I already did the merge; you actually didn't have much history of commits, so maybe what you did before was OK.
On 1/3/2016 8:19 AM, Rohith Reddy wrote:
Hi Rohith,
This sounds like excellent work that you guys have been doing. Regarding integrating this into PostgreSQL, I want to bring Vicky into the discussion, because she has reworked the C code that is needed into some easy to use templates, and she has done a lot of work building reusable C++ classes for Dijkstra based on Boost Graph.
The integration into PostgreSQL follows these steps:
So you might have some hypothetical commands like:
where sql_for_edges might look like:
We might store contracted blobs in a table like:
Issues to be sorted out:
Remember: a multi-user database means NO globals. PostgreSQL is a long-running server process, so NO memory leaks. Etc.
The issue here is the performance of serializing the contracted graph and storing it in the database, and then later retrieving it and unserializing it. PostgreSQL has some performance issues around fetching and storing large blob objects that we would need to look at. Also, I have never done this in the past, so some investigation will need to be done.
For the actual route query, you should model the SQL functions and code to act like the new code Vicky has done for pgr_dijkstra(), and I will let her add links to that below. But in this case, rather than passing the edges in, you would pass in the contracted graph and the route query parameters, and you would want to return results structured as that code does. Please look at her new code, NOT the 2.0 code.