jvkersch / pyconcorde

Python wrapper around the Concorde TSP solver
BSD 3-Clause "New" or "Revised" License
347 stars 95 forks source link

Can I solve a TSP problem with this solver using a distance matrix instead of inputing the latitude and longitude nodes? #28

Closed hachinoone closed 3 years ago

hachinoone commented 3 years ago

Hi. I am working with a time dependent traveling salesman problem, where the distance between the nodes are changing over time. Since I did not find the APIs in the readme, could you please tell me how to solve a TSP problem using a distance matrix between nodes with this solver? Thanks.

jvkersch commented 3 years ago

For the moment, you will have to use the standalone version of Concorde. The Python bindings currently only support passing the coordinates of the nodes. I have some plans to improve the interface, but I am not sure when I can get round to fixing this.

jvkersch commented 3 years ago

@hachinoone If you're interested in the bleeding edge, I've just pushed a PR with a new API that will give you the functionality that you are looking for. However, this new API is radically different from the old one, so you may have to do some work.

  1. Download the Concorde executable from http://www.math.uwaterloo.ca/tsp/concorde/downloads/downloads.htm, make it executable, and place it on your path.
  2. Check out the pyconcorde-subprocess branch from this repo: git checkout pyconcorde-subprocess
  3. Reinstall the package: pip install -e . (from the root folder of the repo)
  4. Adapt the code from the example below to your situation:
from concorde import Problem, run_concorde

distances = # ...
problem = Problem.from_matrix(distances)
solution = run_concorde(problem)
print(solution.tour)

In the above, distances should be a square (symmetric) matrix of distances. You can either provide a NumPy array, or a list of lists.

I would be quite happy to hear about your experiences with this API. In the coming weeks, I plan to improve the API so that it is easier to use. Feedback is very welcome.

hachinoone commented 3 years ago

Well, I will have a try. Thanks a lot.

mikedbjones commented 1 year ago

This is a terrific tool. I used the pyconcorde-subprocess branch as you suggest above to solve for this 36x36 distance matrix:

array([[   0, 1192, 2166, ..., 1697, 2347, 1007],
       [1400,    0, 2645, ..., 1615, 1428, 1148],
       [2166, 2542,    0, ..., 4043, 3681, 1689],
       ...,
       [1697, 1666, 3835, ...,    0,  650, 2337],
       [2347, 1428, 3737, ...,  650,    0, 2575],
       [1048, 1045, 1689, ..., 2545, 2472,    0]])

This was the result:

concorde /tmp/tmpjiq1cb5j/problem.tsp
Host: xyz  Current process id: 16476
Using random seed 1686230350
Number of Nodes: 36
Explicit Lengths (CC_MATRIXNORM)
Set initial upperbound to 18566 (from tour)
  LP Value  1: 17687.750000  (0.00 seconds)
  LP Value  2: 18096.750000  (0.01 seconds)
  LP Value  3: 18421.000000  (0.02 seconds)
  LP Value  4: 18485.000000  (0.03 seconds)
  LP Value  5: 18540.400000  (0.04 seconds)
  LP Value  6: 18566.000000  (0.05 seconds)
New lower bound: 18566.000000
Final lower bound 18566.000000, upper bound 18566.000000
Exact lower bound: 18566.000000
DIFF: 0.000000
Final LP has 73 rows, 114 columns, 953 nonzeros
Optimal Solution: 18566.00
Number of bbnodes: 1
Total Running Time: 0.06 (seconds)
[0, 30, 12, 4, 13, 16, 2, 15, 11, 28, 19, 35, 29, 17, 26, 32, 14, 8, 1, 9, 20, 34, 7, 33, 18, 27, 22, 3, 10, 25, 6, 31, 24, 21, 5, 23]

May I ask if you're planning to release this updated API formally? Using a distance matrix was the only option for me, since I'm using walking durations between points using OSM data. I calculated my matrix using an OSM routing library. Lats/longs would not work, since the program has to take into account actual footpaths, tracks, roads etc, in calculating the distances. However, with your pyconcorde-subprocess I was able to do it, when following your instructions. I think the distance matrix input could be very helpful for people with similar projects to mine. This was my solution plotted on a map:

combs

One final note: for a while I was getting a solution of 0.00, which I knew couldn't be right. After experimentation I realised that the matrix had to contain only integer values. After rounding all my distances to the nearest integer, it worked.

jvkersch commented 1 year ago

Thanks @mikedbjones for the informative feedback and the nice visualization. I do want to update the code to support the subprocess API formally and your comment has given me the stimulus that I need to do this. Regarding your other comment, about integer distances, that is something that the API could warn about.

mikedbjones commented 1 year ago

@jvkersch , it looks like you've made the changes to the main branch. Thank you for this, I can now get pyconcorde to solve from a symmetric matrix. Would you mind if I make a PR to update the readme with this use case? Secondly, I have made a function in my own code to make an asymmetric matrix into a symmetric one; my understanding is that Concorde needs a symmetric problem? If you wish I could make a PR to bring this into the code?

def symmetricize(m, high_int=None):

    """ Jonker-Volgenant method for transforming asymmetric TSP -> symmetric """

    # if high_int not provided, make it equal to 10 times the max value:
    if high_int is None:
        high_int = round(10*m.max())

    m_bar = m.copy()
    np.fill_diagonal(m_bar, 0)
    u = np.matrix(np.ones(m.shape) * high_int)
    np.fill_diagonal(u, 0)
    m_symm_top = np.concatenate((u, np.transpose(m_bar)), axis=1)
    m_symm_bottom = np.concatenate((m_bar, u), axis=1)
    m_symm = np.concatenate((m_symm_top, m_symm_bottom), axis=0)

    return m_symm
jvkersch commented 1 year ago

@mikedbjones Both sound great, thanks for picking this up!

  1. I have indeed snuck a few pieces of the subprocess API into main, without impacting the existing functionality. If you could open a PR to update the README to describe this, that would be a very big help. You could add a new section, and you could describe the new functionality as experimental. To be clear, I want to remove the existing implementation (at which point the new implementation will no longer be experimental) but I don't know when that will happen.
  2. For the symmetrization, yes, please feel free to contribute this as a PR. Would you be able to add an example of a nonsymmetric travelling salesman problem, and a short description of how the algorithm addresses this?
bairuofei commented 1 year ago

@mikedbjones

@jvkersch , it looks like you've made the changes to the main branch. Thank you for this, I can now get pyconcorde to solve from a symmetric matrix. Would you mind if I make a PR to update the readme with this use case? Secondly, I have made a function in my own code to make an asymmetric matrix into a symmetric one; my understanding is that Concorde needs a symmetric problem? If you wish I could make a PR to bring this into the code?

def symmetricize(m, high_int=None):

    """ Jonker-Volgenant method for transforming asymmetric TSP -> symmetric """

    # if high_int not provided, make it equal to 10 times the max value:
    if high_int is None:
        high_int = round(10*m.max())

    m_bar = m.copy()
    np.fill_diagonal(m_bar, 0)
    u = np.matrix(np.ones(m.shape) * high_int)
    np.fill_diagonal(u, 0)
    m_symm_top = np.concatenate((u, np.transpose(m_bar)), axis=1)
    m_symm_bottom = np.concatenate((m_bar, u), axis=1)
    m_symm = np.concatenate((m_symm_top, m_symm_bottom), axis=0)

    return m_symm

Thank you for the implementation. May I know how to recover the original asymmetric tsp solution from the symmetric solution? Is there an existing API for that?

mikedbjones commented 1 year ago

Good question @bairuofei , I've submitted a pull request (currently open), which gets you back the solution.

Basically you pick alternate elements of the symmetric solution. Eg, if you had a 10 node asymmetric problem, with indices numbered [0, 1, ..., 9], the solution would be twice the size, something like [0, 10, 5, 15, 2, 12, ...]. Notice that the nodes alternate between the original nodes and the ghost nodes, and the ghost node is paired up with its associated original node (0 with 10, 5 with 15, etc). So, you pick the alternate elements [0, 5, 2, ...].

In my pull request #59 you'll see the code for this. Also see an article I recently wrote about geographic asymmetric TSP which explains the logic behind picking alternate elements of the solution: Solving Geographic Travelling Salesman Problems.

bairuofei commented 1 year ago

@mikedbjones Thank you for the prompt reply. I have tried your implementation and found a problem. I use it to solve an open tsp problem, i.e., the path starts at vertex 0 and traverses all remaining vertices without going back to vertex 0. To achieve this, I set the first column of the distance matrix to 0. The following code gives a simple example.

import numpy as np
from concorde import Problem, run_concorde

def symmetricize(matrix, k=None):
    """
        Jonker-Volgenant method of transforming (n x n) asymmetric TSP, C into (2n x 2n) symmetric TSP, S.

        Let C be an asymmetric TSP matrix.
        Let k be a very large number, ie k >> C.max()
        Let U = (u_ij) with u_ii = 0, u_ij = k for i != j.

        Construct the (2n x 2n) matrix:

                    +-------+
                    | U |C^T|
            S =     |---+---|
                    | C | U |
                    +-------+

        S is a symmetric TSP problem on 2n nodes.
        There is a one-to-one correspondence between solutions of S and solutions of C.
    """
    # if k not provided, make it equal to 10 times the max value:
    if k is None:
        k = round(10*matrix.max())

    matrix_bar = matrix.copy()
    np.fill_diagonal(matrix_bar, 0)
    u = np.matrix(np.ones(matrix.shape).astype(int) * k)
    np.fill_diagonal(u, 0)
    matrix_symm_top = np.concatenate((u, np.transpose(matrix_bar)), axis=1)
    matrix_symm_bottom = np.concatenate((matrix_bar, u), axis=1)
    matrix_symm = np.concatenate((matrix_symm_top, matrix_symm_bottom), axis=0)

    return matrix_symm

if __name__ == "__main__":

    distance1 = np.array([[0, 2, 4], 
                        [0, 0, 3],
                        [0, 3, 0]])
    #              0
    #            /   \
    #          (2)    (4)    
    #         /         \
    #        1  - (3) -  2

    distance2 = np.array([[0, 5, 4], 
                        [0, 0, 3],
                        [0, 3, 0]])
    #              0
    #            /   \
    #          (5)    (4)    
    #         /         \
    #        1  - (3) -  2

    symm_distance1 = symmetricize(distance1)
    problem1 = Problem.from_matrix(symm_distance1)
    solution1 = run_concorde(problem1)
    print(solution1.tour)     # [0, 5, 2, 4, 1, 3]

    symm_distance2 = symmetricize(distance2)
    problem2 = Problem.from_matrix(symm_distance2)
    solution2 = run_concorde(problem2)
    print(solution2.tour)    # [0, 3, 2, 5, 1, 4]

Clearly, solution 2 follows your reply and we can find the shortest path is [0, 2, 1]. However, solution 1 is supposed to be [0, 1, 2] but it still returns [0, 2, 1]. Also, solution 1 seems does not follow the rule of [a_1, a_1+n, a_2, a_2+n, ...]. Can I ask you where I went wrong? Thank you.

mikedbjones commented 1 year ago

@bairuofei instead of setting the column to zero, try setting it to a very high value (apart from 0 in position 0, 0). This should place a very high cost on returning to node zero.

distance1 = np.array([[0, 2, 4], 
                      [99, 0, 3],
                      [99, 3, 0]])

Can you try this? Apologies, I'm not on my Linux machine at the moment to try it myself.

bairuofei commented 1 year ago

Only solution 2 is correct. The cost of solution 1 seems also correct: 104 = 2 +3 + 99, meaning the path is [0 1 2 0]. But the returned path of solution 1 is still confusing.

# Solution 1
Optimal Solution: 104.00
Total Running Time: 0.00 (seconds)
[0, 5, 2, 4, 1, 3]

# Solution 2
Optimal Solution: 106.00
Total Running Time: 0.00 (seconds)
[0, 3, 2, 5, 1, 4]
mikedbjones commented 1 year ago

Try setting k=99 in symmetricize(). In the absence of this, it will set it to 10 times the largest value (so 990 if we used 99 to avoid going back to zero). This may cause Concorde to prefer going back to zero. I would need to give some thought as to whether this would guarantee the right answer in all cases of 'open TSP', and the code I wrote wasn't designed to handle such problems. Does that fix it?

bairuofei commented 1 year ago

I found that maybe the problem is caused by my definition of the distance matrix. In my implementation, each row represents its distance to all other vertices. For example,

distance1 = np.array([[0, 2, 4], 
                      [99, 0, 3],
                      [99, 3, 0]])

The first row means the distance from vertex-0 to vertex-1 is 2, from vertex-0 to vertex-2 is 4. While in your blog, the first row represents the distances from all other vertices to vertex-0. So I just transpose the distance1 as follows:

  distance1 = np.array([[0, 99, 99], 
                      [2, 0, 3],
                      [4, 3, 0]])

Now the solution is:

Optimal Solution: 104.00
Total Running Time: 0.00 (seconds)
[0, 4, 1, 5, 2, 3]

It is [0 1 2], which I think is the correct answer.

Regarding another problem about the rule [a_1, a_1+n, a_2, a_2+n, ...], from your blog, I find that maybe the rule is not correct. It is true that the real vertex is paired with the ghost vertex in symmetric solution, but not necessary be a_i , a_i+n, it may be a_i, a_j + n. For example, in the following image from your blog, it is possible that A is paired with C', rather than only with A'. Is this correct? Seems not correct. 1

mikedbjones commented 1 year ago

Several things here:

  1. In the blog, arrays are defined the same way you have. So the first row represents distances FROM the first node
  2. It is true that there are 2 possible solution forms: i. [i1, i1 + n, i2, i2 + n, i3, i3 + n, ..., in, in + n ] ii. [i1, i2 + n, i2, i3 + n, i3, ..., in + n, in, i1 + n ]
  3. In the image you've shown, A is not connected to C', only C' to A (see the small arrows)
  4. See above my reply about setting k which you may have missed
bairuofei commented 1 year ago

It works. After setting k = 99 and keeping the original distance matrix unchanged, the code outputs the correct answer. May I know why k = 99 works and 990 does not work? :smile:

# Solution 1
Optimal Solution: 104.00
Total Running Time: 0.00 (seconds)
[0, 3, 1, 4, 2, 5]

# Solution 2
Optimal Solution: 106.00
Total Running Time: 0.00 (seconds)
[0, 3, 2, 5, 1, 4]
mikedbjones commented 1 year ago

Because, in the absence of setting k explicitly, behind the scenes, symmetricize is creating a matrix with values of 990. This is making the algorithm treat the value of 99 that you put into your matrix comparatively attractive. Print symm_distance1 to see what I mean, with and without setting k.

mikedbjones commented 1 year ago

@mikedbjones Both sound great, thanks for picking this up!

  1. I have indeed snuck a few pieces of the subprocess API into main, without impacting the existing functionality. If you could open a PR to update the README to describe this, that would be a very big help. You could add a new section, and you could describe the new functionality as experimental. To be clear, I want to remove the existing implementation (at which point the new implementation will no longer be experimental) but I don't know when that will happen.
  2. For the symmetrization, yes, please feel free to contribute this as a PR. Would you be able to add an example of a nonsymmetric travelling salesman problem, and a short description of how the algorithm addresses this?

Hi @jvkersch , anything I can do to help with the PR for the symmetrization?

SomyaNIGAM commented 2 months ago

I am trying to do the same thing for my project. Could you please share how to run the solver on distance matrix?