Issues with coordinates < 1

legohyl commented 3 years ago

Hello there, was wondering if anyone faces a similar issue whereby if the x and y coordinates given to the solver are >=0 and <=1, the returned optimal value is always 0.

jvkersch commented 3 years ago

Hmm, I have not encountered this, can you post an example problem where this manifests itself?

legohyl commented 3 years ago

Sure. Here's a code snippet that I was using. There is an optimal tour, but optimal value == 0 in this case.

import networkx as nx
from concorde.tsp import TSPSolver
import numpy as np

# Random geometric graph in a unit cube range [0, 1]
g = nx.random_geometric_graph(20, radius=10)

def distance(x, y):
    return np.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

# I added edge weights to show that the traversal around this graph should be non-negative, all edge weights are positive
node_dict = nx.get_node_attributes(g, 'pos')
for e in g.edges():
    g.edges[e]['weight'] = distance(node_dict[e[0]], node_dict[e[1]])    

# Construct (x, y) coordinate pair for solver call
pos_dict = nx.get_node_attributes(g, 'pos')
x = []
y = []
for key, value in pos_dict.items():
    x.append(value[0])
    y.append(value[1])

solver = TSPSolver.from_data(x, y, norm='EUC_2D')
soln = solver.solve()
print(soln.tour)

print(soln.optimal_value)

legohyl commented 3 years ago

Bumping on this issue, is it because of the type of norm I'm using? I haven't tried any of the others, but I assumed that EUC_2D was correct since I generated this via a unit cube in Euclidean space.

ben-hudson commented 3 years ago

I think it may be related to some rounding occurring somewhere.

This demonstrates the problem:

import numpy as np
from concorde.tsp import TSPSolver

norm = 'EUC_2D'

n = np.random.random((10, 2))
solver = TSPSolver.from_data(n[:,0], n[:,1], norm=norm)
solution = solver.solve()
a = solution.optimal_value

m = 1000*n
solver = TSPSolver.from_data(m[:,0], m[:,1], norm=norm)
solution = solver.solve()

b = solution.optimal_value

print(a, b)

Very different answers are returned even though the nodes are only scaled. In my case, I got 0.0 and 2966.0.

Using GEO instead of EUC_2D gives a better result, but still incorrect.

ben-hudson commented 3 years ago

~~The same thing happens when solving the instances with LKH-3, so I'd say its something to do with the TSPLIB format.~~

Distances are represented by integers in the TSPLIB format

Every instance I've seen (e.g. CVRP) has integer node coordinates.

legohyl commented 3 years ago

Thanks for spotting that! From that documentation, seems like they have an R implementation of precision, that can control the number of decimal places required. If that's the case, does that there's a parameter we can change when calling the C-library of concorde to tinker with the decimal place?

Scaling the nodes to get an integral coordinate is fine, but the problem is also the returned distance; if it's integral, it always gets truncated unnecessarily.

ben-hudson commented 3 years ago

I think the precision is specific to the R implementation, so the files that actually get written are integer-valued (i.e. when you load the file you need to remember what precision you wrote it with). Concorde reads the TSPLIB files, so it isn't aware of the precision and the truncation will always happen.

legohyl commented 3 years ago

That's really odd though, if you look at the Concorde implementation itself, their variables are of type double, which suggests that they aren't just considering integer types, else they'd have likely type-casted them to int64 instead. Are we missing some call from this API?

chkwon commented 3 years ago

According to TSPLIB-DOC, all distance functions return integer values. Coordinates can be floating, but distances are integral.

legohyl commented 3 years ago

Thanks for checking that out! So it's weird then, if I input floating point coordinates and receive a distance of 0.

chkwon commented 3 years ago

@legohyl When you input floating point coordinates, if the distance between two point is less than 0.5, then the distance is computed as zero. So, it is possible the shortest TSP tour can be of zero length.

If your coordinates are pretty small numbers (say less than 100) then multiply some number, and then after solving it, divide the distance by the same number.

ben-hudson commented 3 years ago

Maybe they do this because rounding the distance to an integer makes the problem NP-complete (https://en.wikipedia.org/wiki/Travelling_salesman_problem#Euclidean) 🤷

jvkersch / pyconcorde

Issues with coordinates < 1 #29