yakra / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

Region & System vertex population #265

Closed: yakra closed this issue 6 months ago

yakra commented 9 months ago

Using region/system TMBitsets for subgraph membership will require a separate pass through those objects to allocate space once the number of vertices is known. That pass could happen as early as here, after which we'd populate them as we do now, with HGVertex construction calling add_vertex (which would then just need to do a TMBitset insertion instead of a vector emplacement). ...But that's dumb. If we shrink_to_fit, there'd still need to be another pass after population. Better yet...
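To illustrate why a bitset needs that extra allocation pass, here is a minimal sketch assuming a plain 64-bit-word bit vector. MembershipBitset, alloc() and insert() are hypothetical names, not TMBitset's actual interface. Storage can only be sized once the total vertex count is known; after that, populating it is a single bit-set per vertex, and inserting the same vertex twice (e.g. via two routes) costs nothing extra, unlike a vector emplacement.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a TMBitset-like membership set (not the real API).
class MembershipBitset {
    std::vector<std::uint64_t> words;
public:
    // Allocation pass: needs the total vertex count up front; all bits start cleared.
    void alloc(std::size_t num_vertices) { words.assign((num_vertices + 63) / 64, 0); }

    // Population pass: mark vertex index v as a member of this subgraph.
    // Repeated inserts of the same vertex just set the same bit again.
    void insert(std::size_t v) { words[v / 64] |= std::uint64_t{1} << (v % 64); }

    bool contains(std::size_t v) const { return (words[v / 64] >> (v % 64)) & 1; }

    // Number of distinct member vertices.
    std::size_t count() const {
        std::size_t n = 0;
        for (std::uint64_t w : words)
            for (; w; w &= w - 1) ++n;   // clear the lowest set bit each iteration
        return n;
    }
};
```

Usage: call alloc(n) once after all vertices exist, then insert() each member vertex index; duplicates collapse automatically.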

yakra commented 9 months ago
```cpp
/*static*/ void HighwaySystem::VertexThread(unsigned int id, std::mutex* mtx)
{   //printf("Starting HighwaySystem::VertexThread %02i\n", id); fflush(stdout);
    while (true) // termination is checked under the lock below, so the
    {            // shared iterator `it` is never read without holding mtx
        for (mtx->lock(); it != syslist.end(); it++)
          if (it->is_subgraph_system) break;
        if (it == syslist.end()) return mtx->unlock();
        HighwaySystem& h = *it++;
        mtx->unlock();

        //std::cout << '.' << std::flush;
        // Yes, a system can get the same vertex multiple times from different
        // routes, or loop endpoints. But HighwayGraph::write_subgraphs_tmg gets
        // rid of any redundancy when making the final set.
        for (Route& r : h.routes)
          for (Waypoint& w : r.points)
            h.vertices.push_back(w.hashpoint()->vertex);
    }
}
```
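The claim-under-lock pattern VertexThread relies on can be shown in a self-contained sketch. This is an illustration under simplifying assumptions: System, worker and process_all are hypothetical names, the "work" just counts vertex IDs, and std::unique_lock stands in for the manual lock()/unlock() calls so the mutex is released on every return path. Each thread advances the shared iterator only while holding the mutex, claims one qualifying system, then does the expensive part unlocked.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for HighwaySystem.
struct System {
    bool is_subgraph_system;
    std::vector<int> vertex_ids;   // stands in for vertices gathered from its routes
    std::size_t vertices_found;    // written only by the thread that claims this system
};

void worker(std::list<System>& syslist, std::list<System>::iterator& it, std::mutex& mtx)
{
    while (true)
    {   std::unique_lock<std::mutex> lock(mtx);
        while (it != syslist.end() && !it->is_subgraph_system) ++it; // skip non-subgraph systems
        if (it == syslist.end()) return;    // unique_lock releases the mutex here
        System& s = *it++;                  // claim this system
        lock.unlock();
        s.vertices_found = s.vertex_ids.size(); // the "work", done without holding the lock
    }
}

void process_all(std::list<System>& syslist, unsigned num_threads)
{
    std::mutex mtx;
    auto it = syslist.begin();
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i)
        threads.emplace_back(worker, std::ref(syslist), std::ref(it), std::ref(mtx));
    for (std::thread& t : threads) t.join();
}
```

Systems that are not subgraph systems are skipped and keep vertices_found at its initial value.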
yakra commented 7 months ago

@jteresco, I came across something interesting when working on this.

Here's the relevant part of a work-in-progress Region::VertexThread:

```cpp
// Yes, a region can get the same vertex multiple times from different routes,
// or loop endpoints. But HighwayGraph::write_subgraphs_tmg gets rid of any
// redundancy when making the final set.
for (Route* r : rg.routes)
  for (Waypoint& w : r->points)
    if (w.is_or_colocated_with_active_or_preview())
    {   Waypoint* p = w.hashpoint();
        rg.vertices.emplace_back(p->vertex, p);
    }
```

It produces the same* graphs as existing siteupdate. (*Meaning the same vertices & edges, but in a different order. Run them through canonicaltmg and there's no diff.)
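What "same graph, different order" means can be sketched as an order-insensitive comparison. This is a hypothetical illustration, not canonicaltmg's actual algorithm: Graph, canonicalize and same_graph are made-up names, edges are rewritten in terms of vertex unique names instead of indices, and edge labels and coordinates are ignored for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal graph: vertex i has unique name vertex_names[i].
struct Graph {
    std::vector<std::string> vertex_names;
    std::vector<std::pair<std::size_t, std::size_t>> edges; // indices into vertex_names
};

using NamedEdge = std::pair<std::string, std::string>;

// Rewrites each edge as a (smaller name, larger name) pair, then sorts both
// lists, so the result no longer depends on vertex or edge order.
std::pair<std::vector<std::string>, std::vector<NamedEdge>> canonicalize(const Graph& g)
{
    std::vector<std::string> names = g.vertex_names;
    std::sort(names.begin(), names.end());
    std::vector<NamedEdge> edges;
    for (const auto& e : g.edges)
    {   const std::string& a = g.vertex_names[e.first];
        const std::string& b = g.vertex_names[e.second];
        edges.emplace_back(std::min(a, b), std::max(a, b));
    }
    std::sort(edges.begin(), edges.end());
    return {names, edges};
}

bool same_graph(const Graph& x, const Graph& y) { return canonicalize(x) == canonicalize(y); }
```

Two graphs that list the same vertices and edges in different orders compare equal; any added or missing vertex or edge makes them differ.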

Now compare what I did originally:

```cpp
// Yes, a region can get the same vertex multiple times from different routes,
// or loop endpoints. But HighwayGraph::write_subgraphs_tmg gets rid of any
// redundancy when making the final set.
for (Route* r : rg.routes)
  if (r->system->active_or_preview())
    for (Waypoint& w : r->points)
    {   Waypoint* p = w.hashpoint();
        rg.vertices.emplace_back(p->vertex, p);
    }
```

This does the active/preview check at the Route level, without having to check every single Waypoint.

Diffs (HwyData @ 1f85330): worldwide, the bottom version omitted 5 points in 4 graphs in 3 regions. Check 'em out in the HDX.

These vertices were excluded because there's no active/preview Waypoint in the relevant region itself.
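The difference between the two filters can be reproduced on a toy model. This sketch assumes a hypothetical, simplified representation (Route, Point, vertices_per_point and vertices_per_route are made-up names, not siteupdate's classes): colocated points share a vertex id, and std::set plays the role of write_subgraphs_tmg discarding duplicates. The scenario mirrors the COD/ZMB case: a devel route in COD has a point colocated with an active route in ZMB.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model; not siteupdate's real classes.
struct Route { std::string region; bool active_or_preview; };
struct Point { std::size_t route; int vertex; };  // colocated points share a vertex id

// Per-waypoint filter: keep a region's point if ANY point at that location
// is on an active/preview route (like is_or_colocated_with_active_or_preview()).
std::set<int> vertices_per_point(const std::vector<Route>& routes,
                                 const std::vector<Point>& points, const std::string& region)
{
    std::set<int> ap_vertices;            // locations with at least one a/p point
    for (const Point& p : points)
        if (routes[p.route].active_or_preview) ap_vertices.insert(p.vertex);

    std::set<int> result;                 // std::set discards duplicate vertices
    for (const Point& p : points)
        if (routes[p.route].region == region && ap_vertices.count(p.vertex))
            result.insert(p.vertex);
    return result;
}

// Per-route filter: skip devel routes entirely, keep every point of a/p routes.
std::set<int> vertices_per_route(const std::vector<Route>& routes,
                                 const std::vector<Point>& points, const std::string& region)
{
    std::set<int> result;
    for (const Point& p : points)
    {   const Route& r = routes[p.route];
        if (r.region == region && r.active_or_preview) result.insert(p.vertex);
    }
    return result;
}
```

With a devel COD route and an active ZMB route sharing vertex 7, the per-waypoint filter puts vertex 7 in COD's graph, while the per-route filter leaves COD's graph without it; ZMB's graph is identical either way.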

They "do and do not" belong in these regions. I can see arguments for or against including them.

Looking at this another way... here's part of the HGVertex constructor in the current production code: https://github.com/yakra/DataProcessing/blob/fd96ac64ba2eff23600e86df8dcc84e94cb0a916/siteupdate/cplusplus/classes/GraphGeneration/HGVertex.cpp#L30-L35

It looks like siteupdate.py originally behaved the same way: the a/p check wasn't done, and region graphs included vertices whose only points in that region were devel points.

If this were instead:

```cpp
for (Waypoint *w : *(wpt->colocated))
{   // will consider hidden iff all colocated waypoints are hidden
    if (!w->is_hidden) visibility = 2;
    // Yes, a region/system can get the same vertex multiple times from different
    // routes. But HighwayGraph::write_subgraphs_tmg gets rid of any redundancy
    // when making the final set.
    if (w->route->system->active_or_preview())
    {   w->route->region->add_vertex(this, wpt);
        w->route->system->add_vertex(this);
    }
}
```

...or if the original siteupdate.py had had an equivalent check, we'd be getting the same results observed above.

Thoughts?

Any opinion on whether these points should be included or excluded?

The more I think about it, the more I lean toward exclusion. We should be consistent in the "active/preview only" approach and treat devel systems as if they don't exist at all. Oh, and simpler, more efficient code is a plus too! ;)

Canonical Waypoint Name simplification is something to consider too: "M3@COD/ZMB" indicates just one route (as there was only 1 route in the a/p colocation list), and M3 only exists in ZMB, not COD. If the codn devel system didn't exist, M3@COD/ZMB would still exist, with the same name, but in ZMB only.

yakra commented 7 months ago

For curiosity's sake, here's an alternate "check every point" solution that inlines the functionality of is_or_colocated_with_active_or_preview() and hashpoint(), doing a single colocated check where each of those functions would otherwise do its own. It also saves the conditional on the is_or_colocated_with_active_or_preview() call itself.

```cpp
for (Route* r : rg.routes)
  for (Waypoint& w : r->points)
    if (w.colocated)
    {   // colocated: keep the shared vertex if any point here is active/preview
        for (Waypoint *c : *w.colocated)
          if (c->route->system->active_or_preview())
          {   Waypoint* p = w.colocated->front(); // inlined hashpoint()
              rg.vertices.emplace_back(p->vertex, p);
              break;
          }
    }
    else if (w.active_or_preview())  // standalone point: check it directly
      rg.vertices.emplace_back(w.vertex, &w);
```

In reality, there's little to no point in using this. Better to just go whole hog & either...

yakra commented 6 months ago

Another reason I think omitting the vertices is justified: In graphs restricted by system, edge names only include the route names from the relevant systems. This is similar in that info not relevant to the graph's restrictions is omitted.
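The edge-name restriction works as a simple filter. This is a hypothetical sketch: EdgeRoute and edge_label are made-up names, the comma-separated label is an assumption about how concurrent route names are joined, and the system codes are only illustrative. Routes on a concurrent edge are kept in the label only if their system belongs to the subgraph's system set; an empty set means no system restriction.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical: one route on a (possibly concurrent) edge, tagged with its system.
struct EdgeRoute { std::string name; std::string system; };

// Builds an edge label from only the routes whose system is in `systems`.
// An empty `systems` set means the graph is not restricted by system.
std::string edge_label(const std::vector<EdgeRoute>& routes, const std::set<std::string>& systems)
{
    std::string label;
    for (const EdgeRoute& r : routes)
        if (systems.empty() || systems.count(r.system))
        {   if (!label.empty()) label += ',';   // assumed separator between route names
            label += r.name;
        }
    return label;
}
```

In a graph restricted to one system, the other system's route name simply drops out of the label, just as devel-only vertices would drop out of region graphs under the route-level check.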

Let's just not think too hard about the counterexample, vertex unique_names. ;) These are the same across the board, in tm-master & all subgraphs, regardless of any restrictions by region or system. Doing otherwise would require a lot of extra computation for little, questionable gain. :)