skinkie / reference

Personal repository where I collect working examples to understand inner workings while building PyNeTExConv
GNU Affero General Public License v3.0

Architecture for multiple files #22

Closed ue71603 closed 2 months ago

ue71603 commented 5 months ago

I think I need some progress for the Swiss files.

When do you think you could propose a DB-based solution?

For the current Swiss files there could be an intermediate step:

Currently we have

conversion(time table file)
  load resource frame
  load site frame
  load service frame
  load service calendar frame
  load time table file
  convert
  write output

I suggest rewriting any-toepip.py for the time being as follows:

  load common frame and add it to aux_tree
  load resource frame and add it to aux_tree
  load site frame and add it to aux_tree
  load service frame and add it to aux_tree
  load service calendar frame and add it to aux_tree
  for each timetable file
    load time table file
    convert(aux_tree, timetable file)



I would put the file names and directory into a configuration.
I would also write it so that the selectors always search both trees.
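
To make the proposal concrete, here is a minimal sketch of the idea, not the actual converter code. It assumes the frames can be gathered under a synthetic root; `build_aux_tree`, `select` and the inline XML strings are hypothetical stand-ins for the configured files, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the shared frame files and one timetable file.
COMMON_FRAMES = [
    "<ResourceFrame><Operator id='op:1'/></ResourceFrame>",
    "<SiteFrame><StopPlace id='sp:1'/></SiteFrame>",
]
TIMETABLE = "<TimetableFrame><ServiceJourney id='sj:1'/></TimetableFrame>"


def build_aux_tree(frame_documents):
    """Collect all shared frames under one synthetic root element."""
    aux_root = ET.Element("aux_tree")
    for document in frame_documents:
        aux_root.append(ET.fromstring(document))
    return aux_root


def select(trees, tag):
    """Yield every element with the given tag from each tree in turn."""
    for tree in trees:
        yield from tree.iter(tag)


aux_tree = build_aux_tree(COMMON_FRAMES)
timetable_tree = ET.fromstring(TIMETABLE)

# Each selector searches the timetable tree first, then the shared tree.
both_trees = [timetable_tree, aux_tree]
stop_ids = [e.get("id") for e in select(both_trees, "StopPlace")]
journey_ids = [e.get("id") for e in select(both_trees, "ServiceJourney")]
```

The shared tree is built once and reused for every timetable file, so only the timetable tree changes per conversion.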

Do you think the effort is worthwhile?
This would handle:
* single file
* complete line files
* common files with time table files

skinkie commented 5 months ago

Your approach of loading everything into a single tree so that it is available obviously makes sense for smarter access and limited code changes. However, the code currently tries to minimise in-memory usage, which is why it uses XPath to parse a file. I think it is a better approach to make the code in some way 'aware' of which tree to use for which data type. So I think the idea of making a network delivery out of a line delivery is sound but bloated.
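
As a rough illustration of what "aware" could mean, the sketch below routes each element type to the file assumed to contain it and streams that file. It is only a sketch under assumptions: `SOURCE_FOR_TYPE`, `stream_elements` and the in-memory files are hypothetical, and it uses `iterparse` from the standard library instead of the XPath-based parsing the code currently uses.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-ins for files on disk.
FILES = {
    "site.xml": b"<SiteFrame><StopPlace id='sp:1'/><StopPlace id='sp:2'/></SiteFrame>",
    "timetable.xml": b"<TimetableFrame><ServiceJourney id='sj:1'/></TimetableFrame>",
}

# Assumed routing table: which file holds which element type.
SOURCE_FOR_TYPE = {
    "StopPlace": "site.xml",
    "ServiceJourney": "timetable.xml",
}


def stream_elements(tag):
    """Stream the ids of one element type from the file known to contain it."""
    source = io.BytesIO(FILES[SOURCE_FOR_TYPE[tag]])
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield element.get("id")
            element.clear()  # drop the processed element's attributes and children


stop_place_ids = list(stream_elements("StopPlace"))
```

Only the file that is relevant for the requested type is opened, and no complete tree is kept around after the loop.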

ue71603 commented 5 months ago

I am not sure that this will help in some cases (e.g. for the Interchanges, when they are only in one tree). Also, some transforms need to have a lot of things in memory, otherwise they can't optimise the output (e.g. reuse of AvailabilityConditions).

What could work would be a database / tree that stores (class, id, version, file). But then one would need to optimise access by sorting the entries by file for each needed part. Algorithm:

  1. read the index tree from all files
  2. read indices for timetable/network to transform
  3. process for each needed element
    • get elements from trees by index (grouped by file) needed for conversion
    • transform
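
The index and the grouping by file could look roughly like this. It is a sketch under assumptions: `build_index`, `group_by_file` and the file contents are hypothetical, namespaces are ignored, and the actual transform step is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical file contents keyed by file name.
FILES = {
    "common.xml": "<Frame><StopPlace id='sp:1' version='1'/>"
                  "<Operator id='op:1' version='2'/></Frame>",
    "line_10.xml": "<Frame><ServiceJourney id='sj:1' version='1'/></Frame>",
}


def build_index(files):
    """Map (class, id, version) to the file that defines the element."""
    index = {}
    for file_name, content in files.items():
        for element in ET.fromstring(content).iter():
            if "id" in element.attrib:
                key = (element.tag, element.get("id"), element.get("version"))
                index[key] = file_name
    return index


def group_by_file(index, needed_keys):
    """Group the needed keys by file so that each file is opened only once."""
    grouped = defaultdict(list)
    for key in needed_keys:
        grouped[index[key]].append(key)
    return dict(grouped)


index = build_index(FILES)
needed = [
    ("ServiceJourney", "sj:1", "1"),
    ("StopPlace", "sp:1", "1"),
    ("Operator", "op:1", "2"),
]
plan = group_by_file(index, needed)
```

Here `plan` tells the converter which elements to fetch from which file before transforming, which is the "grouped by file" part of the algorithm.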

skinkie commented 2 months ago

netex-to-db.py has the ability to transparently process zip files, xml files, and gzipped xml files.
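
For readers who want to see what transparent handling of the three input kinds can look like, here is a minimal sketch using only the standard library. It is not the code from netex-to-db.py; the function name `open_netex_streams` and the detection rules (zip signature check, `.gz` suffix) are assumptions.

```python
import gzip
import zipfile
from pathlib import Path


def open_netex_streams(path):
    """Yield (name, binary file object) for each XML document behind path."""
    path = Path(path)
    if zipfile.is_zipfile(path):
        # A zip archive may hold several XML documents; skip other members.
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if name.lower().endswith(".xml"):
                    with archive.open(name) as member:
                        yield name, member
    elif path.suffix == ".gz":
        # gzip.open decompresses on the fly while reading.
        with gzip.open(path, "rb") as handle:
            yield path.name, handle
    else:
        # Anything else is treated as a plain XML file.
        with open(path, "rb") as handle:
            yield path.name, handle
```

Each yielded file object is only valid while the caller is still inside the loop iteration, so it should be parsed before moving on to the next item.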