Closed alexprengere closed 11 years ago
I just added the possibility to specify external fields on get calls from the CLI, using the syntax field:external_field.
This makes the following commands possible:
$ GeoBase PAR -s tvl_por_list tvl_por_list:name
tvl_por_list ('BVA', 'CDG', 'JDP', ...
tvl_por_list:name (('Beauvais-Tilles',), ('Paris Charles de Gaulle', ...
More interestingly, this can now be combined with the earlier explanation of CLI join clauses, which use the syntax header{join_base:join_field}, making it very easy to extract data from external bases. Example with a one-column file:
$ cat data.csv
ORY
CDG
$ cat data.csv | GeoBase -i " " 'origin{ori_por:iata_code}' -s origin origin:name
origin CDG ORY
origin:name ('Paris - Charles-de-Gaulle',) ('Paris-Orly',)
This is available on the develop branch or in the latest GeoBasesDev package.
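To make the two CLI syntaxes above concrete, here is a minimal sketch of how such strings could be split into their parts. This is not GeoBases code: the regular expression and the function names parse_header and parse_show_field are hypothetical, and the sketch only covers a single header with at most one join clause.

```python
import re

# Hypothetical pattern (not from GeoBases): a header name optionally
# followed by a join clause written as {join_base:join_field}.
HEADER_RE = re.compile(
    r"^(?P<header>[^{}]+)(?:\{(?P<base>[^:{}]+):(?P<field>[^{}]+)\})?$"
)


def parse_header(spec):
    """Return (header, join_base, join_field).

    join_base and join_field are None when there is no join clause.
    """
    match = HEADER_RE.match(spec)
    if match is None:
        raise ValueError("unrecognized header spec: %r" % spec)
    return match.group("header"), match.group("base"), match.group("field")


def parse_show_field(spec):
    """Split 'field:external_field' into (field, external_field or None)."""
    field, _, external = spec.partition(":")
    return field, external or None


print(parse_header("origin{ori_por:iata_code}"))  # ('origin', 'ori_por', 'iata_code')
print(parse_show_field("origin:name"))            # ('origin', 'name')
print(parse_show_field("origin"))                 # ('origin', None)
```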
I pushed some commits (mainly 2064d8d02c3fc814c53ee91c2d9a8e216c0b762f) to the develop branch. They handle join visualization on a map. If the GeoBase object has no geocode support, it checks whether any joined field has geocode support, then performs a cartesian product of the different joined values.
Long story short, this can draw lines:
$ cat data.csv
NCE PAR
LYS BOD
$ cat data.csv | GeoBase -i " " "origin{ori_por:iata_code}/destin{ori_por:iata_code}" --map
The UI is still a bit beta-ish, but the main idea is out there.
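The fallback described above can be pictured with a small pure-Python sketch. It is only an illustration under assumptions, not the code from the commit: the function name points_to_draw is hypothetical, and the coordinates are approximate values chosen for the example.

```python
from itertools import product


def points_to_draw(own_geocode, joined_geocodes):
    """Return a list of point tuples, one tuple per object to draw.

    own_geocode: (lat, lng), or None when the data has no geocode support.
    joined_geocodes: one list of (lat, lng) per joined field, since a join
    may return several matches for a single value.
    """
    if own_geocode is not None:
        return [(own_geocode,)]
    # No geocode of its own: combine every geocode found through the joins.
    return list(product(*joined_geocodes))


# Approximate, illustrative coordinates for the first row (NCE PAR).
origin = [(43.66, 7.22)]   # NCE
destin = [(48.85, 2.35)]   # PAR

for line in points_to_draw(None, [origin, destin]):
    print(line)  # ((43.66, 7.22), (48.85, 2.35))
```

With one geocode per joined field this yields a single two-point line; if a field resolved to several geocodes, one line per combination would be produced.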
I have been working on join clauses for a few days, and I think this is coming to an end. Here is some information about it.
Joins can be specified in the configuration like this (example from ori_por):
As you can see, a join is a list of mappings with two keys:
fields: a field or an iterable of fields, indicating which field(s) are concerned
with: a couple [base, join_fields], where join_fields can be one field or an iterable of fields
A join is possible on the same base, and on multiple fields at once. It is even compatible with subdelimiters. The Python API has been slightly changed to seamlessly integrate this new notion into the get method:
In the last example, we get a tuple ('France',) because we look for any match on the country_code 'FR' in the base countries, and there could be several.
What happens with subdelimiters?
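As a rough illustration of these semantics, the sketch below models a join in plain Python. It is a toy, not the GeoBases implementation: the function toy_get, the in-memory BASES and JOINS structures, and the join field name code in countries are all assumptions made for the example. Only the shape of a join (a list of mappings with the keys fields and with) follows the description above.

```python
# Toy stand-ins for two bases. Rows and field values are illustrative.
BASES = {
    "countries": [
        {"code": "FR", "name": "France"},
    ],
    "ori_por": [
        {"iata_code": "PAR", "name": "Paris", "country_code": "FR",
         "tvl_por_list": ("BVA", "CDG")},  # subdelimited field, as a tuple
        {"iata_code": "BVA", "name": "Beauvais-Tilles", "country_code": "FR",
         "tvl_por_list": ()},
        {"iata_code": "CDG", "name": "Paris Charles de Gaulle",
         "country_code": "FR", "tvl_por_list": ()},
    ],
}

# A join is a list of mappings with two keys, 'fields' and 'with'.
# The join field 'code' in 'countries' is an assumption.
JOINS = {
    "ori_por": [
        {"fields": "country_code", "with": ["countries", "code"]},
        {"fields": "tvl_por_list", "with": ["ori_por", "iata_code"]},
    ],
}


def lookup(base, join_field, value, ext_field):
    """Return ext_field for every row of base whose join_field equals value."""
    return tuple(r[ext_field] for r in BASES[base] if r[join_field] == value)


def toy_get(base, key_field, key, field, ext_field=None):
    """Hypothetical get: return a field, or an external field via a join."""
    row = next(r for r in BASES[base] if r[key_field] == key)
    value = row[field]
    if ext_field is None:
        return value
    join = next(j for j in JOINS[base] if j["fields"] == field)
    join_base, join_field = join["with"]
    if isinstance(value, tuple):
        # Subdelimited field: perform the join on every sub-value.
        return tuple(lookup(join_base, join_field, v, ext_field) for v in value)
    return lookup(join_base, join_field, value, ext_field)


print(toy_get("ori_por", "iata_code", "PAR", "country_code", "name"))
# ('France',)
print(toy_get("ori_por", "iata_code", "PAR", "tvl_por_list", "name"))
# (('Beauvais-Tilles',), ('Paris Charles de Gaulle',))
```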
The join has been performed on every subdelimited value, hence the tuple-of-tuples structure. I will not detail the multiple-fields join: it works the same way as above, except that matching is done on several fields at once. Bonus [tricky]: if the join is made on several fields and several of them have subdelimiters, the cartesian product of all possible values from the different subdelimited fields is made.
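The tricky case can be sketched with itertools.product. This is a minimal illustration of the idea, assuming subdelimited values are represented as tuples and plain values as strings. The function name join_keys and the sample values are hypothetical.

```python
from itertools import product


def join_keys(values):
    """Expand the values of the joined fields into every key to match.

    Subdelimited values are tuples; plain values are wrapped in a
    one-element tuple so that they take part in the product unchanged.
    """
    expanded = [v if isinstance(v, tuple) else (v,) for v in values]
    return list(product(*expanded))


# Two joined fields, both subdelimited: 2 x 2 = 4 keys to look up.
print(join_keys((("A", "B"), ("x", "y"))))
# [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]

# Only the second field is subdelimited: 2 keys.
print(join_keys(("A", ("x", "y"))))
# [('A', 'x'), ('A', 'y')]
```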
The CLI integration has been made: you can specify a join clause next to the header names with -i. For now this is useless (except for debugging), since you cannot specify external fields on get calls from the CLI.
One-column example file, read from stdin:
I plan to modify the map visualization to integrate these changes. The goal is to handle the case where the data has no geocode of its own but has fields joined to other bases that do, and to have the visualization adapt to those objects.
For example, the previous shell command displayed with --map will not display anything today. But since ORY and CDG are joined to ori_por, we could get their geocodes there and draw the object in a specific way, depending on the topology (indeed, when you perform the join you may get tuples of geocodes on different fields).
Please keep in mind that these changes are on the develop branch, and I may change everything twice before it's released, so do not rely on those examples for production stuff.