Open justb4 opened 6 years ago
Note that this issue has already been worked on and merged via PRs #244 and #245 by @stvno, on a separate restructure branch of the repo.
Stetl (master/latest) now supports multiple `-a` options. See the example usage in top10nl (README): https://github.com/nlextract/NLExtract/tree/master/brt/top10nl/etl . File names have also been standardized: `default.args` (now allowed in `.gitignore`, unlike other `.args` files) holds all default args, so your own `.args` file only needs to contain the changes to those, e.g. only the DB credentials.
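The layering described above (all defaults in `default.args`, only the differences in your own `.args` file) can be sketched roughly as follows. This is a minimal illustration, not Stetl's actual implementation: it assumes args files are plain `key=value` lines, and the key names and the helpers `parse_args_text` and `merge_args` are hypothetical.

```python
def parse_args_text(text):
    """Parse 'key=value' lines into a dict, skipping blank lines and # comments.

    Assumption: args files are simple key=value text; this is not Stetl's parser.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without an '=' sign
        result[key.strip()] = value.strip()
    return result


def merge_args(*texts):
    """Merge several args texts; later ones override earlier ones."""
    merged = {}
    for text in texts:
        merged.update(parse_args_text(text))
    return merged


# Hypothetical contents of default.args (key names are illustrative only).
DEFAULT_ARGS = """
# defaults shared by everyone
host=localhost
port=5432
user=postgres
password=postgres
"""

# Hypothetical local .args file: only the DB credentials differ.
LOCAL_ARGS = """
user=etl_user
password=secret
"""

effective = merge_args(DEFAULT_ARGS, LOCAL_ARGS)
print(effective)
```

Here the local file overrides `user` and `password`, while `host` and `port` fall through from the defaults, which mirrors passing the default args first and your own args second.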
Currently each (Stetl-based) ETL process, like Top10nl, BRK, BGT etc., has its own config, execution mode and so on, even though all are very similar. This makes it hard for a user to grasp how to perform a specific ETL, and it also makes Dockerization harder to develop.
The following needs/can be done to restructure the repo and its (Stetl-based) ETL processes:
- `brt/top10`, `brk/dkk`. Call each a "Project" (or "Process").
- `brt/top10/etl/config/default.cfg`, gfs files etc.
- `nlextract.sh` (or `nlextract.py`, maybe to be cross-platform?)
- `argument-file` and a possibly host-named args file.

Something like

For Stetl an issue has been opened to allow multiple `-a` args.

Only problem is how to deal with the BAG, which is not Stetl-based and has more extended command-line options. Possibly the default "convert to PostGIS" can be performed by `nlextract.sh|py`.
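As a rough illustration of the idea, a cross-platform `nlextract.py` could assemble a Stetl command line from a project's standard config plus a default and a host-named args file. The sketch below only builds the command and does not run it. The `options/` directory, the `<hostname>.args` naming, and the function `build_stetl_command` are assumptions made for this example. Only the `-c` and `-a` options and the `etl/config/default.cfg` path come from the discussion above.

```python
import os
import socket


def build_stetl_command(project, hostname=None, exists=os.path.exists):
    """Build a Stetl command (as a list) for a project such as 'brt/top10'.

    Assumptions for this sketch: args files live in <project>/etl/options/,
    and a host-specific file is named <hostname>.args.
    """
    hostname = hostname or socket.gethostname()
    etl_dir = f"{project}/etl"

    # Standard per-project config file.
    cmd = ["stetl", "-c", f"{etl_dir}/config/default.cfg"]

    # Defaults first, host-specific overrides second; skip missing files.
    candidates = [
        f"{etl_dir}/options/default.args",
        f"{etl_dir}/options/{hostname}.args",
    ]
    for args_file in candidates:
        if exists(args_file):
            cmd += ["-a", args_file]
    return cmd


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(build_stetl_command("brt/top10")))
```

Passing the command to `subprocess.run` would be the natural next step. The non-Stetl BAG would need its own branch in such a wrapper.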