mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License

Pass in the filename to the mapper script as an argument #60

Open jankatins opened 3 years ago

jankatins commented 3 years ago

It would be nice in certain debugging scenarios to know the filename (or parts of it) in the mapper scripts, e.g. to write it to each row in the final table.

It would basically mean adding -- "{self.file_name}" to the mapper script part of the pipe:

https://github.com/mara/mara-pipelines/blob/master/mara_pipelines/commands/files.py#L65-L71


                f'{uncompressor(self.compression)} "{pathlib.Path(config.data_dir()) / self.file_name}" \\\n' \
                + (f'  | {shlex.quote(sys.executable)} "{self.mapper_file_path()}" -- "{self.file_name}" \\\n' # <- changed

As far as I understand, as current mappers do not get any args, none should fail if they get one now... @hz-lschick @martin-loetzsch ?
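
For illustration, a mapper script could pick up the proposed argument roughly like this. This is only a sketch under assumptions: it presumes the mapper reads rows from stdin and writes them to stdout, that the filename arrives as the first argument after a literal --, and that rows are tab-separated. The helper names source_file_name and add_file_name are hypothetical and not part of mara-pipelines.

```python
import sys
from typing import Iterable, Iterator, List, Optional


def source_file_name(argv: List[str]) -> Optional[str]:
    """Return the first argument after a literal `--`, or None if there is none.

    Assumption: the pipeline calls the mapper as `mapper.py -- "<file_name>"`.
    Returning None keeps the mapper usable when it is called without arguments.
    """
    if "--" in argv:
        rest = argv[argv.index("--") + 1:]
        if rest:
            return rest[0]
    return None


def add_file_name(lines: Iterable[str], file_name: Optional[str],
                  sep: str = "\t") -> Iterator[str]:
    """Append the file name as an extra column to every input line.

    Assumption: rows are separated by `sep`; an empty column is written
    when no file name was passed.
    """
    for line in lines:
        yield line.rstrip("\n") + sep + (file_name or "") + "\n"


if __name__ == "__main__":
    name = source_file_name(sys.argv[1:])
    sys.stdout.writelines(add_file_name(sys.stdin, name))
```

Existing mappers that never look at sys.argv would simply ignore the extra argument, which matches the expectation above that none of them should fail.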

ghost commented 3 years ago

I currently don't use mapper scripts, so, feel free to change it 👍

ghost commented 3 years ago

P.S. I already found several places where breaking changes should be made. E.g. the tables are still named after the old repo name (data integration), and the CLI call 'mara_pipelines.ui.run' IMHO should be 'mara_pipelines.run'. This could all be done in a new branch for version 4 to avoid breaking changes in version 3. Just an idea 💡. I would love a virtual community meeting/call where such things could be discussed.