I created datasets for each of our GH repos that track who gave them a "star".
I then combined them all into one "community" derivative dataset.
Because some of our repos are mostly internal, they may not have any stars.
This exposed a problem: currently, if the source of a root dataset does not produce any data, the SetDataSchema event is never written, and datasets without a schema block derivative transformations from executing.
In this PR:
I changed push/pull ingest to define the schema as early as possible, using the read stage schema even when the input data is empty
I changed the DF engine to produce Parquet files even when the output DataFrame is empty, so the output schema can still be propagated to kamu
kamu now handles engines returning empty Parquet files and uses them to define the derivative dataset schema, even if the dataset stays empty
I eliminated the hack where one of the old Parquet files was passed to the engine as a "schema carrier" - instead, kamu now always writes an empty Parquet file using the schema from SetDataSchema
Checklist before requesting a review
[x] Unit and integration tests added
[x] Compatibility:
[x] Network APIs: ✅
[x] Workspace layout and metadata: ✅
[x] Configuration: ✅
[x] Container images: ❌
Requires a new DF engine image. I will consider updating the Flink, Spark, and RW engines too.