Picking columns to export CSV

kaanolgu commented 7 months ago

Fixes #252

For babelstream we need to export multiple columns. With this change, the post-processing script recognises additional column data alongside x_axis and y_axis. If needed, the user can define a_axis, b_axis, etc. to export more columns, and the default execution flow is not affected.
Since the plotting code is not modified, the change does not interfere with plotting.

For example, with this change we can export a new column (z_axis here) to the dataframe:


    x_axis:
      value: "tags"
      units:
        custom: "tags"
    y_axis:
      value: "Triad_value"
      units:
        column: "Triad_unit"
    z_axis:
      value: "spack_spec"
      units:
        column: null

    series: [["partition", "cascadelake"],["partition", "volta"]]


Then it would print (after commenting out the line `self.plot_generic( config["title"], df[columns][mask], config["x_axis"], config["y_axis"], series_filters)`, thanks @pineapple-cat):

Selected dataframe:

       tags  Triad_value  Triad_unit  spack_spec                                    partition
    0  acc     12963.337  MBytes/sec  babelstream%gcc@13.1.0 +acc                   cascadelake
    1  acc     13386.104  MBytes/sec  babelstream%gcc@9.2.0 +acc cuda_arch=70       volta
    2  cuda   846640.634  MBytes/sec  babelstream%gcc@9.2.0 +cuda cuda_arch=70      volta
    3  omp    159131.926  MBytes/sec  babelstream%gcc@13.1.0 +omp                   cascadelake
    4  tbb     99740.216  MBytes/sec  babelstream%gcc@13.1.0 +tbb partitioner=auto  cascadelake


Is there an alternative way to export multiple columns that I might be missing? Open to any feedback or suggestions.

Thank you!
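
A minimal sketch of how the `*_axis` entries in the example config could be turned into a list of dataframe columns, assuming the yaml has already been loaded into a plain dictionary. The function name `columns_from_axes` is hypothetical and not taken from the post-processing script.

```python
def columns_from_axes(config):
    """Collect column names from every "*_axis" entry and from the series filters."""
    columns = []

    def add(column):
        # keep each column once, in first-seen order
        if column is not None and column not in columns:
            columns.append(column)

    for key, axis in config.items():
        if not key.endswith("_axis"):
            continue
        add(axis["value"])
        # "units" may point at a column holding the unit, or only carry a custom label
        add((axis.get("units") or {}).get("column"))

    # series entries are [column, value] pairs
    for column, _value in config.get("series", []):
        add(column)
    return columns


# the example config from this comment, written as a dictionary
config = {
    "x_axis": {"value": "tags", "units": {"custom": "tags"}},
    "y_axis": {"value": "Triad_value", "units": {"column": "Triad_unit"}},
    "z_axis": {"value": "spack_spec", "units": {"column": None}},
    "series": [["partition", "cascadelake"], ["partition", "volta"]],
}
print(columns_from_axes(config))
```

With the example config this gives tags, Triad_value, Triad_unit, spack_spec and partition, the same columns as in the printed dataframe.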

kaanolgu commented 7 months ago

https://github.com/ukri-excalibur/excalibur-tests/commit/438be0257c39cf81bb1260e2a881559c2c10df3c

Made a change based on the idea from @ilectra and @pineapple-cat. Instead of adding a new "*_axis" to the yaml file, the config now takes a list of columns to extract for the csv export, and df_csv_export is generated from the original dataframe.

One question: do the user-specified types need to be applied to all relevant columns for the csv export too?
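
For reference, a rough sketch of what applying the types to the export columns could look like with pandas, if it turns out to be needed. The names `build_csv_export`, `export_columns` and `column_types` are hypothetical, and the sketch assumes the values were read in as strings; only `df_csv_export` comes from the commit description.

```python
import pandas as pd


def build_csv_export(df, export_columns, column_types):
    """Return a copy of the selected columns with the requested types applied."""
    df_csv_export = df[export_columns].copy()
    # only cast columns that are actually part of the export
    types = {col: typ for col, typ in column_types.items() if col in export_columns}
    return df_csv_export.astype(types)


# two rows from the dataframe shown earlier, with numbers stored as strings (assumption)
df = pd.DataFrame(
    {
        "tags": ["acc", "cuda"],
        "Triad_value": ["12963.337", "846640.634"],
        "Triad_unit": ["MBytes/sec", "MBytes/sec"],
        "spack_spec": [
            "babelstream%gcc@13.1.0 +acc",
            "babelstream%gcc@9.2.0 +cuda cuda_arch=70",
        ],
    }
)
export_columns = ["tags", "Triad_value", "spack_spec"]  # hypothetical list from the yaml
column_types = {"tags": "str", "Triad_value": "float64", "spack_spec": "str"}

df_csv_export = build_csv_export(df, export_columns, column_types)
print(df_csv_export.dtypes)
```

The original dataframe is left untouched because the cast is done on a copy.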

ilectra commented 7 months ago

I think that, instead of creating a whole new df to export to csv, it would make more sense if you

- add only the extra columns that you need in the csv_export part of the yaml, not all the axes and series etc.
- keep those columns in the filtered dataframe, as you go along the processing/filtering, treating them as extra axes.
- export the filtered df to csv, just before calling the plotting script (you won't call it, in your case, but that's the place to do the printing)
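
The three steps above could be sketched roughly as follows with pandas. This is only an illustration under assumptions: `filter_and_export` and its parameters are hypothetical names, the series entries are assumed to be combined with OR, and the data is the dataframe printed earlier in this thread.

```python
import io

import pandas as pd


def filter_and_export(df, plot_columns, extra_columns, series, target):
    """Carry the extra columns through filtering, then write the filtered df as csv."""
    # treat the extra columns like additional axes: append them to the plot columns
    columns = list(plot_columns) + [c for c in extra_columns if c not in plot_columns]
    # a row is kept if it matches any [column, value] series entry (assumed OR semantics)
    mask = pd.Series(False, index=df.index)
    for column, value in series:
        mask |= df[column] == value
    filtered = df.loc[mask, columns]
    # this is the point where the plotting call would otherwise follow
    filtered.to_csv(target, index=False)
    return filtered


df = pd.DataFrame(
    {
        "tags": ["acc", "acc", "cuda", "omp", "tbb"],
        "Triad_value": [12963.337, 13386.104, 846640.634, 159131.926, 99740.216],
        "Triad_unit": ["MBytes/sec"] * 5,
        "spack_spec": [
            "babelstream%gcc@13.1.0 +acc",
            "babelstream%gcc@9.2.0 +acc cuda_arch=70",
            "babelstream%gcc@9.2.0 +cuda cuda_arch=70",
            "babelstream%gcc@13.1.0 +omp",
            "babelstream%gcc@13.1.0 +tbb partitioner=auto",
        ],
        "partition": ["cascadelake", "volta", "volta", "cascadelake", "cascadelake"],
    }
)

buffer = io.StringIO()  # stands in for the csv file
filtered = filter_and_export(
    df,
    plot_columns=["tags", "Triad_value", "Triad_unit", "partition"],
    extra_columns=["spack_spec"],  # hypothetical content of the csv_export part of the yaml
    series=[["partition", "cascadelake"], ["partition", "volta"]],
    target=buffer,
)
print(buffer.getvalue())
```

Only one dataframe is filtered here, so the exported rows always match what the plot would have shown.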

kaanolgu commented 7 months ago

That's a great idea! I will try to implement this in a new commit

ilectra commented 7 months ago

@kaanolgu you might want to wait for the refactoring PR to be merged. It will change things quite a bit, hopefully for the better!

ilectra commented 4 months ago

@pineapple-cat @kaanolgu I think I addressed all my review comments, please have a last look and merge if happy!

Picking columns to export CSV #260