Include sand fraction table in report

z3tt commented 2 months ago

Currently, the sand fraction table is generated with a separate script and also relies on external inputs (i.e. it doesn't use information from the database).

It would be good to include it in the general report workflow so it is added to both the HTML and PDF version. Also, I think it would be convenient if the data is sourced from the database (but don't know if that makes sense? @torv72).

I assume that the sand fraction table is not going to be included in all reports, so we'd need another Yes/No argument in the generate_report() function OR decide that based on the database information.
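A minimal sketch of that decision logic, written in Python purely for illustration (the actual workflow is in R and would need a port). The function name `plan_report_sections`, the `include_sand_fractions` flag, and the `test_type` record field are hypothetical; "S022" is the laboratory test type named further down in this thread. The sketch combines both ideas: the page is only planned when the flag is set and matching records exist.

```python
from typing import Dict, List


def plan_report_sections(
    records: List[Dict],
    include_sand_fractions: bool = False,
    test_type: str = "S022",
) -> List[str]:
    """Return the ordered list of report sections to render.

    `records` stands in for rows read from the database; each row is
    assumed (hypothetically) to carry a "test_type" key.
    """
    sections = ["Executive Summary", "Total Organic Matter"]

    # Only add the page if the caller asked for it AND the data exists.
    has_data = any(row.get("test_type") == test_type for row in records)
    if include_sand_fractions and has_data:
        sections.append("Sand Fractions")

    return sections
```

Requiring both conditions avoids an empty page when the flag is set for a site without sand fraction results; whether a missing dataset should instead raise an error is a design choice left open here.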

Notes from issue #16 on the styling of the table:

- include some in-line visualizations: add sparklines with values for first and last and/or min and max values
- apply color encoding (or other emphasis) in case a value does not meet USGA specifications
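The color-encoding note could be sketched roughly as follows, in Python for illustration only. The limits in `PLACEHOLDER_LIMITS` are made-up placeholders and are NOT actual USGA values; the class names, `meets_spec`, and `cell_style` are likewise hypothetical.

```python
from typing import Dict, Optional, Tuple

# (lower, upper) bound per particle class; None means "no bound".
# Placeholder numbers for illustration only, not real USGA limits.
PLACEHOLDER_LIMITS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "Class A": (None, 10.0),
    "Class B": (60.0, None),
}


def meets_spec(
    particle_class: str,
    value: float,
    limits: Dict[str, Tuple[Optional[float], Optional[float]]] = PLACEHOLDER_LIMITS,
) -> Optional[bool]:
    """Return True/False if a limit exists, or None if the class has no limit."""
    if particle_class not in limits:
        return None
    low, high = limits[particle_class]
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def cell_style(particle_class: str, value: float) -> str:
    """Return an inline CSS string that emphasizes out-of-spec values."""
    if meets_spec(particle_class, value) is False:
        return "color: red; font-weight: bold"
    return ""
```

Returning `None` for classes without a limit keeps "no recommendation" distinct from "meets the recommendation", so such cells stay unstyled.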

torv72 commented 2 months ago

The data can be sourced from the MASTER DATABASE as I have been putting it in there.

You are correct. The sand fraction table will not always be included. I think we can add a yes/no argument.

If we did not want to go with a table, there is a graphic I wanted to show you for inspiration the other day but could not find at the time. Here it is. This is for the 0-2 cm. We could add two more (2-4 cm and 4-6 cm) and stack them on one page?

z3tt commented 2 months ago

That looks cool! It's always better to use a chart than a table, and for the HTML version the detailed measurements would be available via tooltips. For the PDF version, we might consider adding the table below the graphics.

If I understood correctly, the data is not part of the database? What would be your preferred workflow? A manual csv table that gets sourced? Or would it be possible/make sense to add that to the database as well?

torv72 commented 2 months ago

I agree!

Regarding my current workflow: I receive the Excel file from the laboratory with the results and bring it into the MASTER DATABASE. My R skills are not good enough to pull the data from the MASTER DATABASE directly, so I created a CSV file that my script uses. I manually enter the laboratory results each time I get them in order to create the table.

So, to answer your question: the data is actually in the MASTER DATABASE right now. The laboratory test type is S022. My preferred workflow would be to use only the MASTER DATABASE, just like we currently do, and to include a yes/no argument for adding it to the report. It would be its own separate page within the report. I think it would also have its own "header" in the Executive Summary where I could add comments. This would only be generated if the argument is "yes".

I like the idea of adding a table below the graphics too!

z3tt commented 2 months ago

Merging this with issue #16.

z3tt commented 5 days ago

The table is now included in both versions of the report. The data is sourced from the MASTER DATABASE, and the recommendations are based on the examples you provided. It's now featured on a dedicated page in the PDF version after the OM results. I hope the title is fine; if I understood you correctly, it is part of the "Total Organic Matter" section and thus can use the same green color.

Please note that currently, the particle sizes in the database are not aggregated and formatted in the same way as in your isolated "fraction.csv".

[Image: Screenshot 2024-09-03 at 11 22 11]

For instance, "Fine Gravel + Very Coarse Sand (2.0 mm + 1.0 mm)" is a single aggregated category in the "fraction.csv" file but appears as two separate entries in the database. Additionally, particle sizes in the database have a different format and style (e.g., "Fine Gravel - 2 mm (%)" in the database versus "Fine Gravel (2.0 mm)" in your example). To merge the data with the USGA recommendations effectively, we need consistent naming conventions. For now, I am adjusting the names by removing the "(%)" and wrapping the size in parentheses instead of using a hyphen, but it would be ideal to have the correct names directly in the database.
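As a rough illustration of that renaming step (a Python sketch, not the code used in the report; `normalize_class_name` is a hypothetical helper), the "(%)" suffix is dropped and the size after the hyphen is moved into parentheses:

```python
import re


def normalize_class_name(raw: str) -> str:
    """Turn a database-style name into a 'Label (size)' name.

    Assumes the database pattern '<label> - <size> (%)'. Names that do
    not contain ' - ' are returned with only the '(%)' suffix removed.
    """
    name = raw.replace("(%)", "").strip()

    # Split on the first ' - ' (hyphen surrounded by whitespace).
    match = re.fullmatch(r"(.+?)\s+-\s+(.+)", name)
    if match:
        label, size = match.groups()
        return f"{label.strip()} ({size.strip()})"
    return name


print(normalize_class_name("Fine Gravel - 2 mm (%)"))  # Fine Gravel (2 mm)
```

Note that this yields "2 mm" rather than the "2.0 mm" used in "fraction.csv"; the sketch does not reformat the number, so matching against the recommendations would still depend on agreeing on one spelling.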

@torv72 Please update the database with the names and particle groups you would like to use.

I have removed the "% Retained" header line, as in this version there's only a single year to show*. In general, I'd advise against making this an additional line and would suggest adding it to the overall title instead, i.e. as "OM 246 Sand Fractions (%Retained)". We could also consider adding the percentage sign to all values in the respective columns.

* EDIT: We should also think about which and how many years to show. Does it make sense to display all years, assuming that at some point you will have a fair amount of long-term data? We will likely run out of space in the PDF version when using more than 3-5 years anyway.

For both reports, I will include visualizations similar to the one you've shared in your previous comment, with the table displayed below the visualization (and hidden within a collapsible toggle in the HTML version).

torv72 commented 5 days ago

Yes. I have included the Sand Fractions as part of the Organic Matter section. This is fine.

Regarding your point that "Fine Gravel + Very Coarse Sand (2.0 mm + 1.0 mm)" is a single aggregated category in the "fraction.csv" file but appears as two separate entries in the database: you are correct. The laboratory data that I import into the MASTER DATABASE does not have this variable. I had to create it in the fraction.csv. It is one of the variables for the USGA. Can it be coded with the two added together to create that "object"? I believe there are two others that I did that for as well. Or I could manually create it in the MASTER DATABASE and do the addition myself.

The three new "objects" would be:

The correct names are actually in the MASTER DATABASE. I changed the names in the fraction.csv because I didn't know quite how to format things.

I'm fine with your suggestions for title, etc. Whatever makes sense and looks good.

I'm fine with limiting the years to what fits, just like we did in the Organic Matter table. Do we specify a start year and end year? Right now it doesn't make a difference at all because I don't have the data, but it would give that flexibility. Just a thought.
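One way the year limit and an optional start/end year could work together, sketched in Python for illustration only. `select_years` and its parameters are hypothetical, and the default of 3 years is an assumption taken from the "3-5 years" space estimate above.

```python
from typing import Iterable, List, Optional


def select_years(
    available: Iterable[int],
    max_years: int = 3,
    start_year: Optional[int] = None,
    end_year: Optional[int] = None,
) -> List[int]:
    """Pick the years to display, keeping the most recent ones that fit."""
    years = sorted(set(available))

    # Apply the optional window first.
    if start_year is not None:
        years = [y for y in years if y >= start_year]
    if end_year is not None:
        years = [y for y in years if y <= end_year]

    # Then keep at most `max_years` of the most recent remaining years.
    if max_years <= 0:
        return []
    return years[-max_years:]
```

With a single year of data the function simply returns that year, so nothing changes for the current report.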

z3tt commented 4 days ago

It's absolutely possible to merge the categories in the script. It's just important that you stick to the setup we decide on.

Option 1: Aggregated particle classes + values in the database → no post-processing necessary; you could add new particle classes (or split them again, etc.) whenever you want.

Option 2: Raw particle classes + values in the database (as it is currently) → no additional work when adding this data to the database, but the post-processing will be picky about the particle names, etc.

If you opt for option 2, how do you aggregate the values? Is it sums or averages? Or something else?

torv72 commented 4 days ago

Go with Option 2. The values are simply summed. No averages or anything else.
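A minimal sketch of Option 2 with summed values, in Python for illustration only (the report code is in R). The aggregated label comes from "fraction.csv" as quoted above; the raw member names in `GROUPS`, in particular "Very Coarse Sand (1 mm)", are assumed spellings, and the other two aggregated categories are not listed in this thread, so they are left out.

```python
from typing import Dict, List

# Aggregated label -> raw classes that are summed into it.
# Member names are assumptions about how the raw classes are spelled.
GROUPS: Dict[str, List[str]] = {
    "Fine Gravel + Very Coarse Sand (2.0 mm + 1.0 mm)": [
        "Fine Gravel (2 mm)",
        "Very Coarse Sand (1 mm)",
    ],
}


def aggregate_fractions(
    raw: Dict[str, float],
    groups: Dict[str, List[str]] = GROUPS,
) -> Dict[str, float]:
    """Sum raw particle classes into aggregated ones; pass the rest through."""
    grouped_members = {m for members in groups.values() for m in members}
    result: Dict[str, float] = {}

    for label, members in groups.items():
        # Fail loudly on a missing or misspelled class name.
        missing = [m for m in members if m not in raw]
        if missing:
            raise KeyError(f"Missing raw classes for '{label}': {missing}")
        result[label] = sum(raw[m] for m in members)

    # Classes that belong to no group are kept unchanged.
    for name, value in raw.items():
        if name not in grouped_members:
            result[name] = value

    return result
```

The explicit `KeyError` reflects the point that the post-processing is picky about names: a renamed class in the database stops the report instead of silently producing a wrong sum.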

torv72 / torv-reports-v4

Include sand fraction table in report #28