unige-geohealth / accessmod

accessmod 5 : anisotropic accessibility analysis.
GNU Lesser General Public License v3.0
42 stars 14 forks source link

Zonal Statistics: Allow automated export of tables to spreadsheets #299

Open anatamambo opened 4 years ago

anatamambo commented 4 years ago

Current Behavior

Currently you have to select all elements in a table or click 'Ctrl'+'A' to select all table cells and rows (Minus headers) and paste it in an excel.

Furthermore from 'Analysis'->'Zonal Statistics', you can only do modelling for a single maximum travel time.

Expected Behavior

Allow modelling to be done for multiple maximum travel times and add an element that automatically exports all data in the table to a spreadsheet. In the case whereby there would be 2 or more maximum travel times, either export them in a single sheet as columns side by side or add each to a new sheet.

Possible Solution

Steps to Reproduce

Detailed Description

SteeveEbener commented 4 years ago

To complement Karae's post:

  1. For the result of the zonal statistics to be automatically saved in a table accessible from the list of available data in the Data module
  2. For the possibility to specify more than 1 travel time and have the resulting table to contain the zonal statistics for each travel time stored in separated columns. Example:
    Example_multiple_TT.xlsx
fxi commented 4 years ago

I've implemented a feature to have multiple travel time. Available in version 5.6.38, in the testing branch.

To avoid a complex interface, with multiple inputs, I kept the same input as before, but if the user separate values by a comma, the process is done on each value.

e.g. 120, 240, 360 -> will produce 3 analysis for each value.

If the user enters a single value, it's exactly as before.

To avoid a lot of different columns, the travel time is in a dedicated column:

cat adm1_name time_m popTotal popTravelTime popCoveredPercent
1 Boucle du Mouhoun 10 71490.2393 2299.3972 3.2164
1 Boucle du Mouhoun 200 71490.2393 56028.0598 78.3716
1 Boucle du Mouhoun 500 71490.2393 70279.1504 98.3059
1 Boucle du Mouhoun 1000 71490.2393 71479.1109 99.9844
1 Boucle du Mouhoun 3044 71490.2393 71490.2393 100
2 Cascades 10 27339.5958 2289.2443 8.3734
2 Cascades 200 27339.5958 19320.159 70.6673
2 Cascades 500 27339.5958 25295.428 92.5231
2 Cascades 1000 27339.5958 27141.6374 99.2759
2 Cascades 3044 27339.5958 27339.5958 100
3 Centre 10 69973.3679 52776.2888 75.4234
3 Centre 200 69973.3679 68801.6544 98.3255
3 Centre 500 69973.3679 69762.2955 99.6984
3 Centre 1000 69973.3679 69973.3679 100
3 Centre 3044 69973.3679 69973.3679 100

For the extraction of the table, I've added a button under all tables to download a file that contains the data.

image

nicolasray commented 4 years ago

That's excellent! However, we absolutely need to have the results of each analysis in a different column. This will greatly simplify showing the results in understandable tables, but also when making a table joint with (usually) the "cat" column and a shapefile of the admin units to make thematic maps of population coverage at a given my travel time. What I suggest is that you name the two columns (even when there is only one max travel time) by appending the travel time, e.g.: popTravelTime_10, popCoveredPercent_10, popTravelTime_200, popCoveredPercent_200, etc. @SteeveEbener : what do you think?

fxi commented 4 years ago

@nicolasray : what you describe is what @SteeveEbener suggested in its spreadsheet file I think.

I've done differently because, from experience, it's better to store data in a long format – that the user can easily cast – than a wide format that is not always easy to melt into a long format. Here, the time variable should be extracted from the column header, which is – from my point of view – harder than the other way around.

Using a long format is a lot easier to work with in a lot of analysis and visualization packages. It's often easy to transpose from long to wide (casting), and often cumbersome to convert from wide to long (melting).

Using the long format keep also the integrity of the table : it's always the same number of columns for one or multiple travel time.

I would recommend keeping the long format in this case.

I can also change this if needed :)

fxi commented 4 years ago

We can also have a button to switch between the two options.

SteeveEbener commented 4 years ago

Hi guys,

Thanks for this great and very useful changes.

Nicolas has been faster than me with his comment :)

I would personally also only use an exported file where the results for each travel time is stored in a different column and each column has an explicit header mentioning the concerned travel time.

This being said, if ever some tools do indeed use the combined format than the button proposed by Fred could be an option.

When it comes to the on screen table, I understand the difficulty to have several columns next to each other but the one by line is really not easy to read => maybe that the multiple time option should only apply to the exported table, meaning that only the first time specified by the user would be appearing on screen.

One question: can the user specify travel time in decreasing (360, 240, 120) or random (120, 360, 240) order? Just want to make sure this would not create a bug if these orders are possible. If only the increasing order is possible then we should maybe adjust the text.

Thanks again

nicolasray commented 4 years ago

So let's have by default the format with multiple columns, and a button to switch to the long format. In my view it is essential to keep all columns, with each indicated time, in the table in the module. Many times there are live discussions about these indicators during workshops, and having the opportunity to look at them directly (even without exporting the table) would be key. To follow up on Steeve last question, it would indeed be ideal that whatever the order of travel times that the user indicates, this is the same order in the output table.

nicolasray commented 4 years ago

All is good except that the important column "popTotal" is missing in the multi-column default format. It needs to be added.

Also, change the new settings text to: Format table to get individual columns for each time step Instead of using a single columns for time and population coverage, individual columns will be produced for each time step. E.g. popTravelTime_50, popCoveredPercent_50, etc.

French: Formater la table pour obtenir des colonnes individuelles pour chaque pas de temps Au lieu de produire une seule colonne pour le temps et une pour la couverture de population, une colonne individuelle sera produite pour chaque pas de temps. Ex. popTravelTime_50, popCoveredPercent_50, etc.

fxi commented 4 years ago

Solved in 7d2c2c1e5 Available in the next release.

I've also struggled with a bug occurring with low travel time. Empty table produced a bug. It's partially solved with a workaround, but the default should be tables with zones, names, id, and pop total, with columns full of 0. It did not produce this and I'm not sure why. The bug does not shows anymore, but I have to make a change to handle empty results

Thanks

SteeveEbener commented 4 years ago

Is the result really meant to be 0 in this case, meaning no population located within the given travel time?

fxi commented 4 years ago

It should be the case. It doesn't make sense ? If the main issue is done, thanks to close this issue. You can open a new issue for further changes.

SteeveEbener commented 4 years ago

Maybe that I misunderstood something. Were you talking about low travel time catchment area within which there is no population? If yes, then the values should indeed be 0.

SteeveEbener commented 4 years ago

One moreTwo more things I just noticed when using the zonal stats for several travel time:

  1. The PopCoveredPercent column for a given travel time is not next to the corresponding popTravelTime one, making it not practical for the user having to go back and forth to see both figures.

=> I would like to suggest that we do have both information next to each other as we get when using only one travel time: image

  1. Does it really make sense to provide a % with 9 digits after the decimal points? I would think that max 3 would be enough especially if we keep 9 digits for the population figure anyway.

Thanks in advance for that.