Priority 1 (Scaling up): converting the fixed value of "10" into a new analysis setting parameter

SteeveEbener commented 8 years ago

As discussed with Nicolas this morning, it would be good to convert the number of options taken into account when deciding where to locate a site (currently fixed to 10) into a new parameter that the user can define.

This will contribute to addressing the question of equity raised by WHO: the higher the number of options taken into account, the higher the chance of selecting the site with the highest catchment population over the whole area.

We are conscious of the increase in calculation generated by a high value for this parameter but the user will have the choice to decide on this.

Please therefore set the default value for this new parameter to 10.

Thanks

fxi commented 8 years ago

What do you mean by "options"? Are you referring to the number of input candidates in the coverage analysis?

The user can now use the maximum number of new facilities to locate. This number is also used internally to define the maximum number of candidates.

E.g. if there are 1381 candidates with a suitability of 9999 and the user allows a maximum of 1000 facilities in the output, only the coverage of the first 1000 candidates will be analysed. The best of these, according to population coverage, will then be kept.
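A minimal sketch of the behaviour described above, in Python for illustration only (it is not AccessMod code). The function name, the tuple layout and the precomputed `covered_population` values are assumptions; in AccessMod the coverage would come from the travel time analysis.

```python
def pick_best_candidate(candidates, max_facilities):
    """Hypothetical helper, not an AccessMod function.

    candidates: list of (cell_id, suitability, covered_population),
    given in the order in which the cells are scanned.
    """
    top = max(suit for _, suit, _ in candidates)
    tied = [c for c in candidates if c[1] == top]
    # Only the first `max_facilities` tied candidates are analysed.
    evaluated = tied[:max_facilities]
    # Among the analysed candidates, keep the best population coverage.
    best = max(evaluated, key=lambda c: c[2])
    return best, len(evaluated)


cells = [("a", 9999, 120), ("b", 9999, 300), ("c", 9999, 500), ("d", 9000, 900)]
print(pick_best_candidate(cells, 2))  # (('b', 9999, 300), 2)
print(pick_best_candidate(cells, 3))  # (('c', 9999, 500), 3)
```

With a cap of 2, cell `c` is never analysed even though it has the best coverage among the ties, which is the effect of tying the number of candidates to the number of facilities.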

If options refers to another concept, please add some details.

Thanks

SteeveEbener commented 8 years ago

I am indeed referring to another concept.

I will try to provide some additional information, and maybe Nicolas can complement it.

If I remember correctly, once the scaling up process is started, AccessMod calculates the suitability index and uses the 10 cells presenting the highest value for this index as the entry point for the next step, which is to calculate the population located within the given travel time.

The options I am referring to are these 10 cells.

=> what we would like is for the number of cells in question to be set manually by the user and not to be fixed to 10.

I hope this clarifies things; please don't hesitate to let me know if this is not the case.

fxi commented 8 years ago

In the code, the selected cells from suitability map are named candidates.

In one of the old versions of the scaling up module I wrote, am5 selected candidates as n random cells of the suitability layer above the 99th percentile. n could be an external parameter, and this is what you are talking about, I think. Random selection was not wanted, as it removes reproducibility. Selection based on the 99th percentile was not a good idea, as it varies with the resolution of the project. And it does not solve the ex-æquo problem either.

If computation time were not a problem, then the concept of AccessMod 5 as you know it should not even have existed. I predicted this in November 2014: AccessMod should have been created for a cluster, not for a home computer with unknown computational power. And the idea of an online interface for launching asynchronous computation in a known environment could have saved us a lot of problems: no installation, environment agnostic, multi-user, fewer bugs, etc. None of those ideas were wanted, and we are stuck with an application that cries for power on an underpowered computer. It will adapt, it will not crash – I hope so – but it's not optimal.

However.

In the recent development, I've found a better solution for handling this issue that seems more scientifically accurate. And it's quite simple. The computational resources needed will adapt to the level of detail the user puts into the suitability and exclusion rules. The more discriminating the rules the user sets, the shorter the computation time will be. This was already the case with the old method, I know. The idea here is to use what we have: the multi-criteria analysis can produce a lot of decimals. By putting this on a broader scale, e.g. 1 to 1e4 or even 1 to 1e6, we can greatly reduce the number of candidates. We extract a reduced number of cells that actually are the best candidates. No 99th percentile nor random selection. If there is one unique candidate according to what the user entered, this is it, we evaluate just one site. If there are 120 candidates, it will evaluate them all. Even if there are 1e6 ex æquo, we will use them all. This is the default. But we can add a limitation to avoid this.
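The approach described above can be sketched as follows, in Python for illustration only. The function names, the linear rescaling with rounding and the optional `limit` argument are assumptions, not the AccessMod implementation.

```python
def rescale(scores, lo=0, hi=10000):
    """Linearly rescale raw suitability scores to integers in [lo, hi]."""
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        # Flat suitability map: every cell gets the top value.
        return [hi] * len(scores)
    span = s_max - s_min
    return [lo + round((s - s_min) / span * (hi - lo)) for s in scores]


def best_candidates(scores, lo=0, hi=10000, limit=None):
    """Return the indices of all cells tied at the highest rescaled value.

    `limit` is the optional cap mentioned above (hypothetical parameter).
    """
    scaled = rescale(scores, lo, hi)
    top = max(scaled)
    tied = [i for i, v in enumerate(scaled) if v == top]
    return tied if limit is None else tied[:limit]


raw = [0.12, 0.87341, 0.87342, 0.5, 0.87342]
print(best_candidates(raw))              # [1, 2, 4]
print(best_candidates(raw, hi=1000000))  # [2, 4]
print(best_candidates([1.0, 1.0, 1.0]))  # [0, 1, 2]
```

On the 1e4 scale the score 0.87341 rounds up to the same value as 0.87342 and becomes an ex æquo; on the 1e6 scale it no longer does, so a broader scale leaves fewer candidates to evaluate.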

What do you think ?

SteeveEbener commented 8 years ago

Selecting candidates randomly is indeed not a solution.

Regarding what you are proposing: While exclusion factors are indeed reducing the number of candidates, the suitability index itself is not generating exclusions but a ranking of the cells from the most to the least suitable one => all the cells in the non-excluded areas should have a suitability value and that could be all the cells if the user does not consider any exclusion rules => AccessMod would have to cover all of these non-excluded cells if we don't put a threshold to the number of candidates => back to issue #122.

That's why I think that the solution here is indeed to:

continue measuring the suitability index for all the non-excluded cells;
allow for the user to indicate the number of candidates he wants to see included in the evaluation.

Writing this I realize that the next issue here is actually the ex æquos...

If I am not wrong, each ex æquo currently uses one place in the suitability ranking. To give an example for a user setting the number of candidates to 5, we should currently have something like this:

rank  value
1     100
2     99
3     99
4     98
5     97

Would it be possible for am5 to process the ranking as follows, meaning that ex æquos would only use one place in the ranking and would therefore all be covered in the evaluation (still using the same example where the user sets the number of candidates to 5)?

rank  value
1     100
2     99
2     99
3     98
4     97
5     96
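The ranking proposed above, where ties share one place, is usually called a dense ranking. A minimal Python sketch of it, for illustration only (the function name and inputs are hypothetical):

```python
def candidates_by_dense_rank(values, n_ranks):
    """Keep every value whose dense rank is <= n_ranks.

    Ties share a rank, so the result can hold more than n_ranks entries.
    """
    distinct = sorted(set(values), reverse=True)
    rank_of = {v: r for r, v in enumerate(distinct, start=1)}
    ranked = [(rank_of[v], v) for v in sorted(values, reverse=True)]
    return [(r, v) for r, v in ranked if r <= n_ranks]


values = [100, 99, 99, 98, 97, 96, 95]

# One place per cell: the second 99 uses up a place.
print(sorted(values, reverse=True)[:5])
# [100, 99, 99, 98, 97]

# One place per distinct value: both 99s share rank 2.
print(candidates_by_dense_rank(values, 5))
# [(1, 100), (2, 99), (2, 99), (3, 98), (4, 97), (5, 96)]
```

The number of cells returned is no longer bounded by the parameter: a rank holding thousands of ex æquos would bring all of them into the evaluation.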

This would also solve the current bias towards the Northern part of the study area when there are a lot of duplicates. The bias comes from the way the scanning for candidates takes place, and we should keep the scanning that way.

fxi commented 8 years ago

There is no "ranking" in the last version as the scale system is already a ranking system in the form of a grid. We use the output suitability raster and work with it. Ex-æquos are substitutable, therefore share the same "rank".

The old version used a sorting step after a vector conversion based on a 99th percentile raster subset. A random process was used to select values among ties, as we sometimes had a very large number of candidates (1e6 cells -> 1e4 candidates).

In the new version we take only the highest suitability, as we honour the rules set by the user.

In your version, you want to bypass the rules and allow a broader set of suitability values to be used as candidates. Is this correct?

Example:

T 1-4 means tables of cases 1 to 4. Suitability values are coded from 0 (not suitable) to 10 (highly suitable). None of those examples represent a real case. If you need a statistical report on a real case, please contact me.

Result :

I see two issues :

It will take more time to implement this change.
It could add a serious computational challenge, as we can't predict the distribution skewness.

SteeveEbener commented 8 years ago

Can you just please tell me which version is the one you are referring to as being the old one and which one as the new one?

Thanks

fxi commented 8 years ago

old aka "99th percentile + vectorisation + random n on n bests "; versions of May 20th (e75c96ac371) to November 2 2015 (70ffe0b)
now aka "broad scale + all cells from unique highest suitability + strict rules evaluation "; version of November 6 (19d1f808) to March 2016 (7582ab367e4a)
new aka "what you want" : n bests from multiple suitability rank, partially discarding / softening the rules set by the user

SteeveEbener commented 8 years ago

No need to implement the new aka for the moment, as I first have to recheck everything based on the method you have just described; I was not made aware of it until now.

Two questions before that:

how do you address case T4 if the user had asked to locate 3 facilities? or more generally speaking: how does AccessMod proceed when the number of facilities to be located is higher than the number of cells presenting the highest suitability value?
What are the "strict rules evaluation" you are referring to?

Thanks in advance for your answer and take care

fxi commented 8 years ago

Sorry, I will be a little more wordy on this one.

Firstly, small detail, aka or A.K.A stands for "Also known as".

The only big change between the old and the current method is: instead of testing a group of less suitable locations, we use a more precise ranking scale and evaluate only the ex-æquos among the best locations. So only this part is new.

If you want more locations to be evaluated, using a shorter scale could be an option. This is a hidden parameter in config.R, line 465: config$scalingUpRescaleRange = c(0L,10000L). We can use it as an option for the user.

Or you can set your model to be more permissive / fuzzy :

fewer rules
use a generic map full of 1s
...

Then, a lot of ex-æquo will be processed, even the whole map's cells, as the T3 example shows.

The method you described and the old one are nearly equivalent: we don't totally "trust" the model so we try to apply the capacity analysis on a broader, less suitable group of locations based on a fixed number, e.g. 10.

So, for your questions:

$n_{\text{facilities}} > n_{\text{max suitability cells}}$

The current version will do this:

Locate the first facility in a1, as there is no better choice, given the input model that produced this suitability map. No ex-æquo. Again : different models, different results.
The next two iterations need a brand new suitability map, if the model said so. If not, there is a great chance that a1 will be selected two more times! The model is the key.

Strict rules evaluation

Strict rules evaluation refers to the additive model rules evaluation. If you ask the model to give the best candidate using a set of rules, using this candidate only is considered as strict rule evaluation.

E.g. you want to create a new well. The model says:

The best place to locate the new well is the location that

is not a cultivated area AND
minimises the travel time to roads AND
maximises the travel time to every other well AND
maximises the sum of the population in a given radius AND
minimises the terrain cost AND
maximises the Euclidean distance to geological fractures

In this non-realistic example, the model found 10 locations with a score of 97 out of 100. This is the answer to your question. If you wanted another answer, you should have added or removed some rules to open up the model's result set. If you keep only the terrain cost, maybe the whole map will be evaluated.

The limit between candidate and non-candidate is given by the best overall score only. If you add an additional rule such as "keep the first 10 ranks", it's a little less "strict", as we no longer use only the best candidate(s) but also potentially bad candidates with regard to the rules. What is the importance of this? There is one last tacit rule for discriminating between selected candidates, which is "keep the one with the best population coverage". So we can select a candidate that is suboptimal with regard to the input rules and promote it as the best one if its population coverage is good. But it's still bad with regard to every other rule. And I have not yet talked about the skewness of the suitability score distribution: we have no idea about it. So taking 10 ranks could go very far into the distribution and extract a very bad candidate (see T4).

So "strict rules" refers to "do as the model says";
Softening the rules refers to "do as the model says, but add some additional rules to open the result set".
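The difference between the two can be sketched as below, in Python for illustration only. The function, the field names and all the numbers are made up for the example; `n_ranks=1` stands for strict evaluation and a larger value for the softened variant.

```python
def select_site(cells, n_ranks=1):
    """Pick the cell with the best population coverage among the cells
    whose score falls within the `n_ranks` best distinct scores.

    cells: list of dicts with 'id', 'score' and 'population' keys
    (hypothetical layout). n_ranks=1 is "do as the model says".
    """
    kept_scores = sorted({c["score"] for c in cells}, reverse=True)[:n_ranks]
    candidates = [c for c in cells if c["score"] in kept_scores]
    # Tacit last rule: keep the candidate with the best population coverage.
    return max(candidates, key=lambda c: c["population"])


# Skewed score distribution: the second and third ranks are far below the first.
cells = [
    {"id": "a1", "score": 97, "population": 500},
    {"id": "a2", "score": 97, "population": 800},
    {"id": "b1", "score": 60, "population": 5000},
    {"id": "c1", "score": 20, "population": 9000},
]

print(select_site(cells)["id"])             # a2
print(select_site(cells, n_ranks=2)["id"])  # b1
print(select_site(cells, n_ranks=3)["id"])  # c1
```

As soon as more than one rank is kept, the population coverage alone decides, and a cell scoring 20 can be promoted over cells scoring 97.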

This could be changed, but again, change = more time.

SteeveEbener commented 8 years ago

Thanks for the explanations. I will do the tests and come back to you.

SteeveEbener commented 8 years ago

I just got an email from Nicolas indicating that you are waiting for my feedback on this issue.

I should have thought to put a message here after discovering the issue with the catchment area (issue #108).

I need to have this issue (#108) as well as the other priority 1 issues linked to the Scaling up module (#83, #95, and #109) addressed before doing the check.

Thanks in advance for that.

SteeveEbener commented 8 years ago

There are still two main issues with the scaling up module:

1) You said that the new version was to evaluate all the candidates presenting the highest suitability score but this is not the case. It actually evaluates a number of candidates corresponding to the number of new facilities you want to locate.

To give an example using the Malawi sample data and the generic priority map but the same happens with all the suitability factors (see attached MS Word document for the figures):

1) AccessMod identifies that the study area is composed of 21787 cells (first screenshot)
2) it then identifies 3984 cells presenting the highest suitability value of 10000 (second screenshot)
3) but it only evaluates 7 of them as per the 3rd screenshot (7 is the number of facilities I indicated to be located)

As a result, all the new facilities are located on the first row of the area presenting the highest suitability factor, as per the 4th screenshot.

=> can you please modify the code so that the evaluation is done on all the candidates presenting the highest suitability index value (3984 in the above mentioned example) ?

Can you please also confirm that the allocated population is removed from the residual population grid before doing the next iteration?
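To make the two requests above concrete, here is a minimal Python sketch of the expected behaviour, for illustration only. It is not AccessMod code: the function name, the precomputed `catchments` mapping and the simplifications (suitability kept fixed between iterations, no facility capacity limit) are all assumptions.

```python
def scale_up(suitability, population, catchments, n_facilities):
    """Greedy placement sketch (hypothetical, not the AccessMod algorithm).

    suitability: dict cell -> rescaled suitability value
    population:  dict cell -> population (left unmodified)
    catchments:  dict cell -> set of cells within the travel time of a
                 facility placed on that cell (assumed precomputed)
    """
    residual = dict(population)
    top = max(suitability.values())
    # Evaluate ALL cells tied at the highest suitability, not just the first n.
    candidates = [c for c, s in suitability.items() if s == top]

    def coverage(cell):
        return sum(residual[x] for x in catchments[cell])

    placed = []
    for _ in range(n_facilities):
        best = max(candidates, key=coverage)
        placed.append((best, coverage(best)))
        # Remove the allocated population before the next iteration.
        for x in catchments[best]:
            residual[x] = 0
    return placed, residual


suit = {"A": 10000, "B": 10000, "C": 10000, "D": 10000}
pop = {"A": 10, "B": 50, "C": 40, "D": 5}
reach = {"A": {"A", "B"}, "B": {"A", "B", "C"}, "C": {"C", "D"}, "D": {"D"}}

placed, residual = scale_up(suit, pop, reach, 2)
print(placed)  # [('B', 100), ('C', 5)]
```

Without the residual update, the second iteration would select B again, since its coverage would still be 100.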

Illustration_process_scaling_up.docx

2) The "remove unselected row" button for exclusion areas does not work

What I mean is that clicking on this button does indeed remove the item from the exclusion area table (both in AccessMod and the exported table) but the rule is still being applied during the run.

I have done the test with different exclusions areas. Once you have used it once it is applied to any subsequent run even if you remove it from the table using the "remove unselected row" option.

=> can you please look into this?

Thanks

fxi commented 8 years ago

The exclusion area issue is now tracked in #138.

SteeveEbener commented 8 years ago

I guess we can then close this one, right?
