allayur commented 5 years ago

Could you please give some guidance on how to add a printout of the actual (x,y) coordinates of the (central line?) of all catenas to rstats.txt, in addition to the id? It is easy to get the ids of the most representative catenas, but to explore them in the field (we do soil profiling), actual coordinates would be an asset. Thanks!

tpilz commented 5 years ago

There are no explicit (x,y) coordinates for the catenas. However, you can examine the catenas using the GRASS output of function lump_grass_prep() (i.e. the EHA raster map). These are the areas on which the catenas (the averaged properties of the areas) are based, i.e. for which function area2catena() calculates the properties. The raster IDs of the EHA raster map correspond to the IDs (first column) in the rstats file (argument catena_out).

To get the profile, you can plot columns 2 and 3 (x and z locations) of the rstats file for each catena (similar to the plots generated by prof_class()).
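A minimal sketch of extracting one catena's profile, assuming the rstats file has been read into a data frame whose first three columns are the catena ID, the profile point, and the elevation (the layout quoted from the header file in this thread). The data values and the helper name `catena_profile` are invented for illustration and are not part of lumpR.

```r
# Toy stand-in for the rstats file (argument catena_out); all values are invented.
# Assumed column layout: id, p_no (profile point), elevation.
rstats <- data.frame(
  id        = c(12, 12, 12, 47, 47),
  p_no      = c(1, 2, 3, 1, 2),
  elevation = c(0, 4.5, 11, 0, 2)
)

# Hypothetical helper: return columns 2 and 3 (x and z) for one catena ID
catena_profile <- function(rstats, catena_id) {
  rstats[rstats[[1]] == catena_id, 2:3]
}

prof <- catena_profile(rstats, 12)

# Draw the profile only when running interactively
if (interactive()) {
  plot(prof[[1]], prof[[2]], type = "b",
       xlab = "profile point (p_no)", ylab = "elevation")
}
```

For a real run, the toy data frame would be replaced by the actual file, read for example with read.table(); the separator and header settings depend on how the file was written.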

I hope this answers your question somehow?

allayur commented 5 years ago

Thanks! We have already performed those steps with the EHA map, and we produced reclassified EHA maps using the IDs of the classified catenas. If no coordinates are available, is it possible to put the actual x distance (in meters), not the relative one, into the 2nd column of the rstats file?

tpilz commented 5 years ago

That would require some changes to prof_class(), which I need to check first. However, you just need to multiply by the resolution of your GRASS location to convert it into meters (note that point 1 for each catena is always at x = 0 m, i.e. you need to calculate (p_no - 1) * resolution, where p_no is the value of the profile point in column 2).
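The conversion described above can be sketched as follows. The resolution of 30 m and the profile point numbers are made-up example values; the real resolution has to be taken from the GRASS location.

```r
# Hypothetical cell size of the GRASS location in metres
resolution <- 30

# Example profile point numbers (column 2 of the rstats file)
p_no <- c(1, 2, 3, 10)

# Point 1 lies at x = 0 m, so subtract 1 before scaling
x_m <- (p_no - 1) * resolution
print(x_m)  # 0 30 60 270
```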

allayur commented 5 years ago

Ok! That helps, thanks!

allayur commented 5 years ago

Sorry for not packing all questions into one post. It would also be useful to know whether the catenas in reclass_lu.txt are sorted by proximity to the most typical one, e.g. from most to least "typical".

tpilz commented 5 years ago

The only purpose of this file is to visualise the LUs in GRASS. So, there is nothing sophisticated regarding the order of appearances in this file.

I also think it is not really straightforward to define proximity in this regard. To classify the requested number of LUs from the catenas, the current version uses a k-means approach to cluster the given catena attributes.

allayur commented 5 years ago

I would not have guessed this sorting either, BUT in lu.dat the "closest catena" is ALWAYS the first one listed for its class in reclass_lu.txt. Judging by the variable name, class_repr is the most representative catena, and it is printed as "closest catena" in lu.dat. Sorry to insist, but I really need to find this most representative catena on the actual map.

tpilz commented 5 years ago

By lu.dat you mean the output luoutfile of function prof_class(), right? The closest catena can be thought of as the most representative catena of an LU. In fact, however, the LUs are derived from all the catenas belonging to an LU (i.e. a cluster in prof_class()).
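To illustrate the idea of a "closest catena", here is a generic sketch that clusters toy attribute vectors with kmeans() and picks, for each cluster, the member with the smallest Euclidean distance to the cluster centre. This is not the code used in prof_class(); the attribute matrix, the IDs 101 to 120, and the choice of Euclidean distance are assumptions made for the example.

```r
set.seed(1)

# Toy attribute matrix: 20 "catenas" with 2 attributes, forming two clear groups
attrs <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
               matrix(rnorm(20, mean = 5), ncol = 2))
rownames(attrs) <- 101:120   # hypothetical catena IDs

km <- kmeans(attrs, centers = 2, nstart = 5)

# For each cluster, find the member closest to the cluster centre
closest_catena <- sapply(seq_len(nrow(km$centers)), function(k) {
  members <- which(km$cluster == k)
  # subtract the centre from every member, then take the Euclidean norm per row
  diffs <- sweep(attrs[members, , drop = FALSE], 2, km$centers[k, ])
  d <- sqrt(rowSums(diffs^2))
  rownames(attrs)[members][which.min(d)]
})
print(closest_catena)
```

The cluster itself is still described by all of its members; the closest member is only a convenient representative to visit in the field.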

If you want to find this catena, just take the ID of column "closest_catena" and search for that ID in the EHA raster map in GRASS.

allayur commented 5 years ago

Yes, I did so and found them this way. But it is a bit suspicious that the closest catena always has the smallest index (in the original GRASS EHA grid) among all catenas belonging to the final class.

      dists_class_i <- dists[class_i,i] # retrieve distances of catenas of class i to centroid of class i

I am new to R. Where is the centroid computed around those lines? I was looking for a "mean" function, for example.

tpilz commented 5 years ago

All right, now I get it.

I had a look and I think the problem here is the calculation of 'distances' (object dists). I.e. lines 676 ff. in prof_class.R:

      # quick and dirty computation of distance matrix to allow the rest of the script to be run without problems
      dists <- matrix(1, nrow=n_profs, ncol=nclasses) 
      unique_classes <- unique(cidx)
      for (ii in 1:nclasses) {
        # find all profiles belonging to current class
        class_i <- which(cidx==unique_classes[ii])
        # set their distance to the cluster centre to 0
        dists[class_i,ii] <- 0
      }

This sets the distances to zero for all catenas of a specific cluster, i.e. there is no most representative catena, all catenas are equally representative. I don't remember the reason for this anymore. @TillF can you help out here?

tpilz commented 5 years ago

... I think we replaced the former version, which was dists <- daisy(profs_resampled), because it didn't work properly and the step is not really necessary for the rest of the workflow.

@allayur What do you need that for?

TillF commented 5 years ago

Distance metrics are only meaningful with classification mode 'singlerun', i.e. the first number in line 8 of rstats_head.txt must be positive. In contrast, with classification mode 'successive', this metric does not make sense, and dummy values are used as in the code section above. In this case, allayur is correct in assuming that the resulting assignment of the closest catena is not correct, but just a function of its ID. Thus, if you want to obtain the metric, please try classification mode 'singlerun' and parameter plot_silhouette=TRUE. If this doesn't work, please provide your example files (rstats.) and your complete intended call of prof_class() and I will try to have a look at it.
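A small sketch of why the dummy distance matrix ends up favouring the lowest index. It rebuilds the dists matrix as in the quoted snippet with toy class assignments, then selects the minimum per class with which.min(). Using which.min() for the selection is an assumption about how the closest catena is picked; it returns the first minimum when all values are tied.

```r
# Toy setup: 6 profiles assigned to 2 classes (values invented)
n_profs  <- 6
nclasses <- 2
cidx     <- c(1, 2, 1, 2, 1, 2)

# Dummy distance matrix as in the quoted code: 0 for members, 1 otherwise
dists <- matrix(1, nrow = n_profs, ncol = nclasses)
unique_classes <- unique(cidx)
for (ii in 1:nclasses) {
  class_i <- which(cidx == unique_classes[ii])
  dists[class_i, ii] <- 0
}

# Assumed selection rule: the member with the minimum distance
closest <- sapply(1:nclasses, function(ii) {
  class_i <- which(cidx == unique_classes[ii])
  class_i[which.min(dists[class_i, ii])]  # all distances tie at 0
})
print(closest)  # 1 2
```

Because every member has distance 0, the tie is broken by position, so the first (lowest-index) profile of each class is always reported.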

allayur commented 5 years ago

Thanks! When I remove the minus from the header file, I receive the following error and warning messages:

      % Calculate mean catena for each cluster...
      Error in value[3L] : Error in dists[class_i, i]: incorrect number of dimensions
      In addition: Warning message:
      In doTryCatch(return(expr), name, parentenv, handler) :
        cf_mode='singlerun' is experimental. Please consider 'successive' by adding a '-' before the first number of line 8 in rstats_head.txt

rstats.txt rstats_head.txt

Those are the files. All switches are set as in the sample R script.

TillF commented 5 years ago

Please install the test version with

library(devtools)
install_github("tpilz/lumpR", ref = "closest_catena")

and test it. Any comments appreciated. Thanks for your contribution.

allayur commented 5 years ago

Thank you for the support. After reinstallation, calc_subbas finishes with this error message:

      % Snap given drainage points to streams...
      Error in value[3L] : Error in loadNamespace(name): there is no package called ‘sf’

allayur commented 5 years ago

Also, as I understand it, the call to area2catena is different now, because mask_corr seems to be deprecated. Could you please provide a sample of the updated call?

TillF commented 5 years ago

In fact, I have not changed anything but prof_class(). That means any new issues arise because a) you updated your R version or other packages, or b) you had not been using the most recent version of R before. Anyway, you don't need to process everything anew; just do the call to prof_class() and let us know if this does the job.

If the other problem persists, please open a new issue with information on the environment you are using (i.e. please paste the output of sessionInfo() ).

TillF commented 5 years ago

Concerning your error message: I vaguely remember that I once had the same error. It occurred when I called an R-GRASS function (or a function involving one) before having initialized the GRASS session. Thus, please try restarting R and running the commands in section "# GRASS ####" before continuing.

tpilz commented 5 years ago

The error "Error in loadNamespace(name): there is no package called ‘sf’" is caused by the latest rgrass7 version. It seems they now use simple feature objects of package sf instead of Spatial* objects from package sp by default. One needs to run function use_sp() before using rgrass7 to change that. I pushed a commit to ensure sp is used, as we use Spatial* objects in lumpR.
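For reference, a guarded sketch of that switch when calling rgrass7 directly. It assumes an rgrass7 version that exports use_sp(); the flag name sp_mode_set is made up for the example, and the guard simply skips the call if the package or the function is missing.

```r
# Switch rgrass7 to sp classes, if the package is installed and exports use_sp()
sp_mode_set <- FALSE
if (requireNamespace("rgrass7", quietly = TRUE) &&
    "use_sp" %in% getNamespaceExports("rgrass7")) {
  # use_sp() may fail if package sp itself is unavailable, hence the tryCatch
  sp_mode_set <- tryCatch({
    rgrass7::use_sp()
    TRUE
  }, error = function(e) FALSE)
}
print(sp_mode_set)
```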

TillF commented 5 years ago

Thanks Tobias, I merged this fix into branch closest_catena. allayur, please update again from this test branch as described above:

library(devtools)
install_github("tpilz/lumpR", ref = "closest_catena")

allayur commented 5 years ago

Thank you very much for the timely support! I now see the results with the closest catena, which we can use in fieldwork.

allayur commented 5 years ago

But it seems to work only for the first attribute in no_LUs, which is OK for now, but in general it might be a problem if catenas are very different in length.

TillF commented 5 years ago

Sorry, I don't understand your last post, please explain.

allayur commented 5 years ago

This file works as a header to the output of area2catena. Don't add additional headerlines.

1. line after header: description/field names of the data columns contained in file catena_out

2. line after header: specifies, how many columns of data belong to the respective data-field given in line 1

3. line after header: number of classes/ weighting factors for classification process

4. line after header: factors used for weighting in partition process (column: 1: number of TC to create; 2.: partition method (not yet used); 3.: not used; 4-nn: weighting of supplemental data in TC-partitioning)

id p_no elevation svc slope_width
1 1 1 1 1
4 2 1 1 1
3 0 0 0 0

In my header file I use both shape and x-y extent for the classification scheme. In the resulting file, I get 4 representative catenas, not 8 as in the successive version with -4. Also, in plots_prof_class I cannot see the final plots with representative catenas as before, so I cannot judge whether x-y extent is used or not.

TillF commented 5 years ago

If you are using "successive", up to 4 x 2 classes will be created. However, the concept "closest_catena" is not defined here (see post above), so please don't use it. You should use "singlerun", i.e. "-4" in line 8. Consequently, 4 classes and 4 closest catenas should result.

allayur commented 5 years ago

Hm, I learned that +4, not -4, stands for singlerun. Yes, I got 4 classes, so it is all right, I think.

TillF commented 5 years ago

Thanks, yes, you are correct about +/-4. I'll close the issue. Feel free to reopen it if the problem is not solved yet; otherwise, open a new issue.
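The sign convention can be checked with a short sketch that reads line 8 of the header and reports the mode. It assumes the header layout quoted in this thread (five comment lines followed by four data lines); the comment lines are replaced by placeholders, and the names fields, first_no, cf_mode and n_classes are chosen for the example.

```r
# Header content as quoted in this thread; comment lines shortened to placeholders
head_lines <- c(
  "(comment line 1)", "(comment line 2)", "(comment line 3)",
  "(comment line 4)", "(comment line 5)",
  "id p_no elevation svc slope_width",
  "1 1 1 1 1",
  "4 2 1 1 1",
  "3 0 0 0 0"
)

# Split a line on any whitespace (works for tabs and spaces)
fields <- function(line) strsplit(trimws(line), "[[:space:]]+")[[1]]

# First number in line 8: positive means 'singlerun', negative means 'successive'
first_no  <- as.numeric(fields(head_lines[8])[1])
cf_mode   <- if (first_no > 0) "singlerun" else "successive"
n_classes <- abs(first_no)

cat("mode:", cf_mode, "- classes for first attribute:", n_classes, "\n")
```

For a real file, head_lines would come from readLines() on rstats_head.txt.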

allayur commented 5 years ago

Sorry... I see the final plots now, it was my mistake. All 4 catenas are very similar. I think it works wrong (((

allayur commented 5 years ago

plots_prof_class.pdf plots_prof_class4cl.pdf

TillF commented 5 years ago

plots_prof_class.pdf seems to have used successive mode, please don't use this in this context. plots_prof_class4cl.pdf, page 3 suggests that classification has mainly been focussed on x-extent, shape practically being ignored. You need to adjust the weighting factors, if you want to put more weight on shape (i.e. column 3, line 8).

tpilz / lumpR

Actual coordinates of most representative catena #52
