hsbadr / HiClimR

Hierarchical Climate Regionalization
https://hsbadr.github.io/HiClimR/
GNU General Public License v3.0

HiClimR other error #2

Closed fipoucat closed 4 years ago

fipoucat commented 4 years ago

I am still testing HiClimR. After creating the matrix, I used xGrid2D to create lon and lat columns and appended them to x as follows:

    lon <- c(xGrid$lon)
    lat <- c(xGrid$lat)
    x2 <- cbind(lon, lat, x1)
    print(x2)
            lon   lat 1961 1962 1963 1964 1965 1966
    [1,] -30.25 -5.25   NA   NA   NA   NA   NA   NA
    [2,] -30.25 -4.75   NA   NA   NA   NA   NA   NA
    [3,] -30.25 -4.25   NA   NA   NA   NA   NA   NA
    [4,] -30.25 -3.75   NA   NA   NA   NA   NA   NA

It looks like the column names should not be there? What is the right way to do it? When I run an example of simple regionalization following the tutorial, I get an error:

    Error in x - t(fitted(lm(t(x) ~ as.integer(colnames(x))))) : non-conformable arrays
    In addition: Warning message:
    In eval(predvars, data, env) : NAs introduced by coercion

fipoucat commented 4 years ago

This may not be an issue so much as a question of data handling to match the HiClimR data structure. Would it be possible to attach a sample file?

hsbadr commented 4 years ago

The observations (time dimension) should not include any missing values. Since HiClimR does clustering based on correlation distance, all time steps for a specific location/point should be valid. It removes the rows (locations/points) that have any missing values, and that could be all rows if one or more years are missing. You need to remove all columns with missing values manually because otherwise the dissimilarity measure (correlation distance) will represent something else. For example, if you are interested in interannual correlations, it is important to keep valid data for every year instead of randomly providing information at different frequencies.

Solution: Make sure that you have enough rows (>2) with no missing values or handle missing values before passing the data to HiClimR.
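The check described above can be sketched in base R before any data is passed to HiClimR. This is only an illustration: the toy matrix `x`, its values, and the choice to drop all-missing points before looking for incomplete years are assumptions, not part of the package.

```r
# Toy matrix (hypothetical values): 4 grid points (rows) x 5 years (columns)
x <- matrix(c(10, 12, 11, 13, 14,
              NA, NA, NA, NA, NA,   # a point with no data at all
              20, 22, NA, 23, 24,   # a point with one missing year
              30, 31, 33, 32, 34),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, as.character(1961:1965)))

# Rows (points) that are valid at every time step
valid_rows <- complete.cases(x)
n_valid <- sum(valid_rows)

# Assumption: drop points that are missing everywhere, then find years with gaps
x_land <- x[rowSums(!is.na(x)) > 0, , drop = FALSE]
gap_years <- colnames(x_land)[colSums(is.na(x_land)) > 0]

# Removing those years leaves every remaining point complete
x_clean <- x_land[, colSums(is.na(x_land)) == 0, drop = FALSE]
```

If `n_valid` is 2 or less, the matrix does not meet the "enough rows" condition above, and the missing values need to be handled first.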

fipoucat commented 4 years ago

Hi Hamada,

I have to update my post and give you more information, because the NAs were automatically added by R where there is 0.

On Tue, Dec 10, 2019 at 11:44 PM Hamada S. Badr notifications@github.com wrote:

Closed #2 https://github.com/hsbadr/HiClimR/issues/2.


fipoucat commented 4 years ago

Sorry Hamada,

I had to update the post because the NAs were added by R where the value is zero. I changed that, but there are still some problems. The file looks like this:

                             1961         1962         1963         1964         1965         1966
     [1,] -35.25 -9.75   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
     [2,] -35.25 -9.25 151.78334045 135.20834351 129.99166870 161.87500000 111.40833282 121.28334045
     [3,] -35.25 -8.75 157.24166870 135.23333740 141.16667175 215.67500305 144.05833435 153.30000305
     [4,] -35.25 -8.25 161.11666870 129.23333740 139.60833740 189.05000305 129.08334351 168.54167175
     [5,] -35.25 -7.75 152.60833740 103.01667023 107.14167023 183.56666565 124.19166565 160.69166565
     [6,] -35.25 -7.25 140.84167480  98.59166718 115.30833435 219.06666565 128.42500305 145.00000000
     [7,] -35.25 -6.75 132.85000610  87.50833893 113.50833130 203.16667175 121.45833588 124.03333282
     [8,] -35.25 -6.25 115.58333588  77.18333435  93.69166565 183.38333130 121.65833282 110.37500000
     [9,] -35.25 -5.75  99.72499847  72.81666565  68.29167175 156.61666870 115.56666565  89.47500610
    [10,] -35.25 -5.25  80.71666718  61.26666641  52.80000305 130.15834045  95.60833740  65.12500000
    [11,] -35.25 -4.75 0

The command I use ends with an error:

    y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
                 continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
                 standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
                 members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
                 validClimR = TRUE, k = 5, minSize = 1, alpha = 0.01,
                 plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

PROCESSING STARTED

    Checking Multivariate Clustering (MVC)...
    ---> x is a matrix
    ---> single-variate clustering: 1 variable
    Checking data...
    ---> Checking dimensions...
    ---> Checking row names...
    ---> Checking column names...
    Data filtering...
    ---> Computing mean for each row...
    ---> Checking rows with mean bellow meanThresh...
    ---> 5697 rows found, mean ≤ 10
    ---> Computing variance for each row...
    ---> Checking rows with near-zero-variance...
    ---> 0 rows found, variance ≤ 0
    Data preprocessing...
    ---> Applying mask...
    ---> Checking columns with missing values...
    ---> Removing linear trend...
    Error in x - t(fitted(lm(t(x) ~ as.integer(colnames(x))))) : non-conformable arrays

I extracted a region from a global dataset; is this a problem? I ask because I see you use a continent such as "Africa". How do I do it for a region? What do you think is still causing the non-conformable arrays error?

hsbadr commented 4 years ago

What's the size of your matrix? You set the mean threshold to 10, which masks out 5697 rows (try to use meanThresh = 0). Also, check the column names or try to use coarseR (change the steps as you wish, 1 means keeping the original data):

    colnames(x) <- NULL
    xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 1, latStep = 1)
    lon <- xc$lon
    lat <- xc$lat
    x <- xc$x

Finally, disable standardization and detrending: detrend = FALSE, standardize = FALSE.

It seems to me that HiClimR can't find valid rows in the matrix you provided.
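One plausible way the column names can produce the "non-conformable arrays" error is sketched below in base R, using the same detrending expression that appears in the error message. The toy data and variable names (`x1`, `x2`, `tt`) are made up for illustration, and this is not taken from the HiClimR source: when non-year columns such as lon and lat are bound into the matrix, `as.integer(colnames(x))` yields NA for them, `lm()` drops those time steps, and the fitted values no longer match the shape of `x`.

```r
# Toy data (hypothetical values): 4 grid points x 6 years
set.seed(1)
x1 <- matrix(runif(4 * 6, 50, 200), nrow = 4,
             dimnames = list(NULL, as.character(1961:1966)))
lon <- rep(-30.25, 4)
lat <- c(-5.25, -4.75, -4.25, -3.75)

# Binding coordinates into the data adds two non-year column names
x2 <- cbind(lon, lat, x1)
tt <- suppressWarnings(as.integer(colnames(x2)))   # NA NA 1961 ... 1966

# lm() omits the NA time steps, so fitted() has 6 rows instead of 8
fit <- fitted(lm(t(x2) ~ tt))
msg <- tryCatch({ x2 - t(fit); "no error" },
                error = function(e) conditionMessage(e))

# Keeping only the yearly values in the matrix avoids the mismatch
tt_ok <- as.integer(colnames(x1))
resid_ok <- x1 - t(fitted(lm(t(x1) ~ tt_ok)))
```

Under this reading, the data matrix should hold only the yearly values, with lon and lat supplied as separate vectors, as in the `HiClimR(x, lon = lon, lat = lat, ...)` call above.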

fipoucat commented 4 years ago

Using the settings you gave, it ran without error and produced a plot. My file has 57 years of rainfall data for a window of -10 to 25 lat and -30 to -25 lon.

    y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
                 continent = "Africa", meanThresh = 0, varThresh = 0, detrend = FALSE,
                 standardize = FALSE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
                 members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
                 validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
                 plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

PROCESSING STARTED

    Checking Multivariate Clustering (MVC)...
    ---> x is a matrix
    ---> single-variate clustering: 1 variable
    Checking data...
    ---> Checking dimensions...
    ---> Checking row names...
    ---> Checking column names...
    Data filtering...
    ---> Computing mean for each row...
    ---> Checking rows with mean bellow meanThresh...
    ---> 3735 rows found, mean ≤ 0
    ---> Computing variance for each row...
    ---> Checking rows with near-zero-variance...
    ---> 0 rows found, variance ≤ 0
    Data preprocessing...
    ---> Applying mask...
    ---> Checking columns with missing values...
    Agglomerative Hierarchical Clustering...
    ---> Computing correlation/dissimilarity matrix...
    ---> Starting clustering process...
    ---> Constructing dendrogram tree...
    Calling cluster validation...
    ---> Computing cluster means...
    ---> Computing inter-cluster correlations...
    ---> Computing intra-cluster correlations...
    ---> Computing summary statistics...
    Generating region map...

PROCESSING COMPLETED

    Running Time:
       user  system elapsed
      5.585   0.518   6.109
    Time difference of 6.109582 secs

Maybe I need to adjust the settings so that more rows are considered.

hsbadr commented 4 years ago

You should be careful when setting thresholds for data processing. For example, meanThresh will mask out the points whose mean rainfall does not exceed the threshold value, which could be all of your data depending on the threshold value and the data range/units. Invalid data with near-zero variance (~constant from year to year) will be excluded too.
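To get a feel for how many points a given threshold would remove, the row statistics can be inspected in base R before running the clustering. The sketch below uses a made-up toy matrix and assumes the "mean ≤ meanThresh" and "variance ≤ varThresh" comparisons shown in the verbose log; it does not call HiClimR itself.

```r
# Toy rainfall matrix (hypothetical values): 4 points x 5 years
x <- rbind(c(0, 0, 0, 0, 0),            # always dry
           c(5, 7, 6, 8, 4),            # low rainfall
           c(150, 135, 130, 160, 110),  # wet point
           c(80, 80, 80, 80, 80))       # constant from year to year

row_mean <- rowMeans(x)
row_var  <- apply(x, 1, var)

# Number of rows at or below each candidate mean threshold (0 and 10)
masked_by_mean <- sapply(c(0, 10), function(th) sum(row_mean <= th))

# Number of rows with zero variance
masked_by_var <- sum(row_var <= 0)
```

Comparing these counts against `nrow(x)` shows quickly whether a threshold is too aggressive for the data's range and units.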

I'm closing this issue now.