Closed luciazarauz closed 1 year ago
Hi Lucía - you'll need to speak to whoever wrote the "saveRelated" function to fix any issues with it. I have personal functions to validate RDBES format data and create an exchange file in my repo https://github.com/davidcurrie2001/MI_RDBES_ExchangeFiles The file "CreateExchangeFiles.R" shows how I use them with our national data. You're welcome to try them if you want to.
It would be a good idea to have some standard functions for creating exchange files from RDBES format data.
Hi, so I guess I wrote it but I have not tried it for your cases, the last vesion of it is here:
#these fields shoud be removed from the tables for submissions
idFields <- c("DEid", "SDid","TEid", "VSid", "LEid",
"SSid", "SAid", "BVid", "FMid", "OSid",
"FTid", "VDid","SLid", "LOid", "FOid" )
#' Save RDBES Submission CSV
#'
#' @param tbl - the data frame to save
#' @param removeFields - vector of columns to remove
#' @param subFile - path and filename to save to
#' @param append - should the table be overwritten (defaults to append)
#'
#' @return NULL
#' @export
#'
#' @examples
#' #'\dontrun{
#' saveTbl(VD, "./submissions/vd_h8.csv", c("DEid", "SDid","TEid"), F)
#' }
saveTbl <- function(tbl, subFile, removeFields=idFields,
append = T){
wtCols <- setdiff(colnames(tbl), removeFields)
if(all(is.na(tbl[wtCols]))){return()}
write.table(tbl[wtCols], subFile, append = append, sep =",",
quote=F, na ="", row.names = F, col.names = F )
if(!append){
print(paste0(deparse(substitute(tbl)), " overwrote ", subFile, " file!"))
}
}
#' Prepare CSV File for Hierarchy
#'
#' @param tbls - list of tables in your hierarchy
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
idFields=idFields){
#if the count is 0 there are now lower tables so it should return
#print(count)
if(length(count) == 0){return()}
if(is.na(count)){return()}
tbl <- tbls[[depth]]
tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
tblAt <- tbl[tbl[id] == count,]
if(depth == length(tbls)){
#it it is the last table just save selected subset and return
saveTbl(tblAt, subFile, idFields)
return()
}
for(i in 1:nrow(tblAt)){
saveTbl(tblAt[i,], subFile, idFields)
saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
}
}
I cont debug it though right now (also prepearing files) :)
Thank you, David and rix, for your quick answers I will try
Ok So my guess that the saveRelated should return without looking for related tables is the field "sampled" is N so I updated the code let me know if it works:
#' Prepare CSV File for Hierarchy
#'
#' @param tbls - list of tables in your hierarchy
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
idFields=idFields){
#if the count is 0 there are now lower tables so it should return
#print(count)
if(length(count) == 0){return()}
if(is.na(count)){return()}
tbl <- tbls[[depth]]
tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
tblSampledColname <- paste0(substr(tblID, 1,2),"sampled")
tblAt <- tbl[tbl[id] == count,]
if(depth == length(tbls)){
#it it is the last table just save selected subset and return
saveTbl(tblAt, subFile, idFields)
return()
}
for(i in 1:nrow(tblAt)){
saveTbl(tblAt[i,], subFile, idFields)
hasChildren <- tblAt[i,tblSampledColname] == "Y"
if(hasChildren){
saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
}
}
}
Oh, a fix as the SD table has no SDsampled field:
#' Prepare CSV File for Hierarchy
#'
#' @param tbls - list of tables in your hierarchy
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
idFields=RDBESdataConversion:::idFields){
#if the count is 0 there are now lower tables so it should return
#print(count)
if(length(count) == 0){return()}
if(is.na(count)){return()}
tbl <- tbls[[depth]]
tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
tblSampledColname <- paste0(substr(tblID, 1,2),"sampled")
tblAt <- tbl[tbl[id] == count,]
if(depth == length(tbls)){
#it it is the last table just save selected subset and return
saveTbl(tblAt, subFile, idFields)
return()
}
for(i in 1:nrow(tblAt)){
saveTbl(tblAt[i,], subFile, idFields)
hasChildren <- tblAt[i,tblSampledColname] == "Y"
# the SD table does not have sampled field
if(length(hasChildren) == 0) {hasChildren <- TRUE}
if(hasChildren){
saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
}
}
}
thank you rix,
we have tried to run the function and get the following error: Error in if (file == "") file <- stdout() else if (is.character(file)) { : the condition has length > 1
any idea how to solve it?
thank you very much
Error in if (file == "")
I guess this likely means you have not passing the subFile parameters so write.table
does not know where to save the file.
I think there is a difference between the previous and last version of the saveRelated
function arguments (or their order)
my suggestion is to use named parameters in the function call somethin like this:
saveRelated(tbls = myTbls, id = "DEid", depth = 1, count = 1, subFile ="./exportfile.csv")
where: myTbls <- list(DE, SD, ....)
thank you so much,
our next error mesage says "there is no package called ‘RDBESdataConversion’"
do you know where can we find this package?
Sorry I forgot to change this one line (its a private package from where this function is taken from)
#these fields should be removed from the tables for submissions
idFields <- c("DEid", "SDid","TEid", "VSid", "LEid",
"SSid", "SAid", "BVid", "FMid", "OSid",
"FTid", "VDid","SLid", "LOid", "FOid" )
#' Prepare CSV File for Hierarchy
#'
#' @param tbls - list of tables in your hierarchy
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
idFields=idFields){
#if the count is 0 there are now lower tables so it should return
#print(count)
if(length(count) == 0){return()}
if(is.na(count)){return()}
tbl <- tbls[[depth]]
tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
tblSampledColname <- paste0(substr(tblID, 1,2),"sampled")
tblAt <- tbl[tbl[id] == count,]
if(depth == length(tbls)){
#it it is the last table just save selected subset and return
saveTbl(tblAt, subFile, idFields)
return()
}
for(i in 1:nrow(tblAt)){
saveTbl(tblAt[i,], subFile, idFields)
hasChildren <- tblAt[i,tblSampledColname] == "Y"
# the SD table does not have sampled field
if(length(hasChildren) == 0) {hasChildren <- TRUE}
if(hasChildren){
saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
}
}
}
Hi rix,
we didn't manage to make it work with our data. But we made our own code. I copy it here just in case it is sueful for anybody
Thanks a lot anyway. As you can see we borrowed some of your lines
idFields <- c("DEid", "SDid", "OSid", "LEid",
"SSid", "SAid", "FMid",
"FTid", "VDid","SLid",
"BVid","FOid","VSid","LOid","TEid")
subFile <- "./3_Results/2021/h5.csv"
if (file.exists(subFile)) file.remove(subFile) # delete previous submisson file first
for (x in 1:nrow(DE)) { nameCols <- setdiff(colnames(DE), idFields)
write.table ( DE[x,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
SDtemp <- SD[SD$DEid == DE[x,"DEid"],]
for (y in 1:nrow(SDtemp)) {
nameCols <- setdiff(colnames(SD), idFields)
write.table ( SDtemp[y,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
OStemp <- OS[OS$SDid == SDtemp[y,"SDid"],]
for (z in 1:nrow(OStemp)) {
nameCols <- setdiff(colnames(OS), idFields)
write.table (OStemp[z,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
LEtemp <- LE[LE$OSid == OStemp[z,"OSid"],]
for (a in 1:nrow(LEtemp)) {
nameCols <- setdiff(colnames(LE), idFields)
write.table (LEtemp[a,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
SStemp <- SS[SS$LEid == LEtemp[a,"LEid"],]
for (b in 1:nrow(SStemp)) {
nameCols <- setdiff(colnames(SS), idFields)
write.table (SStemp[b,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
SAtemp <- SA[SA$SSid == SStemp[b,"SSid"],]
for (c in 1:nrow(SAtemp)) {
nameCols <- setdiff(colnames(SA), idFields)
write.table (SAtemp[c,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
FMtemp <- FM[FM$SAid == SAtemp[c,"SAid"],]
nameCols <- setdiff(colnames(FM), idFields)
write.table (FMtemp[,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
}}}}}}
@luciazarauz Hi Lucia, can you close the issue? Cheers Henrik
Hello,
we are using a function called saveRelated to compile all the tables of echa heirarchy (which in our case is H5 & H3)
For H5 it works perfectly Hovever we have found some problems with H3. It seems to not work well for the out of frame lines (those raised in #52) and also when we have a lower hierarchy "D" with no length distribution, nor biological sampling associated
I have to say that I don't understand well the script of the function (I run it blindly) and that it may not be updated. But, as this is an step that everybody needs to do, it would be very useful to have a solid function we can just run
Thank you very much lucía