saveRelated function compiling all the RDBES tables

ices-tools-dev / RDBES

The public repository of the RDBES development.

11 stars 5 forks source link

saveRelated function compiling all the RDBES tables #164

Closed luciazarauz closed 1 year ago

luciazarauz commented 2 years ago

Hello,

we are using a function called saveRelated to compile all the tables of echa heirarchy (which in our case is H5 & H3)

For H5 it works perfectly Hovever we have found some problems with H3. It seems to not work well for the out of frame lines (those raised in #52) and also when we have a lower hierarchy "D" with no length distribution, nor biological sampling associated

I have to say that I don't understand well the script of the function (I run it blindly) and that it may not be updated. But, as this is an step that everybody needs to do, it would be very useful to have a solid function we can just run

Thank you very much lucía

davidcurrie2001 commented 2 years ago

Hi Lucía - you'll need to speak to whoever wrote the "saveRelated" function to fix any issues with it. I have personal functions to validate RDBES format data and create an exchange file in my repo https://github.com/davidcurrie2001/MI_RDBES_ExchangeFiles The file "CreateExchangeFiles.R" shows how I use them with our national data. You're welcome to try them if you want to.

It would be a good idea to have some standard functions for creating exchange files from RDBES format data.

rix133 commented 2 years ago

Hi, so I guess I wrote it but I have not tried it for your cases, the last vesion of it is here:

#these fields shoud be removed from the tables for submissions
idFields <- c("DEid", "SDid","TEid", "VSid", "LEid", 
              "SSid", "SAid", "BVid", "FMid", "OSid",
              "FTid", "VDid","SLid", "LOid", "FOid" )

#' Save RDBES Submission CSV
#'
#' @param tbl  - the data frame to save
#' @param removeFields - vector of columns to remove
#' @param subFile - path and filename to save to
#' @param append - should the table be overwritten (defaults to append)
#'
#' @return NULL
#' @export
#' 
#' @examples
#' #'\dontrun{
#' saveTbl(VD, "./submissions/vd_h8.csv", c("DEid", "SDid","TEid"), F)
#' }
saveTbl <- function(tbl, subFile, removeFields=idFields,
                    append = T){
  wtCols <- setdiff(colnames(tbl), removeFields) 
  if(all(is.na(tbl[wtCols]))){return()}
   write.table(tbl[wtCols], subFile, append = append, sep =",",
                quote=F, na ="", row.names = F, col.names = F )
  if(!append){
    print(paste0(deparse(substitute(tbl)), " overwrote ", subFile, " file!"))
  }
}

#' Prepare CSV File for Hierarchy 
#'
#' @param tbls - list of tables in your hierarchy 
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile  - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
                        idFields=idFields){

  #if the count is 0 there are now lower tables so it should return
  #print(count)
  if(length(count) == 0){return()}
  if(is.na(count)){return()}

  tbl <- tbls[[depth]]
  tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
  tblAt <- tbl[tbl[id] == count,]

  if(depth == length(tbls)){

    #it it is the last table just save selected subset and return
    saveTbl(tblAt, subFile, idFields)
    return()
  }

  for(i in 1:nrow(tblAt)){
    saveTbl(tblAt[i,], subFile, idFields)
    saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
  }

}

I cont debug it though right now (also prepearing files) :)

luciazarauz commented 2 years ago

Thank you, David and rix, for your quick answers I will try

rix133 commented 2 years ago

Ok So my guess that the saveRelated should return without looking for related tables is the field "sampled" is N so I updated the code let me know if it works:

#' Prepare CSV File for Hierarchy
#'
#' @param tbls - list of tables in your hierarchy
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile  - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
                        idFields=idFields){

  #if the count is 0 there are now lower tables so it should return
  #print(count)
  if(length(count) == 0){return()}
  if(is.na(count)){return()}

  tbl <- tbls[[depth]]
  tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
  tblSampledColname <- paste0(substr(tblID, 1,2),"sampled")
  tblAt <- tbl[tbl[id] == count,]

  if(depth == length(tbls)){

    #it it is the last table just save selected subset and return
    saveTbl(tblAt, subFile, idFields)
    return()
  }

  for(i in 1:nrow(tblAt)){
    saveTbl(tblAt[i,], subFile, idFields)

    hasChildren <- tblAt[i,tblSampledColname] == "Y"
    if(hasChildren){
      saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
    }
  }

}

rix133 commented 2 years ago

Oh, a fix as the SD table has no SDsampled field:

#' Prepare CSV File for Hierarchy
#'
#' @param tbls - list of tables in your hierarchy
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile  - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
                        idFields=RDBESdataConversion:::idFields){

  #if the count is 0 there are now lower tables so it should return
  #print(count)
  if(length(count) == 0){return()}
  if(is.na(count)){return()}

  tbl <- tbls[[depth]]
  tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
  tblSampledColname <- paste0(substr(tblID, 1,2),"sampled")
  tblAt <- tbl[tbl[id] == count,]

  if(depth == length(tbls)){

    #it it is the last table just save selected subset and return
    saveTbl(tblAt, subFile, idFields)
    return()
  }

  for(i in 1:nrow(tblAt)){
    saveTbl(tblAt[i,], subFile, idFields)

    hasChildren <- tblAt[i,tblSampledColname] == "Y"
    # the SD table does not have sampled field
    if(length(hasChildren) == 0) {hasChildren <- TRUE}
    if(hasChildren){
      saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
    }
  }

}

luciazarauz commented 2 years ago

thank you rix,

we have tried to run the function and get the following error: Error in if (file == "") file <- stdout() else if (is.character(file)) { : the condition has length > 1

any idea how to solve it?

thank you very much

rix133 commented 2 years ago

Error in if (file == "")

I guess this likely means you have not passing the subFile parameters so write.table does not know where to save the file. I think there is a difference between the previous and last version of the saveRelated function arguments (or their order) my suggestion is to use named parameters in the function call somethin like this: saveRelated(tbls = myTbls, id = "DEid", depth = 1, count = 1, subFile ="./exportfile.csv")

where: myTbls <- list(DE, SD, ....)

luciazarauz commented 2 years ago

thank you so much,

our next error mesage says "there is no package called ‘RDBESdataConversion’"

do you know where can we find this package?

rix133 commented 2 years ago

Sorry I forgot to change this one line (its a private package from where this function is taken from)


#these fields should be removed from the tables for submissions
idFields <- c("DEid", "SDid","TEid", "VSid", "LEid", 
              "SSid", "SAid", "BVid", "FMid", "OSid",
              "FTid", "VDid","SLid", "LOid", "FOid" )

#' Prepare CSV File for Hierarchy
#'
#' @param tbls - list of tables in your hierarchy
#' @param id - string the column holiding the id field for this table
#' @param depth - int the table to start from in the list of tbls
#' @param count - int the praticular id value to use
#' @param subFile  - string filename or path to append
#' @param idFields - vector fields to remove from output
#'
#' @return nothing saves the RDBES submission file to subFile
#' @export
#'
saveRelated <- function(tbls, id, depth, count, subFile,
                        idFields=idFields){

  #if the count is 0 there are now lower tables so it should return
  #print(count)
  if(length(count) == 0){return()}
  if(is.na(count)){return()}

  tbl <- tbls[[depth]]
  tblID <- colnames(tbl)[1] #table first column is expected to always be its ID
  tblSampledColname <- paste0(substr(tblID, 1,2),"sampled")
  tblAt <- tbl[tbl[id] == count,]

  if(depth == length(tbls)){

    #it it is the last table just save selected subset and return
    saveTbl(tblAt, subFile, idFields)
    return()
  }

  for(i in 1:nrow(tblAt)){
    saveTbl(tblAt[i,], subFile, idFields)

    hasChildren <- tblAt[i,tblSampledColname] == "Y"
    # the SD table does not have sampled field
    if(length(hasChildren) == 0) {hasChildren <- TRUE}
    if(hasChildren){
      saveRelated(tbls, tblID, depth + 1, tblAt[i, tblID], subFile, idFields)
    }
  }

}

luciazarauz commented 2 years ago

Hi rix,

we didn't manage to make it work with our data. But we made our own code. I copy it here just in case it is sueful for anybody

Thanks a lot anyway. As you can see we borrowed some of your lines

 idFields <- c("DEid", "SDid", "OSid", "LEid", 
              "SSid", "SAid",  "FMid", 
              "FTid", "VDid","SLid",
              "BVid","FOid","VSid","LOid","TEid")

subFile <- "./3_Results/2021/h5.csv"

if (file.exists(subFile)) file.remove(subFile) # delete previous submisson file first

for (x in 1:nrow(DE)) { nameCols <- setdiff(colnames(DE), idFields)
                        write.table ( DE[x,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")  
                        SDtemp <- SD[SD$DEid == DE[x,"DEid"],]
                        for (y in 1:nrow(SDtemp)) {
                          nameCols <- setdiff(colnames(SD), idFields)
                          write.table ( SDtemp[y,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
                          OStemp <- OS[OS$SDid == SDtemp[y,"SDid"],]
                          for (z in 1:nrow(OStemp)) {
                            nameCols <- setdiff(colnames(OS), idFields)
                            write.table (OStemp[z,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
                            LEtemp <- LE[LE$OSid == OStemp[z,"OSid"],]
                            for (a in 1:nrow(LEtemp)) {
                              nameCols <- setdiff(colnames(LE), idFields)
                              write.table (LEtemp[a,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
                              SStemp <- SS[SS$LEid == LEtemp[a,"LEid"],]
                              for (b in 1:nrow(SStemp)) {
                                nameCols <- setdiff(colnames(SS), idFields)
                                write.table (SStemp[b,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
                                SAtemp <- SA[SA$SSid == SStemp[b,"SSid"],]
                                for (c in 1:nrow(SAtemp)) {
                                  nameCols <- setdiff(colnames(SA), idFields)
                                  write.table (SAtemp[c,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")
                                  FMtemp <- FM[FM$SAid == SAtemp[c,"SAid"],]
                                  nameCols <- setdiff(colnames(FM), idFields)
                                  write.table (FMtemp[,nameCols], subFile, row.names = F, col.names = F, append=T, sep=",", quote=F, na="")

  }}}}}}

HenrikK-N commented 1 year ago

@luciazarauz Hi Lucia, can you close the issue? Cheers Henrik