michaelwiel / hacking_medphys_R_part2

R Version for https://github.com/rvbCMTS/EMP-News.git
https://charmingquark.at/db_R_tutorial.html

Bugs in Linux - ubuntu #3

Closed fgardavaud closed 2 years ago

fgardavaud commented 2 years ago

Hi Michael,

Following your e-mail to Gavin and Jonas, I tested the script on my Ubuntu workstation (20.04 LTS with the latest versions of R and RStudio). I get an error when I knit the db_R_tutorial.Rmd file:

Quitting from lines 626-648 (db_R_tutorial.Rmd)

Error in combine_vars():
! Faceting variables must have at least one value
Backtrace:

  1. rmarkdown::render(...)
  2. knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
  3. knitr:::process_file(text, output)
  4. knitr:::process_group.block(group)
  5. knitr:::call_block(x) ...
    1. layout$setup(data, plot$data, plot$plot_env)
    2. ggplot2 f(..., self = self)
    3. self$facet$compute_layout(data, self$facet_params)
    4. ggplot2 f(...)
    5. ggplot2::combine_vars(data, params$plot_env, vars, drop = params$drop)

If I comment out the offending line (line 647), the .html file is generated, but the corresponding plot is obviously empty (only the title, ...), and the second plot is empty too (only the title, subtitle and x-axis name).

I ran exactly the same files on my Mac (downloaded from GitHub today) and had no issues. Very, very weird.

Fortunately, sample_Report.Rmd knits without errors.

michaelwiel commented 2 years ago

Unfortunately I couldn't get our old laptops with Linux to run so I can't test it myself for now.

@faceting: This error can occur when one of the variables contains no values. Did you run the tutorial first so that the database is filled? Maybe you can go into the sample report file and step through it manually to check whether the object that is piped into ggplot has the correct content? Look specifically at the date variable...
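One way to do this stepwise check is to test the data frame right before it is piped into ggplot. The sketch below uses base R only. `check_facet_data()` is a hypothetical helper name, and `report_year` stands in for whatever variable the plot facets on (an assumption; the tutorial's actual facet variable may differ).

```r
# Hypothetical helper: stop early with a readable message instead of
# letting ggplot2 fail later inside combine_vars().
check_facet_data <- function(df, facet_var) {
  if (!facet_var %in% names(df)) {
    stop("Column '", facet_var, "' is missing from the data")
  }
  if (nrow(df) == 0) {
    stop("Data has 0 rows, so there is nothing to facet by '", facet_var, "'")
  }
  # All-NA values may not trigger the same ggplot2 error, but they usually
  # mean that an earlier step (e.g. date parsing) went wrong.
  if (all(is.na(df[[facet_var]]))) {
    stop("All values of '", facet_var, "' are NA")
  }
  invisible(df)
}

# Toy data shaped like sumstat (values are made up for illustration)
sumstat_ok <- data.frame(
  hp10        = c(0.1, 0.2),
  department  = c("A", "B"),
  report_year = c(2020, 2021)
)
sumstat_empty <- sumstat_ok[0, ]  # same columns, no observations

check_facet_data(sumstat_ok, "report_year")  # passes silently

msg <- tryCatch(
  check_facet_data(sumstat_empty, "report_year"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Calling the helper just before the plotting code turns the opaque faceting error into a message that points at the empty data frame.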

@automatic installation: I split up the code chunk "packages" into separate parts and pushed everything again. Can you check if that solves the problem or if it helps to narrow it down?

m*

fgardavaud commented 2 years ago

Hi Michael,

I can't run the sample_report.Rmd script on Linux because db_R_tutorial.Rmd crashes.

No problem on my mac.

I think the problem is in the chunk sqlDataSumstat_query (lines 608 - 623) of the db_R_tutorial.Rmd file: when I run it, sumstat is created with 3 variables (hp10, department, report_year) but without any observations. That is why ggplot crashes. When I run it on my Mac, the sumstat data frame has 208 observations of the 3 variables.

I ran and analyzed all the chunks and found an error in the tableAddDataDuplicates chunk (lines 502 to 517). The dbWriteTable function fails with this error:

UNIQUE constraint failed: staffdose.report_uid, staffdose.person_uid, staffdose.dosimeter_placement

I don't know what to do, as I am not proficient with SQL databases in R.

I attach the global environment to help you debug (I had to zip it in order to upload it): myglobalenvironement_on_ubuntu.Rdata.zip.

Let me know if you need anything else.

All the best.

michaelwiel commented 2 years ago

Hi!

> When I run it on my Mac, the sumstat data frame has 208 observations of the 3 variables.

I get 376 observations so we should definitely look into that.

> I ran and analyzed all the chunks and found an error in the tableAddDataDuplicates chunk (lines 502 to 517). The dbWriteTable function fails with this error: UNIQUE constraint failed: staffdose.report_uid, staffdose.person_uid, staffdose.dosimeter_placement

That's OK, that is supposed to happen (see the explanation in lines 529 to 523 and the section "Adding only unique data").
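The expected failure can be reproduced in isolation. The following is a minimal sketch, assuming the DBI and RSQLite packages are installed. The in-memory table only mimics the three key columns named in the error message; the column types and values are made up and are not taken from the tutorial's schema.

```r
library(DBI)

# In-memory SQLite database, so nothing touches the tutorial's real DB
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Assumed schema: a UNIQUE constraint over the three columns from the error
dbExecute(conn, "
  CREATE TABLE staffdose (
    report_uid          TEXT,
    person_uid          TEXT,
    dosimeter_placement TEXT,
    hp10                REAL,
    UNIQUE (report_uid, person_uid, dosimeter_placement)
  )")

dose <- data.frame(
  report_uid          = "r1",
  person_uid          = "p1",
  dosimeter_placement = "chest",
  hp10                = 0.12
)

# First insert succeeds
dbWriteTable(conn, "staffdose", dose, append = TRUE)

# Second insert of the same row violates the constraint and raises an error
second_try <- tryCatch({
  dbWriteTable(conn, "staffdose", dose, append = TRUE)
  "inserted"
}, error = function(e) conditionMessage(e))

print(second_try)

# The table still holds a single row
n_rows <- dbGetQuery(conn, "SELECT COUNT(*) AS n FROM staffdose")$n

dbDisconnect(conn)
```

Here the error is the database doing its job: the constraint rejects the duplicate, and the data already in the table stays intact.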

> I think the problem is in the chunk sqlDataSumstat_query (lines 608 - 623) of the db_R_tutorial.Rmd file: when I run it, sumstat is created with 3 variables (hp10, department, report_year) but without any observations. That is why ggplot crashes.

Either there is a problem with the SQL query, or the database does not contain any values at this point. Please try the following steps:

1) Delete all variables from the environment.
2) Run all the code up to the chunk "sqlDataSumstat".
3) Try dbGetQuery(conn = mp_db_conn, statement = "SELECT * FROM staffdose") in the console.

This should display the whole content of the DB (612 rows). If it is empty, the problem lies before that point and there is no data in the DB. If you get an output with 612 rows, the problem might be the query.
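Steps 2 and 3 boil down to asking whether the table holds any rows before the summary query runs. Below is a sketch of that check, assuming DBI and RSQLite are installed. `count_rows()` is a hypothetical helper, the in-memory database merely stands in for the tutorial's `mp_db_conn`, and the filter in the second query is invented to show what "columns but no observations" looks like.

```r
library(DBI)

# Hypothetical helper: number of rows in a table
count_rows <- function(conn, table) {
  sql <- paste0("SELECT COUNT(*) AS n FROM ", dbQuoteIdentifier(conn, table))
  dbGetQuery(conn, sql)$n
}

# Stand-in for the tutorial's connection, filled with made-up values
mp_db_conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(mp_db_conn, "staffdose", data.frame(
  hp10        = c(0.1, 0.3),
  department  = c("A", "B"),
  report_year = c(2020L, 2021L)
))

# Check 1: is there any data in the table at all?
n_total <- count_rows(mp_db_conn, "staffdose")

# Check 2: a query whose filter matches nothing still returns all
# columns, just with zero observations
empty <- dbGetQuery(
  mp_db_conn,
  "SELECT hp10, department, report_year FROM staffdose WHERE report_year > 2100"
)

print(n_total)
print(dim(empty))

dbDisconnect(mp_db_conn)
```

If the first check returns 0, the problem lies in the chunks that fill the database; if the table is populated but the query result is empty, the query or the values it filters on deserve a closer look.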

cheers and good night... :-)

fgardavaud commented 2 years ago

Hi, thanks for the suggestion. I'm just back from a long weekend and currently short on time. I hope to find some time to test this at the end of the week. Have a good evening.

michaelwiel commented 2 years ago

Hi! Thank you for looking into it, and take your time! Have a good week! Michael

fgardavaud commented 2 years ago

Hi,

I finally found some time to track down the bug. It was in the Sys.setlocale call: the argument that defines the language is specific to the OS platform. You had pointed this out, but I overlooked it... my mistake!

With the correct argument, everything works. I obtain 376 observations for the sumstat variable.

The correct call on macOS and Ubuntu is Sys.setlocale("LC_TIME", "en_US.UTF-8") instead of Sys.setlocale("LC_TIME", "English") (this may not hold for other flavors of Linux).

This error was hard to find because on macOS, even with the wrong argument, all the results were generated (though only partially complete). On Linux, it crashed.
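Part of what can make this kind of mistake easy to miss is that Sys.setlocale does not stop with an error when the OS rejects a locale name: it issues a warning, leaves the locale as it was, and returns an empty string. A small sketch of how to detect that; the locale name below is deliberately invalid and purely illustrative.

```r
# Remember the current setting so it can be compared afterwards
old_loc <- Sys.getlocale("LC_TIME")

# Deliberately invalid locale name (made up for this example).
# suppressWarnings() hides the "cannot be honored" warning.
result <- suppressWarnings(Sys.setlocale("LC_TIME", "no_such_locale_xyz"))

# An empty string signals that the request was refused
failed <- !nzchar(result)

# The previous locale is still in effect
unchanged <- identical(Sys.getlocale("LC_TIME"), old_loc)

print(failed)
print(unchanged)
```

Checking the return value right after the call makes a wrong locale name fail loudly at the top of the script instead of surfacing later as missing data.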

So that you can verify the results, I have uploaded the output HTML files in a zip archive: db_R_tutorial_on_mac&Linux_Os.zip.

I quickly compared the two HTML files and don't see any difference between them.

Currently, I don't know how to handle the various OS flavors in the R script. If you have any ideas...

Cheers and have a nice week-end.

michaelwiel commented 2 years ago

Hi! Great that you found the problem!

@ OS-type: I added a code chunk to handle Windows, Mac and Linux (code chunk: "settingLocaleTemp_automatically") and pushed the last version. It works for Windows, please check for the other two. Here is a copy of the code:

# detect OS type ([[ ]] returns the plain string without the name attribute)
os <- Sys.info()[["sysname"]]

# set locale according to OS type
if (os == "Windows") {
  temp_loc <- "English"
} else if (os == "Linux" || os == "Darwin") {
  # POSIX-style locale name, used on Linux and macOS
  temp_loc <- "en_GB.UTF-8"
} else {
  stop("Could not detect type of operating system")
}

Sys.setlocale("LC_TIME", locale = temp_loc)
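For systems where a particular locale is not installed (for instance other Linux flavors), one possible variant is to try several candidate names in turn and stop only if none is accepted. This is a sketch, not the code in the repository. `set_english_time_locale()` is a hypothetical name, the candidate list is an assumption, and unlike the chunk above it treats every non-Windows system as POSIX-like instead of stopping. The final candidate "C" is the default C locale, whose month and day names are English.

```r
# Hypothetical helper: set LC_TIME to the first English locale the OS accepts
set_english_time_locale <- function(os = Sys.info()[["sysname"]]) {
  candidates <- if (os == "Windows") {
    c("English")
  } else {
    # Assumed order of preference; "C" is a last resort
    c("en_GB.UTF-8", "en_US.UTF-8", "C")
  }

  for (loc in candidates) {
    # Sys.setlocale returns "" (plus a warning) when the name is rejected
    res <- suppressWarnings(Sys.setlocale("LC_TIME", loc))
    if (nzchar(res)) {
      return(invisible(res))
    }
  }

  stop("None of the candidate locales could be set: ",
       paste(candidates, collapse = ", "))
}

chosen <- set_english_time_locale()
print(chosen)

# Month names are now formatted in English
print(format(as.Date("2021-03-15"), "%B"))
```

Which locale ends up being chosen depends on what is installed on the machine, so the first printed value will differ between systems.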

I also ditched the ggthemes package install/loading lines since I realized that I didn't use it anyway :-) For now I just commented it out, just in case...

Have a great weekend! m*

fgardavaud commented 2 years ago

Hi michael,

I have successfully tested your new chunk; it works. There was just one minor mistake, which I corrected; I have opened a new pull request so the fix can be merged directly. Have a nice day.

michaelwiel commented 2 years ago

I guess we managed to fix the problem so I will close the issue....