Closed StefanMunnes closed 2 weeks ago
This is how the resulting YAML file looks for the question type demo survey: survey_question.yml
@StefanMunnes first, thanks for putting this together. This is a good first attempt at supporting this feature. That said, I don't quite like that the question structure is being defined by a previously exported yml file. I would want to keep that as being defined strictly by parsing the survey.html file. There are a number of reasons why I would want to keep things this way. First, by defining the structure based on the yaml file, we're opening up the potential for a lot of errors and confusion by users. Someone could accidentally edit something in that file and break their survey, and it would be very hard to identify the source of the problem. In contrast, the survey.html file is much harder to accidentally edit because it's typically opened in a web browser. Second, as we add more features, we want everything to be defined from the qmd and then the rendered html. This is a simpler thing to understand for other developers or anyone else who wants to add other features. Basically, keeping things simple makes it easier to edit. And finally, we're kind of mixing two things here: exporting the question content versus the question structure. The main reason to export questions as I see it is so that the content could be used in another survey. There's really no other reason to export the question structure because it is entirely embedded inside the survey.html file. So we would have two files that are both defining the question structure, which is redundant. Since we need the survey.html file anyway, it is not really an additional performance burden to parse it for the question structure.
So I would suggest revising this and simplifying everything so that you just get an exported yml file containing the questions inside the survey.html file. I would make this very obvious, like exported_questions.yml as opposed to something like survey_questions.yml, which is not as obvious whether it is exported from survey.html or is an imported file into the survey. This will also make it much easier to distinguish files when we have other yml files that could be imported into the survey.
@jhelvy thanks for the feedback. I have also thought about these points, especially about ease of use and avoiding sources of error. Nevertheless, I decided to go for this approach and would like to argue in favor of it again and show some ways to deal with the counter-arguments:
Extracting the question structure from the html file is quite resource and time consuming, especially when the survey gets longer. For this reason, I was inspired by the idea of converting the qmd file to html only when changes are made, and built this approach in parallel to your workflow. Even with this short example, you can feel the time difference when loading the survey page. However, I also kept in mind that users might (accidentally) change the exported question qmd file. That's why I came up with the additional warning in the first line '! READ ONLY - don't change the content of this file' and had a few more ideas:
changing the file name to a clearer, export-oriented name is a no-brainer.
put files like this (or upcoming translations etc) in a separate folder. What about the "survey_files" folder?
you could set a strict time difference between the two files survey.html and exported_question.yml (e.g. 20 seconds). And if the yml file was edited later, simply extract the structure/content of the question again and export it.
What would you say is the difference between the question structure and the question content? I was confused with this in the process and not sure if you really can separate this clearly.
Lastly, I tried to use very well named and documented functions that fit in the config function workflow like rendering the html and, at least in my understanding, don't make things much more complicate.
If you're still not convinced, I can easily transform the pull request to the easy export only version.
@StefanMunnes The primary reason that I would want to do something like what you have proposed is if it could significantly speed up the loading process. But I don't think loading the question data from a yml file will noticeably change the loading speed. The reason is because we still have to parse through the survey.html
file anyway. Whenever the survey loads, the run_config()
function parses through the survey.html
file and stores the page data in a list. This has to be done so that the raw html content is available in the server to display on the pages. The additional step it takes to grab the metadata from the questions (the ids, labels, etc.) is marginally zero. So I don't think we're going to get any performance increase by only saving the metadata to a yml file.
If we're interested in accelerating the loading time, then we could implement something like what you have done here, except instead of saving only the question metadata as a yml file, I would just save the entire pages
list object as a .qs
file. I suggest qs
because it's a very, very fast serializer. This would allow us to store a local copy of all the survey content pre-parsed and organized into an R list structure that could be immediately loaded in the sd_server()
function. If we exported something like pages.qs
to the project root, we could check that it has an older time stamp than the survey.html
file. So long as that's true, then we can use pages.qs
for the internal pages
list, otherwise we will have to parse the survey.html
file for the page content and then update the pages.qs
list. This is basically what you have done, but it would capture everything on the pages, not just the metadata of the questions.
@jhelvy I just did a little local test and timed the time before and after the 'smart' get_question_structure(html_content)
function with three times the variables from the question type demo, which is pretty common for the surveys I usually implement for my team. On my regular consumer hardware (i7-8565U CPU and 16GB RAM), it took me 0.96 seconds to re-extract and save the question content, and just 0.01 seconds to import the yml file on the next run. I'm not sure how much this would change with proper server hardware, but this looks like a huge improvement to me.
I also inspected the pages object you mentioned, wasn't aware of this before, but it ends up storing only a fraction of the information extracted of the question content/structure. In addition, it uses some values, like required questions, that are changed in the app.R
file, and therefore need to be re-created each time the app is started, and can't just be executed on changes to the survey.qmd
file. Also, I can't see where the extracted question_ids from this pages list object are used later? Maybe get rid of them?
And the html content that is stored for each page is just stored in raw format and not parsed in a computationally intensive sense. So I can't imagine how much you would gain by storing this in a separate (qs) file.
The benefit of having this easy to read exported_question.yml
file would be not only to reuse it in later surveys, but also to share it with colleagues and have easy access to the labels for data labeling, codebook creation, or report and presentation writing. I could also show that there is a significant time benefit to using this parsed file instead of extracting this question information each time. And there are easy-to-implement ways to deal with potential user-created errors.
Okay, I finally sat down and looked carefully at this, and you are correct - the get_question_structure()
function is actually the single largest runtime burden inside run_config()
. I hadn't looked at this closely in a while, and I didn't quite realize just how much was going on inside that function. It's a lot of calls to rvest
, which is quite computationally taxing. Here's a screenshot I made using {profvis}:
The get_question_structure()
function is ~2/3 of the run time, with extract_html_pages()
being the second largest function in run time. But as you noted, we actually need to run extract_html_pages()
each time because it takes arguments defined in sd_server()
that the user could change. The question structure is just a static list though that only needs to be updated if the user changes something in the survey.qmd
file.
I'm coming around to this idea now as the runtime benefits are clear. The question now is really about the best approach for storing the yml file. I'm trying to think ahead about potentially supporting another yml file for importing question metadata, so I want to make sure it's obvious which file is which. One idea is to put everything that the user shouldn't modify in a folder, maybe called _survey
since this is a little more obvious that it's an output of running the app? I'm taking inspiration here from Quarto - websites, for example, render to a _site
folder by default. We could put all outputs in there, including the survey.html
and survey_files
folder (if it exists) as well as the exported_questions.yml
file. The only annoying thing here is that Quarto can only render a qmd file into the project directory, so we would need to take an additional step to move the survey.html
and survey_files
folder after they render into the _survey
folder, which would only need to run if the user changed something in the survey.qmd
file. And if all the outputs are in a _survey
folder, then maybe we could just call the yml file questions.yml
?
The biggest downside to this folder approach is it would not be backwards compatible. The current package version looks for a survey.html
file in the root directory, so moving it into a folder would break that.
An alternative approach (that is backwards compatible) is to only put the yml file in a folder. Maybe this could be a questions
folder that could also store yml files to import content? Here I would save the exported file as maybe _questions.yml
or exported_questions.yml
inside this folder. Users could also add other yml files (e.g. demographics.yml
) that the app could import.
It's great to see my impressions also with the help of this great package (profvis) that I didn't know before. The final implementation is a tough decision and I can only give my personal opinion without the burden of being the maintainer of this package.
I prefer the option of having a separate subfolder for all exported files, including the survey.html
file. In this logic, the user knows that everything he provides, whether it is editing the two main files (survey.qmd
and app.R
) or providing additional files (e.g. questions, translations, data, images) must be on the main level. Or one could use the sd_include_folder()
function to explicitly add a folder with additional resources, like the YAML question files. I would argue in favor of this approach, as it should be more compatible with possible future features and their implementation. It's clear where to find package-generated content and where to put all user-generated content. And if some users really do have multiple YAML files to import, they are free to add another folder if they want.
As for backwards compatibility with the survey.html
file in the subfolder, I don't think this is a problem at all. If the survey.html
file is not (up to date) in the _survey
subfolder, it will simply be rendered and then copied to that folder. This is true for all cases when the newest version is used. We could add a line of code to remove any leftover survey.html in the project folder, at least for some transition time.
I agree, a single folder for all exported items just makes more sense, again more in line with the way that other packages / platforms work (e.g. Quarto, RMarkdown). And you're right, I didn't think about the fact that we will end up relocating the survey.html
file into the exported folder anyway, so even for older surveys people have developed it should be easy enough to do the check to copy that file over.
As for the name of the folder, I still like _survey
. I don't like survey_files
because that is the name of the folder that gets generated by Quarto with the default yaml where resources are not embedded. Users can override this like this:
---
format:
html:
embed-resources: true
---
But few will do this, so you'll likely have to search for both a survey.html
file and a survey_files
folder to copy over.
The code to move things should also be quick and easy using the {fs} package. It's absolutely brilliant for this kind of thing:
# Create paths
survey_html <- "survey.html"
survey_files_folder <- "survey_files"
destination_folder <- "_survey"
dest_file_path <- file.path(destination_folder, survey_html)
dest_folder_path <- file.path(destination_folder, survey_files_folder)
# Make the _survey (destination) folder
if (!fs::dir_exists(survey_folder)) { fs::dir_create(survey_folder) }
# Copy survey.html file, then delete original
fs::file_copy(path = survey_html, new_path = dest_file_path, overwrite = TRUE)
fs::file_delete(survey_html)
# Now copy survey_files folder into _survey folder, then delete the original
if (fs::dir_exists(survey_files_folder)) {
fs::dir_copy(survey_files_folder, dest_folder_path)
fs::dir_delete(survey_files_folder)
}
You just want to add this code snipped to the pull request?
And of course, some changes for the questions.yml
output are needed.
I can't look too closely at the moment, have other pressing things to address, but in general I would say I'm on board with this idea. The other thing I forgot and wanted to bring back up was about the other arguments in sd_question()
, like width
, etc. I don't think those need to be in the yml here because those arguments just affect the resulting html, which is still stored in survey.html
which ultimately will be parsed inside run_config()
. The only things needed in the yml are the elements you've already parsed out, so I think it's basically done already. For importing questions though, this might become required. But that's something we can thinking about later as I think supporting importing from a yml file is going to be a lot more challenging to implement.
Okay I just integrated all of this to move everything into a _survey
folder. Can you look at it and test it out? I tried to also make it backwards compatible with anyone who has a survey.html
file in the project root folder. I named the yml file simply questions.yml
since it is now inside the _survey
folder.
Okay, you beat me to it. I'll take a look at it and maybe add some ideas I found while trying to implement it myself.
I also just updated the news file. This was actually a pretty simple update given that you already added most of the logic to use the yml file.
Okay, I have some comments and questions:
# Define and create '_survey' folder for export
folder_target <- "_survey"
if (!fs::dir_exists(folder_target)) { fs::dir_create(folder_target) }
files <- list( main = list( qmd = 'survey.qmd', html = 'survey.html', yml = 'questions.yml' ) )
files$target <- lapply(files$main, function(file) file.path(folder_target, file))
2. I don't think you need to handle backward compatibility explicit. If no `_survey/survey.html' file is found, what is the case for older versions, just render again once and copy the file. Then everybody is on the same track.
3. I was also thinking to put the move survey files code in the `render_qmd()` but decided to put this in the end of the `render_survey()` function, since this is the more general function.
4. You added this 'not self-contained' conditions for rendering the `survey.qmd` file. Is this really necessary? I'm just asking, because it prolongs the code without any benefit for the user and the code?
5. Why would you have the message function inside the `render_qmd()` function and not just before calling this function without an argument? While reading this code, it's hiding what's going on, without a benefit.
6. I had changed every occurrence of file names in the messages/sop functions with the list objects. Also, there is no need of using glue in message, it just extends the code:
```r
message("Extracting question structure from '", paths$target_html, "' and saving to '", paths$question_yml, "' file.")
if (file.exists(files$target$yml) && time_yml - time_html < 20)
Hope you can follow my comments and even like some of them. I could implement them later.
survey.qmd
file needs to stay in the root, but I guess it doesn't matter that your lapply
is creating a "target" path for it since that target won't be used anywhere.survey_files
folder, which will not exist if the user changes the yaml in survey.qmd
to make the html file self contained (and there are two ways to do that, which is why there are these checks). One simplification is to just always check if there is a survey_files
folder only move it if it's there. I was probably overthinking it again. That said, the function to check if it's self-contained is used in sd_ui()
, so we need to keep the function, but you can remove it from being called here.glue()
here. If it's not yet obvious, I code heavily with LLMs for speed, sacrificing best practices sometimes. This entire project happened in like 2 months because of using Claude, otherwise it probably would never have happened. So I lean more towards use the LLM, get a draft, then edit it. So you're doing a good job here of fixing Claude's poor choices :). I'd spend the time myself if I had it, but I don't. questions.yml
file I would want to overwrite it, so make sure the logic aligns accordingly. Also use ()
to make sure the condition logic is solid.After implementing and testing all the changes discussed, I have encountered a very strange problem that I cannot solve by myself.
The slider and date input widgets no longer work, see the screenshot:
This problem occurs even without my recent changes/commit. It seems that some of the JavaScript resources are not loaded correctly. Since the survey_files
folder is connected to the survey.html
file, I also tried to start the app without moving both, same problem.
Additionally there was the problem that every time the HTML was re-rendered I got an error:
Error : [ENOTEMPTY] Failed to remove 'survey_files/libs/bootstrap-datepicker-js-1.9.0/js/locales': Directory not empty
The folder was copied, but could not be deleted with the fs::dir_delete()
function. I therefore commented out this part of the code in my last commit and only moved the questions.yml
file to the survey_files folder. This does not change the visualisation problem, but the deletion error was avoided. But of course this is only one way if the user does not choose the self-contained approach.
Finally, the sd_output()
function no longer works properly either, but I hope this is also related to the main problem. If not, I will try to solve this later.
I was afraid something like this might happen since there are other parts of the package that still might be looking for everything in the project directory. I'll have to look more closely at what is going on, I suspect it has to do with the survey_files
folder being moved. Another thing you could try is to set self-contained: true
in the yaml to see if this persists even after embedding everything in the survey.html
file.
Okay I see a handful of things that are now not correct with the changes you made. Let me fix these.
Okay now the path issues are fixed. I reverted back to my original paths
object as I found the one you created a little more difficult to use, and it created paths to things that were never used (the survey.qmd
file should never move). I also noticed that you had set_folder_files <- function(target = "survey_files") {
...this was setting the target folder to survey_files
, when it should have been _survey
. That was breaking a lot of things. My revised approach is more streamlined then what I had originally and seems pretty robust now on some quick testing.
I'm still seeing the new issues you raised though about the input widgets...going to look into that.
Yes, like I wrote, I changed the target folder in my last attempt to just move the questions.yml file to some kind of a 'process output' folder where files for the survey app are stored. But the main issue didn't change with any of the different attempts of changing or not changing the way or place of storage of the files and folders.
I'm quite sure, that this error just came up with the commit 'moved all exported survey files into "_survey" folder'.
I just created a test branch from my first two commits and everything is working as intended. Unfortunately I don't have time today to check what's going on in detail.
Yes, like I wrote, I changed the target folder in my last attempt to just move the questions.yml file to some kind of a 'process output' folder
Ah that makes sense, sorry I missed this.
I'm 99% sure the reason this is breaking is because the survey.html
file has code in it to load files from the survey_files
folder, but now that everything was moved into a _survey
folder, the paths are incorrect. The html code that is parsed out of the survey.html
file has paths to a survey_files
folder that is now located at _survey/survey_files
, so none of the necessary js, css, etc is being loaded.
One solution is to avoid this altogether and force it to be self-contained. We could modify the survey.qmd
yaml file so that it always outputs a single survey.html
file and no survey_files
folder. I've already got this working.
An alternative is to try and see if we can modify the name / location of the survey_files
folder. I haven't gotten this working yet.
Okay I just pushed one potential solution. It forces the qmd file to be rendered as a self-contained file by directly modifying the yaml header before rendering. It's some hacky machinery to make it work, but it does seem to work. It modifies the yaml header, renders the file as a temp file, then copies the temp file into the _survey
folder. So now there is no more survey_files
folder and everything is embedded in the survey.html
file.
It's a bit messy, but the end result is a pretty clean folder structure where you just have two files in the _survey
folder. If this seems too messy, we could always go back to a different approach where we leave the survey.html
file and survey_files
folder alone in the root directory, and then just make a decision about where we want to put the yml file for the questions. But I do like having all the survey components in one folder, it just looks cleaner to the user.
Ah, great, the hint about the links to the survey_file resources in the HTML file is spot on. I think your suggestion is impressive, but a bit too complicated for the codebase and there are 2 (+1) much simpler solutions:
copy the survey.qmd
file into the _survey
folder and then render it there, then the survey.html
file and the resources in the survey_files
folder are already automatically in the right folder. Then simply delete the copy.
you can simply replace the references in the HTML file after or before copying the survey.html
file and the survey_files
folder:
html <- readLines(paths$root_html, warn = FALSE)
html <- gsub('(src=[\'"])survey_files', glue::glue('\\1{paths$target_files}'), html)
writeLines(html , paths$root_html)
However, you must ensure that the single and double inverted commas are captured correctly, for example in German the survey.html
file includes:
<script src="survey_files/libs/clipboard/clipboard.min.js"></script>
apparently there is the possibility to insert CLI arguments to define the output folder in our case:
quarto::quarto_render("survey.qmd", quarto_args = "--output-dir _survey/")
That would of course be the most elegant, but it doesn't work for me and after a quick search it seems necessary to create a Quarto project:
https://forum.posit.co/t/output-directory-in-quarto-cli-not-respected/143762 https://github.com/quarto-dev/quarto-cli/discussions/2171 https://github.com/quarto-dev/quarto-r/issues/81
One could also try to add the YAML command 'embed-resources:true' to the quarto render
command, but I wasn't able to do that in time.
I would therefore simply suggest discarding the impressive last commit and simply implementing one of the first two solutions instead.
Simple is best - I like option 1. Let me try it.
I think I found the solution here.
quarto::quarto_render(
paths$qmd,
pandoc_args = c("--embed-resources")
)
That's all it took 🤦. All that nonesense I wrote and just adding a simple pandoc arg embeds the resources!
Btw, I just tried strategy 1 and it still fails because the survey_files
folder still ends up being located at _survey/survey_files
, but the html has the paths to the files as survey_files
. But no worries now - just making the html self-contained solves everything.
That's great. It's always amazing how much work you can save with one option.I tried to do this with the quarto option, but didn't get to pandoc directly. Now everything is working again without any problems.
I was still wondering whether this is the best solution for all cases? If a lot of images or other resources are added, doesn't that bloat the HTML file and delay all the extraction commands based on it? I'm not that familiar with it and don't know if that could be a problem. But maybe we can keep an eye on it? Perhaps the pandoc option for defining the resource path would help in this case https://pandoc.org/MANUAL.html#option--resource-path
I also corrected a typo in my last commit, added a comment for the two lines of code that extract page and question IDs, and removed the question_values element from the final config object since it won't be used later or can be extracted from the question_structure element.
Cool. Okay so I was thinking more about this, and I actually think one of my earlier ideas (using {qs} to store the pages
list) might be the ultimate solution. If we complete all the parsing of the survey.html
file and store it in a pages.qs
object, then we no longer need the survey.html
file at all. We could still keep the questions.yml
file simply because it's an easier way to quickly look at all of the questions. The benefit is that if we do this, it will be almost instant loading. We would only need to update the pages.qs
object under the same conditions - if the survey.qmd
file was updated or if the pages
object is missing. We would need one more condition where we update pages.qs
if the user changes one of the arguments in sd_server()
. We can account for this by just storing all those argument values inside the pages.qs
object when it gets created. If we detect that an argument value has changed compared to the stored values in pages.qs
, then we just re-render everything and re-create the pages.qs
object.
I would still put everything in the _survey
folder, and if we wanted we could also keep the survey.html
file there too just so users could open it and preview it if they wanted. But if it's a self-contained html file, then it's purely there for the purposes of previewing because the pre-parsed content in the pages.qs
object will be what we use in the app.
Yes, I think this is a good way to go. I was also thinking about this, and we could also just check for changes in the app.R
file, like with survey.qmd
, this should also do the job.
But should we discuss this in a need discussion or PR and just finish this one?
I think we should just do it here, we're so close. And that's a great idea - just check if the app.R or survey.qmd have been updated, and if so then update the pages.qs file. Did you want to try building this? I think it's just a simple check right at the top of the run_config function. If pages.qs exists and it's older than app.R and survey.qmd, then just return the saved pages.qs file. If not then rebuild it.
The only question now is whether we still need to even make the question yml file something to depend on. We can always export it like I said if you want to more easily see what questions are in the survey, but if we store it in the pages list then we don't really need it. Start up will be super fast anyway.
I add a check to see if one of the two files survey.qmd or app.R is more recent than the pages.qs file. If this is the case, I decided to (re-)render and extract all the content (html, head_content, pages and question structure) and save it in the _suvey folder. Otherwise, just load all the previously rendered, extracted and saved data and continue with the rest of the config code.
I think this is more clear now that there is a clear condition for everything instead of hiding this in all these functions.
I'm not convinced by the qs format. It introduces an extra dependency and the files are so small that I'm not sure it speeds up the process noticeably. I'm also not a fan of having the head_content in a separate file, but I don't know how to avoid that. If we switch to RDS, we could store both objects in one file.
I would vote in favor of keeping the questions.yml file as this was the motivation for my feature request and PR for easy review, sharing and further use of the questions. I would also like to keep the survey.html file for the same reason. It looks ugly, but might be useful for documentation purposes.
Wonderful. I'm okay with this approach of disaggregating the different pieces (head, pages, questions) into different saved files. I also agree on just using rds - the files are so tiny, this won't make any difference in speed, so I removed all the {qs} and now use base rds. I also added the {fs} package to the description since it was missing. I know everything can be done in base R, but I just find the {fs} package much more robust and a pretty safe dependency as it's quite mature now.
I also changed the function name from check_files_updated
to check_files_need_updating
because the name implied the opposite of what it does. The name check_files_updated
seems like a response of TRUE
would mean all the files are indeed up to date, so no need to re-render them. But in fact, it should mean the opposite - the files are NOT up to date and therefore need to be re-rendered. The name check_files_need_updating
seems more consistent with this logic - if TRUE
, then the files need to be updated. Finally, I added one more bit of logic in this function, which is to first check if any of the files in the _survey
folder are missing. If so, then the files need to be re-created.
Finally, I also changed question_yml
to target_questions
just for consistency in naming with the other target files.
With these changes, I think we're pretty close to being ready to merge this. This is a rather significant change though and introduces a lot of new things (the _survey
folder, totally different logic controlling how the survey.qmd
file is rendered, etc.), so I would actually be in favor of making this a significant version bump, from 0.4.0 to 0.5.0 (rather than 0.4.1) and pushing this to CRAN again so more people start using this structure. I usually on push to CRAN on mid-level version changes (0.X.0) and otherwise make smaller bumps (0.0.X) for smaller changes, which I tend to not push to CRAN until we get to another mid-level bump.
That all sounds good to me. I'm happy about the progress.
Okay cool. Actually now I'm giving it more thought and I may just bump this to 0.4.1. It is a bit of a larger change, but if you're going to work on the translations idea then maybe we could wait a little longer to go to 0.5.0 when it has a few more features. I may work on a few more things in the mean time.
Yes, I'm working on this and hopefully can open a PR on the translation today or tomorrow.
One last thing regarding the naming of the functions: Since my English is not the best, please feel free to change every name or message I provide. I have no hard feelings about these things, I'm just interested that the code is well-structured and easy to follow.
Cool. I try to be explicit about whatever changes I make / suggest regardless, mostly for record keeping. Years from now if we want to know why we made some decision, we'll have a record to look at.
I'm going to go ahead and merge this as v0.4.1. Thanks again for the work on this and for continuing to push for your idea. I didn't understand the value at first, but it's really a great contribution. The app loads almost immediately now, which is a huge performance improvement!
Feature: Handle Question Structure with YAML File
This pull request introduces the functionality to export the extracted question structure to a YAML file, as asked in the feature request #132.
Overview of Changes
sd_question()
input. This includes converting matrix subquestions to rows and replacing types from HTML classes to input type options.survey.qmd
file is unchanged, the application will load from the existing YAML file, speeding up the app's startup process.Testing
Important Notes
survey_question.yml
file must remain unchanged, with a comment added to the first line to reinforce this.TODOs
survey.qmd
,survey.html
, andsurvey_question.html
at a higher level, as they are used in multiple functions.sd_question()
function, internal variables, and thequestion_survey.yml
output.Open Questions
survey_question.yml
be moved to a separate subfolder, and is the naming adequate?Your feedback on these questions and the proposed changes would be greatly appreciated. Thank you!