Closed joftius closed 3 years ago
Looking at the source of `clean_site()` I can see the reason for this now. The site `generator$clean()` function includes `output_dir`. So anyone who, like me, compiles their site to a (git repo) folder that includes other files which are not part of the blogdown project is in danger of having those files wiped out by this handy little "clean" button.
Hi,
Thanks for the report. That is indeed unfortunate. Can you share your project folder structure so that we understand for sure how to prevent this case?
My understanding is that cleaning a blogdown website project will remove:
* `public`, or whatever directory you set in the `publishDir` field in `config.toml` or `config.yaml`
* the rendered files in `content/`, i.e. the files ending in `.html` or `.markdown`. The source will be kept in `content/` so that the post can be rendered.
* `static`, `static/rmarkdown-libs` and `static/<name>_files`
Is the issue that you set `publishDir` to a directory which contains other things than the Hugo website?
This directory corresponds to what Hugo expects:
publishDir (“public”): The directory to where Hugo will write the final static site (the HTML files etc.).
blogdown considers this directory safe to remove when you want to clean the website, as it is the result of `hugo build` from the source files.
Which of these exactly were wrongly deleted? I am trying to understand how you organized your project so that we do not misunderstand the issue.
Thanks!
@joftius I'm extremely sorry about what happened! We should have been much more conservative in terms of cleaning sites, since this operation cannot be undone. I feel the desirable behavior of the `Clean All` button is to not actually delete the files and folders, but only tell users which files/folders are to be deleted. If users really want to delete them, they must use the command line, e.g., type something like `rmarkdown::clean_site(force = TRUE)` (by default, `force = FALSE`) in the R console.
I found a backup of most of my files on a different computer so all is not lost for me.
@cderv and @yihui: If I had read "blogdown considers this directory safe to remove when you want to clean the website" then I should have known not to structure my folders that way. I don't see how else to host a website using a git repository, e.g. on GitHub Pages, where I want to also keep some files that are not all managed through blogdown.
I agree, we should be more conservative. @yihui, the function has `rmarkdown::clean_site(preview = FALSE)` as the default (the argument is `preview`, not `force`). The IDE is indeed calling `rmarkdown::clean_site()`. Maybe it should call it with `preview = TRUE`, or we could set the default to `TRUE` in `rmarkdown::clean_site()`.
@joftius, about hosting a Hugo website: there are several examples in the community. One of the most used services that blogdown works well with is Netlify. With this service, you don't need to build the website yourself; you just publish the source on GitHub and Netlify will build the website using Hugo for you. But GitHub Pages also works fine, in a setup where you publish to the gh-pages branch or host from a folder within your repo (`docs` or `public`).
> where I want to also keep some files that are not all managed through blogdown.

I wanted to know more because of this: are you talking about keeping them only in the git repo, or in the website as well?
This may be a specific usage we need to support better.
@cderv Since I use github pages my website is a github repo. I build locally on my machine and then push the repo to publish changes. So for example my website is hosted at http://joftius.github.io but I also host other content on the same repo like slides I might need to link to when giving a talk, or a syllabus for a course, etc.
Oh I see. You are using a User site - the GitHub repo is the website. You are not pushing the source files to GitHub, but you use blogdown to render into this "website project" directory, where other content also lives.
The other way to manage this is to have a project for your Hugo/blogdown website, keep the source in that project, and build the site into gh-pages, or into the `public` or `docs` folder that is served using GitHub Pages (see "choosing a source for GitHub Pages").
FYI, regarding Hugo websites: when keeping the source of your website on GitHub, I believe those "other files" can live in the `static/` folder: https://bookdown.org/yihui/blogdown/static-files.html
They will be copied by Hugo to the `publishDir` when building.
Anyway, I don't think you should have to change anything in your workflow. I think we should look into the publish directory when cleaning and check whether it is a git repo (a `.git` folder exists). If so, we don't remove the publish dir but just warn about it.
That would prevent those problems, and be more supportive of User and Organisation websites using GitHub Pages. We should also default to `preview = TRUE`.
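To make the proposed check concrete, here is a minimal sketch in base R. The function name `split_clean_targets()` is hypothetical and is not part of blogdown or rmarkdown; it only illustrates the idea of refusing to delete a directory that contains a `.git` entry and warning instead.

```r
# Hypothetical helper, not part of blogdown or rmarkdown.
# Splits candidate paths into those that look safe to delete and those
# that are protected because they contain a `.git` entry.
split_clean_targets <- function(paths) {
  # Ignore candidates that do not exist on disk
  paths <- paths[file.exists(paths)]
  # `.git` is usually a directory, but can be a plain file (e.g. worktrees),
  # so test with file.exists() rather than dir.exists()
  is_repo <- vapply(
    paths,
    function(p) dir.exists(p) && file.exists(file.path(p, ".git")),
    logical(1),
    USE.NAMES = FALSE
  )
  if (any(is_repo)) {
    warning(
      "Not removing (git repository found): ",
      paste(paths[is_repo], collapse = ", "),
      call. = FALSE
    )
  }
  list(remove = paths[!is_repo], keep = paths[is_repo])
}
```

A caller would pass the paths reported by the site generator and delete only the `remove` element. This sketch checks only the top level of each path; whether nested repositories should also be protected is an open design question.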
I have given this some more thought after discussing with @apreshill.
Currently, clicking the Clean All button in the IDE will call `rmarkdown::clean_site()`, which by default will remove (`unlink()`) the files that are in `blogdown:::blogdown_site()[["clean"]]`.
One easy solution: make `rmarkdown::clean_site(preview = TRUE)` the default. This would impact the behavior for all generators. No files would be deleted; they would only be printed. Example:
> rmarkdown::clean_site(preview = TRUE)
[1] "blogdown" "public" "static/rmarkdown-libs"
less chatty than
> bookdown::clean_book()
These files/dirs can probably be removed:
_book/
You can set options(bookdown.clean_book = TRUE) to allow this function to always clean up the book directory for you.
Let's note that in **bookdown**, the Clean All button will call `clean_book(clean = FALSE)` **BUT** the files will be removed anyway, because that is what `rmarkdown::clean_site()` does by default.
Another solution:
* Change the behavior of the button so that it directly calls a clean function in the site generator; it would then be up to the package author to decide whether to delete or not.
This would require a change in the IDE, I think, so it is less easy... it is a long-term strategy to discuss.
It is easier to make `clean_site()` more chatty, as that is in our hands in the **rmarkdown** package.
I'd very much like to make `rmarkdown::clean_site(preview = TRUE)` the default. Deleting (potentially) a large number of files and dirs is a dangerous operation, especially when it can't be undone.
With `preview = TRUE`, we also need a message to tell users what to do if they are sure they want to delete the files (i.e., call `rmarkdown::clean_site(preview = FALSE)` in the R console).
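As a rough illustration of a preview-by-default clean with an explanatory message, here is a base R sketch. `clean_paths()` is a hypothetical stand-in and not the rmarkdown implementation; the message wording is an assumption, loosely modelled on the `bookdown::clean_book()` output quoted earlier in this thread.

```r
# Hypothetical stand-in for a clean function; not the rmarkdown implementation.
# `files` is the vector of paths a site generator reports as removable.
clean_paths <- function(files, preview = TRUE) {
  # Only consider paths that actually exist
  files <- files[file.exists(files)]
  if (preview) {
    if (length(files) > 0) {
      message(
        "These files/dirs can probably be removed:\n",
        paste0("  ", files, collapse = "\n"),
        "\nCall this function again with preview = FALSE to delete them."
      )
    }
    # Nothing is deleted in preview mode
    return(invisible(files))
  }
  # Destructive branch: only reached when the caller opts in explicitly
  unlink(files, recursive = TRUE)
  invisible(files)
}
```

With this shape, a button that calls the function with no arguments can never delete anything; deletion requires typing `preview = FALSE` on purpose.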
Feel free to send a PR to rmarkdown.
For bookdown, the `clean` function in its site generator should only return a vector of paths, and let `rmarkdown::clean_site()` decide what to do, instead of having its own `clean = FALSE` argument.
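The separation of concerns described here could look roughly like the following sketch. `make_generator()` and its `extra` argument are hypothetical names used for illustration only, not bookdown's actual code; the point is that `clean()` reports paths and never deletes them.

```r
# Hypothetical generator constructor; names are illustrative, not bookdown's API.
make_generator <- function(output_dir, extra = character()) {
  list(
    output_dir = output_dir,
    # Only reports existing candidate paths; never deletes anything itself.
    clean = function() {
      candidates <- c(output_dir, extra)
      candidates[file.exists(candidates)]
    }
  )
}
```

The caller (here, whatever plays the role of `rmarkdown::clean_site()`) then decides whether to print or remove the returned paths.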
Under the Build tab I clicked on More and then Clean All. It proceeded to delete my entire site, including the git repository and even other files that were in the same folder.
I have a folder named "source" that contains all the pre-build files, and I build to a different folder located at ../githubname/ from there. Clean All deleted the githubname folder completely, including a lot of other important files like various course notes that were not part of the bookdown output and were not added to my github repository, so I (apparently) have no way of recovering them now. The working directory, in the console at least, was the source folder, so I don't know why it would have deleted the contents of a different folder.
I may have lost a lot of valuable work permanently and I hope to alert the project maintainers to prevent this happening to others.
I can't give a reprex because I can't recover the state before the issue occurred.